I have checked FSDirectory; it creates an MMapDirectory or
NIOFSDirectory for the Directory.
These two directories only supply an IndexInput extension for reading files
(MMapIndexInput extends ByteBufferIndexInput).
Why is there no MMap/NIO IndexOutput extension for writing files? They only use
FSIndexOutput.
We have a search system based on Solr using the Solrnet library in C# which
supports some advanced search features like Fuzzy, Synonym and Stemming.
While all of these work, *the expectation from the Stemming Search seems to
be a combination of Stemming by reduction as well as stemming by
And if Solr has to spit it out, perhaps you could do that with a simple
XSLT transform or Velocity template.
Upayavira
On Fri, Jun 28, 2013, at 12:30 AM, Learner wrote:
Might not be useful but a work around would be to divide all scores by
max
score to get scores between 0 and 1.
--
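Learner's workaround (dividing every score by the max score) can be sketched in plain Java; this is an illustration only, not SolrNet/Solr API code:

```java
import java.util.Arrays;

public class ScoreNormalize {
    // Divide every raw score by the max score so results fall in [0, 1].
    static double[] normalize(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(1.0);
        final double divisor = (max == 0.0) ? 1.0 : max;
        return Arrays.stream(scores).map(s -> s / divisor).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical raw scores as they might come back in a response.
        System.out.println(Arrays.toString(normalize(new double[]{4.2, 2.1, 0.84})));
    }
}
```

Note this only rescales within a single result set; scores are still not comparable across different queries.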
Thanks for the explanation, I was missing exactly that!
Now things work correctly, also using the post script.
However, I don't think I need norms if I use ids of the same length (UUIDs), right?
I just need strings with omitTermFreqAndPositions=false, I think.
On Thu, Jun 27, 2013 at 7:31 PM, Erick
field length normalisation is based upon the number of terms in a field,
not the number of characters in a term. I guess with multivalued string
fields, that would mean a field with lots of values (but one match)
would score lower than one with only one matching value.
Upayavira
On Fri, Jun 28,
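The length-norm behaviour Upayavira describes corresponds to the classic Lucene/Solr length normalization, roughly 1/sqrt(numTerms), ignoring boosts and the lossy byte encoding of norms; a sketch:

```java
public class LengthNorm {
    // Classic Lucene length normalization (TFIDFSimilarity, boost ignored):
    // norm = 1 / sqrt(numTerms). A multivalued string field with many values
    // has more terms, hence a smaller norm, hence a lower score for a single
    // matching value.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(1));   // 1.0
        System.out.println(lengthNorm(100)); // 0.1
    }
}
```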
Thanks for the replies.
I have already tried the options mentioned here; apparently those provide
suggestions for a query word which is incorrectly spelled. I am looking for a
feature where my query term is correct and I want results from those
documents where both the correctly spelled term matches and
You're wanting to make your search more fuzzy. You could try phonetic
search, but that's very fuzzy. Go to the analysis tab in the admin UI.
Locate the 'phonetic' field type in the drop down, and you can see what
will happen to terms when they are converted to phonetic equivalents.
Upayavira
On
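As an illustration of what a phonetic field type does, here is a simplified Soundex sketch, one of the algorithms Solr's phonetic filters support. Terms that "sound alike" reduce to the same 4-character code (this simplified version treats H/W like vowels, which full Soundex does not):

```java
public class SoundexSketch {
    static String soundex(String word) {
        // Code for each letter a..z: vowels/h/w/y are '0', consonant
        // groups are '1'..'6' (e.g. b/f/p/v -> 1, r -> 6, d/t -> 3).
        final String codes = "01230120022455012623010202";
        StringBuilder out = new StringBuilder();
        char prev = 0;
        for (char c : word.toUpperCase().toCharArray()) {
            if (c < 'A' || c > 'Z') continue;
            char code = codes.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);          // keep the first letter verbatim
                prev = code;
            } else if (code != '0' && code != prev) {
                out.append(code);       // new consonant group
                prev = code;
            } else if (code == '0') {
                prev = 0;               // vowel: reset the run
            }
            if (out.length() == 4) break;
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert")); // R163
        System.out.println(soundex("Rupert")); // R163 -- same bucket, so they match
    }
}
```

This is why phonetic search is "very fuzzy": quite different words can land in the same bucket.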
My search query has multiple words, ranging from 3 to 8, and a context
attached to it. I am looking for search result documents which should
have all the terms that are in the query, and the terms in the document
should relate to or have a similar context.
For example: my search query
you might use proximity. "low blood pressure"~6 might match #1 and #2
but not #3.
It says: find phrases that require six or fewer position moves in order to
match my terms as a phrase.
Upayavira
On Fri, Jun 28, 2013, at 11:10 AM, venkatesham.gu...@igate.com wrote:
My search query is having
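Upayavira's "position moves" description can be sketched roughly in plain Java. This is an intuition-builder only, not Lucene's actual sloppy-phrase algorithm (which also handles out-of-order and repeated terms):

```java
import java.util.Arrays;
import java.util.List;

public class SlopIntuition {
    // For terms appearing in order, the slop needed is the number of extra
    // positions between the first and last matched term.
    // Returns -1 if any term is missing.
    static int slopNeeded(String doc, String... phrase) {
        List<String> tokens = Arrays.asList(doc.toLowerCase().split("\\s+"));
        int first = -1, last = -1, from = 0;
        for (String term : phrase) {
            int pos = tokens.subList(from, tokens.size()).indexOf(term);
            if (pos < 0) return -1;
            pos += from;
            if (first < 0) first = pos;
            last = pos;
            from = pos + 1;
        }
        return (last - first) - (phrase.length - 1);
    }

    public static void main(String[] args) {
        // Exact phrase: 0 moves needed.
        System.out.println(slopNeeded("low blood pressure", "low", "blood", "pressure"));             // 0
        // Two extra words in between: 2 moves, so ~6 would still match.
        System.out.println(slopNeeded("low and falling blood pressure", "low", "blood", "pressure")); // 2
    }
}
```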
Output is quite a bit simpler than input because all we do is write a
single stream of bytes with no seeking (append only), and it's done
with only one thread, so I don't think there'd be much to gain by
using the newer IO APIs for writing...
Mike McCandless
http://blog.mikemccandless.com
On
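Mike's point can be illustrated with plain java.io: segment writing is a single-threaded, sequential, append-only byte stream, which a buffered OutputStream (what FSIndexOutput wraps) already handles well; mmap/NIO mainly pay off for random-access reads. A minimal sketch:

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class AppendOnlyWrite {
    // Write `count` ints sequentially and return the resulting file size.
    static long writeInts(File f, int count) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)))) {
            for (int i = 0; i < count; i++) {
                out.writeInt(i); // append only, never seeks backwards
            }
        }
        return f.length();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".dat");
        System.out.println(writeInts(f, 1000)); // 4000 (4 bytes per int)
        f.delete();
    }
}
```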
This is a no-op, or rather I'm not sure what it does:
<copyField source="url" dest="url"/>
This is the key:
<copyField source="iframe" dest="text"/>
But be aware that if you copy anything
else into the text field you'll be searching
there too.
Now you can search the text field. Assuming
this is from the
I'm guessing you're well aware that the example you
gave is parsed as search_field:love default_field:obama.
Which isn't pertinent, there's nothing that
looks like it should take any time at all here, to say
nothing of 120 seconds.
So start with debug=query and see what the
filter query is
bq: Is there anyway to perform the field query after the results are
collapsed?
I'm not quite sure what you mean here. The intent of fq
clauses is that they apply to the entire query before
anything else, including field collapsing (and I'm
assuming you mean group.field, not collapse.field)
First, how much slower? 2x? 10x? 1.1x?
When using embedded, you're doing all the
work you were doing on two machines on a
single machine, so my first question would
be how is your CPU performing? Is it maxed?
Best
Erick
On Thu, Jun 27, 2013 at 1:59 PM, Learner bbar...@gmail.com wrote:
First, this is for the Java version, I hope it extends to C#.
But in your configuration, when you're indexing the stemmer
should be storing the reduced form in the index. Then, when
searching, the search should be against the reduced term.
To check this, try
1 Using the Admin/Analysis page to see
One variant on Upayavira's comment would be to use
the proximity as a boost query. That way all three would
match, but the first two would get higher scores.
Either way should work though.
Best
Erick
On Fri, Jun 28, 2013 at 6:29 AM, Upayavira u...@odoko.co.uk wrote:
you might use proximity.
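Erick's boost-query variant above can be sketched with edismax request parameters; a hedged example (the field defaults and the boost factor of 2 are assumptions):

```
q=low blood pressure
defType=edismax
bq="low blood pressure"~6^2
```

All three documents match the base query, but the ones satisfying the proximity clause get their scores boosted.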
Hi Erick,
I actually did mean collapse.field, as per:
http://blog.trifork.com/2009/10/20/result-grouping-field-collapsing-with-solr/
On high level I am trying to avoid the use of a join between a list of
entries and a list of actions that users have performed on an entry (since
it's not supported
Well, now I'm really puzzled. The link you referenced was from when
grouping/field collapsing was under development. I did a quick look
through the entire 4x code base for collapse and there's no place
I saw that looks like it accepts that parameter. Of course I may have
just missed it.
What
Erick,
Thx for your reply. The external file field fields are already under
dataDir specified in solrconfig.xml. They are not getting replicated.
(Solr version 4.2.1.)
On Thu, Jun 27, 2013 at 10:50 AM, Erick Erickson erickerick...@gmail.comwrote:
Haven't tried this, but I _think_ you can use
Hi,
I'm using lucene and solr right now in a production environment with an
index of about a million docs. I'm working on a recommender that basically
would list the n most similar items to the user based on the current item
he is viewing.
I've been thinking of using solr/lucene since I already
Show us your confFiles directive. Maybe there is some subtle error in the
file name.
-- Jack Krupansky
-Original Message-
From: Arun Rangarajan
Sent: Friday, June 28, 2013 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Replicating files containing external file fields
Erick,
I am trying to combine a geospatial query (latlong) with the query below inside
a search component, but I am getting the error below.
*Error:*
<lst name="error">
  <str name="msg">missing sfield for spatial request</str>
  <int name="code">400</int>
</lst>
</response>
<str name="fq_bbox">
(
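The "missing sfield" error means the spatial parser doesn't know which location field to filter on; a minimal sketch of geofilt parameters (the field name and coordinates here are assumptions):

```
fq={!geofilt}
sfield=latlong
pt=45.15,-93.85
d=5
```

sfield can also be set inside the local params, e.g. fq={!geofilt sfield=latlong pt=45.15,-93.85 d=5}.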
Why not just use Mahout to do this? There is an item similarity algorithm in
Mahout that does exactly this :)
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html
You can use mahout in distributed and non-distributed mode
Hey saikat, thanks for your suggestion. I've looked into mahout and other
alternatives for computing k nearest neighbors. I would have to run a job
and compute the k nearest neighbors and track them in the index for
retrieval. I wanted to see if this was something I could do with lucene
using
Hello,
I ran the solr example as described in
http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some doc
files to solr as described in
http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I used
to load the files were of the form
curl
Hi,
Have a look at http://www.youtube.com/watch?v=13yQbaW2V4Y . I'd say
it's easier than Mahout, especially if you already have and know your
way around Solr.
Otis
--
Solr ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Fri, Jun 28, 2013 at
You could build a custom recommender in mahout to accomplish this, also just
out of curiosity, why the content-based approach as opposed to building a
recommender based on co-occurrence? One other thing: what is your data size;
are you looking at scale where you need something like hadoop?
More Like This already is kNN. It extracts features from the document (makes a
query), and runs that query against the collection.
If you want the items most similar to the current item, use MLT.
wunder
On Jun 28, 2013, at 11:02 AM, Luis Carlos Guerrero Covo wrote:
Hey saikat, thanks for
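Wunder's MLT suggestion can be sketched with MoreLikeThis request parameters; a hedged example (the field names and document id are assumptions):

```
q=id:12345
mlt=true
mlt.fl=title,description
mlt.mintf=1
mlt.mindf=2
```

MLT extracts the top terms from the listed fields of the matching document and runs them as a query, which is exactly the "items most similar to the current item" use case.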
Can you just use two queries to achieve the desired results ?
Query1 to get all actions where !entry_read:1 for some range of rows (your
page size)
Query2 to get all the entries with an entry_id in the results of Query1
The second query would be very direct and only query for a set of entries
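A hedged sketch of the two requests (field names and ids are assumptions; note Solr negation uses a leading minus):

```
q=-entry_read:1&fl=entry_id&rows=10

q=entry_id:(101 OR 102 OR 103)
```

The first request collects only entry ids for one page; the second fetches exactly those entries, so neither query needs a join.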
Hi,
It doesn't have to be one or the other. In the past I've built a news
recommender engine based on CF (Mahout) and combined it with Content
Similarity-based engine (wasn't Solr/Lucene, but something custom that
worked with ngrams, but it may have as well been Lucene/Solr/ES). It
worked well.
I only have about a million docs right now so scaling is not a big issue.
I'm looking to provide a quick implementation and then worry about scale
when I get around to implementing a more robust recommender. I'm looking at
a content based approach because we are not tracking users and items viewed
Hi,
I'd go talk to the DBA. How long does this query take if you run it
directly against Oracle? How long if you run it locally vs. from a
remote server (like Solr is in relation to your Oracle server(s)).
What happens if you increase batchSize?
Otis
--
Solr ElasticSearch Support --
Hi Erick,
I have no idea how I managed to get that working. I was messing around a
lot. I may have added org.apache.solr.handler.component.CollapseComponent
to an older version :- Unfortunately, I've formatted the server since to
try some other options.
I did find the official wiki page for
Unfortunately not. That would require an object for every single entry for
every single user.
Generating millions of basically empty objects just for this query is likely
impossible.
:(
Thanks Mark.
We use commit=true as part of the request to add documents. Something like
this:
echo $data | curl --proxy --silent
"http://HOST:9983/solr/collection1/update/csv?commit=true&separator=|&fieldnames=$fieldnames&_shard_=shard1"
--data-binary @- -H 'Content-type:text/plain;
Hi
I have a multicore setup (in 4.3.0). Is it possible for one core to share an
instance of its class with other cores at run time? i.e.
At run time core 1 makes an instance of object O_i
core 1 -- object O_i
core 2
---
core n
then can core K access O_i? I know they can share properties but
So I thought I had it correctly set up, but I'm receiving the following
response to my Data Config
Last Update: 18:17:52
(Duration: 07s)
Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)
Started: 13 minutes ago
Here's my Data config.
<dataConfig>
<dataSource
Hi,
Maybe fileName=*.zip instead of .*zip ?
Steve
On Jun 28, 2013, at 2:20 PM, ericrs22 ericr...@yahoo.com wrote:
So I thought I had it correctly set up, but I'm receiving the following
response to my Data Config
Last Update: 18:17:52
(Duration: 07s)
Requests: 0 (0/s), Fetched: 0
Yeah, that is what I would try until 4.4 comes out - and it should not matter
replica or leader.
- Mark
On Jun 28, 2013, at 3:13 PM, Joshi, Shital shital.jo...@gs.com wrote:
Thanks Mark.
We use commit=true as part of the request to add documents. Something like
this:
echo $data| curl
Thanks!
-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Friday, June 28, 2013 5:06 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey
Yeah, that is what I would try until 4.4 comes out - and it should not matter
replica or leader.
- Mark
On Jun 28,
Unfortunately not. I had tried that before, with the logs saying:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
java.util.regex.PatternSyntaxException: Dangling meta character '*' near
index 0
With .*zip I get this:
WARN
SimplePropertiesWriter
Unable to read:
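The PatternSyntaxException above arises because DIH's fileName attribute is a Java regex, not a shell glob, so a bare leading * is a dangling quantifier with nothing to repeat; a quick demonstration:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class FileNameRegex {
    public static void main(String[] args) {
        // The regex form: ".*" matches anything, "\\." is a literal dot.
        System.out.println("data.zip".matches(".*\\.zip")); // true

        try {
            Pattern.compile("*.zip"); // glob syntax: '*' has nothing to repeat
        } catch (PatternSyntaxException e) {
            System.out.println("invalid regex: " + e.getDescription());
        }
    }
}
```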
Hi,
I am trying to figure out how to change the schema/config of an existing core,
or a core to be created, via HTTP calls to Solr. After spending hours
searching online, I still could not find any documents showing how to do it.
The only way I know is that you have to log on to the solr
Hello!
In 4.3.1 you can only read schema.xml or portions of it using Schema
API (https://issues.apache.org/jira/browse/SOLR-4658). It is a start
to allow schema.xml modifications using HTTP API, which will be a
functionality of next release of Solr -
Hi all,
I think I have found an issue (or misleading behavior, per se) with
atomic updates.
If I do an atomic update on a field, and the operation is nonsense
(anything other than add, set, inc), it still returns success. Say I send:
/update/json?commit=true -d '[{id:...,
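For comparison, a well-formed atomic update uses one of the documented operations (set, add, inc); the field name and values here are hypothetical:

```json
[{"id":"doc1","price":{"set":99}}]
```

A misspelled operation, e.g. {"price":{"sett":99}}, is the kind of request that comes back with success even though nothing meaningful happens.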
Well, it is known to me and documented in my book. BTW, that field value is
simply ignored.
There are tons of places in Solr where undefined values or outright garbage
are simply ignored, silently.
Go ahead and file a Jira though.
-- Jack Krupansky
-Original Message-
From: Sam
Hi,
It only allows adding new fields to the existing schema.
My problem is that I am trying to provide my own schema file when I create a
new core and I do not have ssh access to the solr host. Is this not even
possible?
Regards,
james
-Original Message-
From: Rafał Kuć
How could you not have ssh access to the Solr host machine? I mean, how are
you managing that server, without ssh access?
And if you are not managing the server, what business do you have trying to
change the Solr configuration?!?!?
Something fishy here!
-- Jack Krupansky
-Original
Hi,
Well, we try to use Solr to run a multi-tenant index/search service. We
assign each client a different core with their own config and schema. It
would be good for us if we could just let customers create cores
with their own schema and config. The customer would
Hey guys,
This has to be a stupid question/I must be doing something wrong, but after
frequent load testing with documentCache enabled under Solr 4.3.1 with
autoWarmCount=150, I'm noticing that my documentCache metrics are always
zero for non-cumulative.
At first I thought my commit rate is fast
Ah, yes, good old multi-tenant - I should have known.
Yeah, the Solr API is evolving, albeit too slowly for the needs of some.
-- Jack Krupansky
-Original Message-
From: Wu, James C.
Sent: Friday, June 28, 2013 7:06 PM
To: solr-user@lucene.apache.org
Subject: RE: change solr core
To answer some of my own question, Shawn H's great reply on this thread
explains why I see no autoWarming on doc cache:
http://www.marshut.com/iznwr/soft-commit-and-document-cache.html
It is still unclear to me why I see no other metrics, however.
Thanks Shawn,
Tim
On 28 June 2013 16:14, Tim
Hi Tim,
Not sure about the zeros in 4.3.1, but in SPM we see all these numbers
are non-0, though I haven't had the chance to confirm with Solr 4.3.1.
Note that you can't really autowarm document cache...
Otis
--
Solr ElasticSearch Support -- http://sematext.com/
Performance Monitoring --
Thanks Otis,
Yeah I realized after sending my e-mail that doc cache does not warm,
however I'm still lost on why there are no other metrics.
Thanks!
Tim
On 28 June 2013 16:22, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Hi Tim,
Not sure about the zeros in 4.3.1, but in SPM we see
Jack,
Here is the ReplicationHandler definition from solrconfig.xml:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str
I am running Solr 4.3.0, using DIH to import data from MySQL. I am running
into a very strange problem where data from a datetime column is being
imported with the right date but the time is 00:00:00. I tried using SQL
DATE_FORMAT() and also DIH DateFormatTransformer but nothing works. The
raw
Yes, you need to list that EFF file in the confFiles list - only those
listed files will be replicated.
<str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml,
/var/solr-data/List/external_*</str>
Oops... sorry, no wildcards... you must list the
Thanks, confirmed by trying w/ 4.3.1 that the join works with the outer
collection distributed/sharded so long as the inner collection is not
distributed/sharded.
Chris
On Tue, Jun 25, 2013 at 4:55 PM, Upayavira u...@odoko.co.uk wrote:
I have never heard mention that joins support distributed
I've been working on improving index time with a JdbcDataSource DIH based
config and found it not to be as performant as I'd hoped for, for various
reasons, not specifically due to solr. With that said, I decided to switch
gears a bit and test out FileDataSource setup... I assumed by
Hello,
I have a use case where I need to retrieve the top 2000 documents matching a
query.
What are the parameters (in the query, solrconfig, schema) I should look at to
improve this?
I have 45M documents in a 3-node SolrCloud 4.3.1 cluster with 3 shards, 30GB RAM,
8vCPU and 7GB JVM heap size.
I have
On Tue, Jun 25, 2013 at 7:55 PM, Upayavira u...@odoko.co.uk wrote:
However, if from your example, innerCollection was replicated across all
nodes, I would think that should work, because all that comes back from
each server when a distributed search happens is the best 'n' matches,
so exactly
Also, I don't see a consistent response time from solr, I ran ab again and
I get this:
ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500
"http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json"
Benchmarking x.amazonaws.com (be patient)
Completed 100
There is very little shared between multiple cores (instanceDir paths,
logging config maybe?). Why are you trying to do this?
On Sat, Jun 29, 2013 at 1:14 AM, Peyman Faratin pey...@robustlinks.com wrote:
Hi
I have a multicore setup (in 4.3.0). Is it possible for one core to share an
instance
What is dataSource=binaryFile? I don't see any such data source
defined in your configuration.
On Fri, Jun 28, 2013 at 11:50 PM, ericrs22 ericr...@yahoo.com wrote:
So I thought I had it correctly set up, but I'm receiving the following
response to my Data Config
Last Update: 18:17:52
The default in JdbcDataSource is to use ResultSet.getObject which
returns the underlying database's type. The type specific methods in
ResultSet are not invoked unless you are using convertType=true.
Is MySQL actually returning java.sql.Timestamp objects?
On Sat, Jun 29, 2013 at 5:22 AM, Bill Au
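A hedged sketch of enabling convertType on the DIH data source, per the reply above (the driver class and URL here are placeholders):

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            convertType="true"/>
```

With convertType="true", DIH uses the type-specific ResultSet getters instead of getObject, which is one way a datetime column can start coming through with its time portion intact.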