Different options for autocomplete/autosuggestion

2011-03-14 Thread Kai Schlamp
Hi. There seem to be several options for implementing an autocomplete/autosuggestion feature with Solr. I am trying to summarize those possibilities together with their advantages and disadvantages. It would be really nice to read some of your opinions. * Using N-Gram filter + text field query
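
To make the N-gram option concrete, here is a minimal sketch in plain Python (not Solr's own filter). It assumes edge n-grams, i.e. prefixes generated at index time, which is one common reading of "N-Gram filter" but is not stated in the post. The names min_gram and max_gram and the sample terms are hypothetical.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=1, max_gram=10):
    """Return the lowercased prefixes of `term` with lengths min_gram..max_gram."""
    term = term.lower()
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_index(terms, min_gram=1, max_gram=10):
    """Map every prefix gram to the set of full terms that produced it."""
    index = defaultdict(set)
    for term in terms:
        for gram in edge_ngrams(term, min_gram, max_gram):
            index[gram].add(term)
    return index


def suggest(index, typed):
    """Look up what the user has typed so far as one plain (un-grammed) term."""
    return sorted(index.get(typed.lower(), ()))


index = build_index(["Solr", "Solandra", "Lucene"])
print(suggest(index, "sol"))
```

The point of the design is that the grams are produced only on the index side; the query side stays a plain term lookup, so a partial word matches without wildcard queries.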

Re: Custom request handler/plugin

2011-03-14 Thread javaxmlsoapdev
Below are the reasons why I thought it wouldn't be feasible to have pre-filtered results with filter queries. Please comment. Since I can't write down the actual business requirements due to a confidentiality contract with the client, I'll mock out a scenario using an example. - There is a parent entity called Quiz,

Solr sorting

2011-03-14 Thread Denis Kuzmenok
Hi. Is there any way to make a scheme like this work: I have many documents, each with a random field to enable random sorting, and I have a weight field. I want to get random results, but documents with a bigger weight should appear more frequently. Is that possible? Thanks in advance.
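
One way to think about the requirement is weighted random sampling without replacement. The sketch below does it client-side in plain Python; it is not a Solr feature and not something proposed in this thread. It assumes every document carries a positive weight, and the field names id and weight are hypothetical.

```python
import random


def weighted_random_order(docs, rng=random):
    """Return doc ids in random order, with heavier docs tending to come first.

    Each doc gets key = u ** (1 / weight) with u uniform in [0, 1).
    Sorting by key, largest first, yields a weighted random permutation.
    Weights must be greater than zero.
    """
    keyed = [(rng.random() ** (1.0 / doc["weight"]), doc["id"]) for doc in docs]
    keyed.sort(reverse=True)
    return [doc_id for _, doc_id in keyed]


docs = [
    {"id": "a", "weight": 9.0},
    {"id": "b", "weight": 1.0},
    {"id": "c", "weight": 3.0},
]
print(weighted_random_order(docs))
```

With two documents of weight 9 and 1, the heavier one should come first in roughly nine out of ten draws, while every document still has a chance to appear at any position.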

Re: Version Incompatibility(Invalid version (expected 2, but 1) or the data in not in 'javabin' format)

2011-03-14 Thread Ahmet Arslan
I am using the Solr 4.0 API to search an index (built with Solr 1.4). I am getting the error: Invalid version (expected 2, but 1) or the data in not in 'javabin' format. Can anyone help me fix this problem? You need to use SolrJ version 1.4, which is compatible with your index

Re: Solr sorting

2011-03-14 Thread Ahmet Arslan
--- On Mon, 3/14/11, Denis Kuzmenok forward...@ukr.net wrote: From: Denis Kuzmenok forward...@ukr.net Subject: Solr sorting To: solr-user@lucene.apache.org Date: Monday, March 14, 2011, 10:23 AM Hi. Is there any way to make such scheme working: I have many documents, each has a

Re: Solr sorting

2011-03-14 Thread Denis Kuzmenok
--- On Mon, 3/14/11, Denis Kuzmenok forward...@ukr.net wrote: From: Denis Kuzmenok forward...@ukr.net Subject: Solr sorting To: solr-user@lucene.apache.org Date: Monday, March 14, 2011, 10:23 AM Hi. Is there any way to make such scheme working: I have many documents, each has a

ExternalFileField with whitespaces

2011-03-14 Thread Miriam Doelle
Hi, we use an external file field configured as dynamic field. The dynamic field name (and so the name of the provided file) may contain spaces. But currently it is not possible to query for such fields. The following query results in a ParseException: q=val:(experience_foo\ bar)

Re: Solr sorting

2011-03-14 Thread Ahmet Arslan
--- On Mon, 3/14/11, Denis Kuzmenok forward...@ukr.net wrote: From: Denis Kuzmenok forward...@ukr.net Subject: Re: Solr sorting To: Ahmet Arslan solr-user@lucene.apache.org Date: Monday, March 14, 2011, 12:24 PM --- On Mon, 3/14/11, Denis Kuzmenok forward...@ukr.net wrote: From:

RE: Results driving me nuts!

2011-03-14 Thread cbennett
-Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Sunday, March 13, 2011 6:25 PM To: solr-user@lucene.apache.org; andy.ne...@gmail.com Subject: Re: Results driving me nuts! --- On Sun, 3/13/11, Andy Newby andy.ne...@gmail.com wrote: From: Andy Newby

Question about Term Vectors

2011-03-14 Thread Ahsan |qbal
Hi all, is there any way to drop term vectors from an already built index file? Regards, Ahsan Iqbal

Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread lame
Hi guys, I have master/slave replication enabled. The slave replicates every 3 minutes, and I encounter problems while running a full-import command on the master (which takes about 7 minutes). The slave replicates a partial index, about 200k documents out of 700k. After the next replication the full index is

Re: Question about Term Vectors

2011-03-14 Thread Markus Jelsma
You need to reindex. On Monday 14 March 2011 14:04:00 Ahsan |qbal wrote: Hi All Is there any way to drop term vectors from already built index file. Regards Ahsan Iqbal -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread Markus Jelsma
Do you commit too often? Slaves won't replicate while the master is indexing if you don't send commits. Can you commit only once the indexing finishes? On Monday 14 March 2011 14:04:51 lame wrote: Hi guys, I have master slave replication enabled. Slave is replicating every 3 minutes and I

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread lame
I don't commit at all; we use the DataImportHandler, but I have a feeling the commits could be done by DIH (is autocommit possible?). 2011/3/14 Markus Jelsma markus.jel...@openindex.io: Do you commit too often? Slaves won't replicate while the master is indexing if you don't send commits. Can you only

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread Markus Jelsma
In solrconfig there might be an autocommit section enabled. On Monday 14 March 2011 14:18:42 lame wrote: I don't commit at all; we use the DataImportHandler, but I have a feeling the commits could be done by DIH (is autocommit possible?). 2011/3/14 Markus Jelsma markus.jel...@openindex.io: Do you

Re: Using Solr over Lucene effects performance?

2011-03-14 Thread sivaram
Thanks a lot Glen and Yonik... That's a very convincing explanation... -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr-over-Lucene-effects-performance-tp2666909p2676015.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread lame
It looks like this (we don't have an autocommit section in solr.DirectUpdateHandler2; is ramBufferSizeMB responsible for that?): <indexDefaults> <useCompoundFile>false</useCompoundFile> <mergeFactor>10</mergeFactor> <ramBufferSizeMB>320</ramBufferSizeMB> <maxMergeDocs>2147483647</maxMergeDocs>

Re: Query on facet field’s count

2011-03-14 Thread Jonathan Rochkind
It's not easy if you have lots of facet values (in my case, can even be up to a million), but there is no way built-in to Solr to get this. I have been told that some of the faceting strategies (there are actually several in use in Solr based on your parameters and the nature of your data)

Re: Results driving me nuts!

2011-03-14 Thread Jonathan Rochkind
On 3/13/2011 6:24 PM, Ahmet Arslan wrote: http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm I can see that the one with 5 matches is longer than the other. Shorter documents are favored in solr/lucene with length normalization factor. Is

MySQL queries high when using delta-import

2011-03-14 Thread Robert Gründler
Hi, we have 3 solr cores, each of them is running a delta-import every 2 minutes on a MySQL database. We've noticed a significant increase of MySQL queries per second, since we've started the delta updates. Before that, the database server received between 50 and 100 queries per second,

Re: Results driving me nuts!

2011-03-14 Thread Markus Jelsma
You can use omitNorms=true for any given field. Length normalization will be disabled and index-time boosting will not be available any more. TermFrequencies can also be disabled by setting omitTermFreqAndPositions=true for any given field. Omitting TF can be very useful if you need an easy
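
To see why the shorter document can win, and what the two switches change, here is a small numeric sketch in plain Python. It assumes the default similarity's usual definitions, tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), and it leaves out idf, boosts, coord, queryNorm and the lossy one-byte encoding of norms. The function and parameter names are hypothetical.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the number of occurrences."""
    return math.sqrt(freq)


def length_norm(num_terms):
    """Length normalization: fields with more terms get a smaller factor."""
    return 1.0 / math.sqrt(num_terms)


def field_factor(freq, num_terms, omit_norms=False, omit_tf=False):
    """Product of the tf and lengthNorm factors only.

    omit_norms mimics omitNorms=true (the norm factor becomes 1.0).
    omit_tf mimics omitTermFreqAndPositions=true (any match counts once).
    """
    tf_part = tf(min(freq, 1)) if omit_tf else tf(freq)
    norm_part = 1.0 if omit_norms else length_norm(num_terms)
    return tf_part * norm_part


short_doc = {"freq": 1, "num_terms": 4}
long_doc = {"freq": 5, "num_terms": 400}

for label, flags in [
    ("defaults", {}),
    ("omit norms", {"omit_norms": True}),
    ("omit norms and tf", {"omit_norms": True, "omit_tf": True}),
]:
    print(label, field_factor(**short_doc, **flags), field_factor(**long_doc, **flags))
```

Under these assumptions the short document scores higher by default, the long document with five matches scores higher once norms are omitted, and the two tie once term frequency is omitted as well.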

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread Markus Jelsma
These settings don't affect a commit. maxPendingDeletes might, but I'm unsure. If you commit on the master and the slaves are configured to replicate on commit, they should all have the same index version. On Monday 14 March 2011 14:42:51 lame wrote: It looks like (we don't have autocommit

Re: MySQL queries high when using delta-import

2011-03-14 Thread Stefan Matheis
Robert, that may depend heavily on your (sub-)entities and how you built your queries. Perhaps http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor would help you; as I said, it depends on your config. Regards, Stefan 2011/3/14 Robert Gründler rob...@dubture.com: Hi, we

Re: MySQL queries high when using delta-import

2011-03-14 Thread Bill Bell
You could use the clean=false parameter trick and then just use query. This would reduce the queries by half for deltas. Bill Bell Sent from mobile On Mar 14, 2011, at 8:57 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Robert, that may extremely depend on your (sub-)entities, and how

Re: Sorting

2011-03-14 Thread Brian Lamb
It doesn't necessarily need to go through an XSLT but the idea remains the same. I want to have the highest scores first no matter which result they match with. So if the results are like this: <lst name="moreLikeThis"> <result name="3" numFound="2" start="0" maxScore="0.439"> <doc> <float

Dynamically boost search scores

2011-03-14 Thread Brian Lamb
Hi all, I have a field in my schema called boost_score. I would like to set it up so that if I pass in a certain flag, each document score is boosted by the number in boost_score. For example if I use: http://localhost/solr/search/?q=dog I would get search results like normal. But if I use:

Re: Dynamically boost search scores

2011-03-14 Thread Markus Jelsma
See boosting documents by function query. This way you can use the document's boost_score field to affect the final score. http://wiki.apache.org/solr/FunctionQuery On Monday 14 March 2011 16:40:42 Brian Lamb wrote: Hi all, I have a field in my schema called boost_score. I would like to set it
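
As a rough sketch of how a client could switch the boost on and off, the Python below wraps the user query in the boost query parser's local-params syntax, {!boost b=...}, which multiplies the score by a function (here simply the field value). Whether that parser suits your Solr version and handler setup is an assumption to verify; the use_boost flag and the helper name are hypothetical, and the base URL is the one from the question.

```python
from urllib.parse import urlencode


def search_url(base, user_query, use_boost=False, boost_field="boost_score"):
    """Build a select URL; optionally multiply scores by a numeric field."""
    q = user_query
    if use_boost:
        # Local-params syntax: wrap the query in the boost parser.
        q = "{!boost b=%s}%s" % (boost_field, user_query)
    return base + "?" + urlencode({"q": q})


base = "http://localhost/solr/search/"
print(search_url(base, "dog"))
print(search_url(base, "dog", use_boost=True))
```

Keeping the switch on the client side means the schema and request handler stay unchanged; the same handler serves both the plain and the boosted query.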

WDF, automatic phrase queries and omitTermFreqAndPositions

2011-03-14 Thread Markus Jelsma
Hi, In Solr 1.4.1 we don't have a feature to disable automatic generation of phrase queries. The phrase queries are generated because of the word delimiter filter I use. The problem is, I cannot use the QS parameter in DisMax to allow slop for these generated phrase queries because I require a

Re: Solr admin page timed out and index updating issues

2011-03-14 Thread Ranma
I am still stuck at the same point. Looking here and there I could read that the memory limit (heap space) may need to be increased to -Xms512M -Xmx512M when launching the java -jar start.jar command. But in my vps I've been forced to set the Xmx limit to maximum Xmx400M since at higher value

Re: Results driving me nuts!

2011-03-14 Thread Jonathan Rochkind
Aha. Yeah, I've read the documentation several times, but still find myself confused. But do I understand this right now: If I do omitNorms=true, but still leave term freq and positions in default case (ie, NOT omitTermFreqAndPositions=true) ... then a document with more occurrences of a

Re: WDF, automatic phrase queries and omitTermFreqAndPositions

2011-03-14 Thread Markus Jelsma
Ahum, one option would of course not work: copyFielding them to a field with positions, but the phrase query is executed on the fields specified in qf (not pf). And since I need tf=1 in qf, it wouldn't work. I guess extending DefaultSimilarity is the best option; this way I still have position

Re: Results driving me nuts!

2011-03-14 Thread Markus Jelsma
On Monday 14 March 2011 17:27:05 Jonathan Rochkind wrote: Aha. Yeah, I've read the documentation several times, but still find myself confused. But do I understand this right now: If I do omitNorms=true, but still leave term freq and positions in default case (ie, NOT

RE: Using Solr over Lucene effects performance?

2011-03-14 Thread Burton-West, Tom
+1 on some kind of simple performance framework that would allow comparing Solr vs Lucene. Any chance the Lucene benchmark programs in contrib could be adapted to read Solr config information? BTW: You probably want to empty the OS cache in addition to restarting Solr between each run if the

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread lame
We also have commits from the application (besides the full import); maybe that is the cause. If you don't have any other ideas I'll probably try reindexing a second core, then swap cores and run a delta import (to import documents added in the meantime). 2011/3/14 Markus Jelsma markus.jel...@openindex.io:

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread Markus Jelsma
Yes, commits from the application will interfere indeed. If your business scenario allows for using always optimized indices you might choose to only replicate on optimize. On Monday 14 March 2011 18:45:15 lame wrote: We have also commits from application (besides full import) - maybe that is

Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello everyone, First of all here is our Solr setup: - Solr nightly build 986158 - Running Solr inside the default Jetty that comes with the Solr build - 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb of RAM) - Index replicated (on optimize) to slaves via Solr Replication - Size of

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Hi Doğacan, Are you, at some point, running out of heap space? In my experience, that's the common cause of increased load and excessively high response times (or timeouts). Cheers, Hello everyone, First of all here is our Solr setup: - Solr nightly build 986158 - Running solr inside

Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello, 2011/3/14 Markus Jelsma markus.jel...@openindex.io Hi Doğacan, Are you, at some point, running out of heap space? In my experience, that's the common cause of increased load and excessively high response times (or timeouts). How much of a heap size would be enough? Our index size

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Hello, 2011/3/14 Markus Jelsma markus.jel...@openindex.io Hi Doğacan, Are you, at some point, running out of heap space? In my experience, that's the common cause of increased load and excessively high response times (or timeouts). How much of a heap size would be enough? Our

Re: problem using dataimporthandler

2011-03-14 Thread sivaram
I know this thread is old, but I encountered the exact same problem and couldn't figure out what's wrong. I'm using DIH for SQL Server. Please let me know. Also, the link that you provided doesn't seem to exist anymore. Thanks, Ram. -- View this message in context:

Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
I've definitely had cases in 1.4.1 where even though I didn't have an OOM error, Solr was being weirdly slow, and increasing the JVM heap size fixed it. I can't explain why it happened, or exactly how you'd know this was going on, I didn't see anything odd in the logs to indicate, I just

Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello again, 2011/3/14 Markus Jelsma markus.jel...@openindex.io Hello, 2011/3/14 Markus Jelsma markus.jel...@openindex.io Hi Doğacan, Are you, at some point, running out of heap space? In my experience, that's the common cause of increased load and excessively high response

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Nope, no OOM errors. That's a good start! Insanity count is 0 and fieldCache has 12 entries. We do use some boosting functions. Btw, I am monitoring output via jconsole with 8gb of ram and it still goes to 8gb every 20 seconds or so, gc runs, falls down to 1gb. Hmm, maybe the garbage

Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
It's actually, as I understand it, expected JVM behavior to see the heap rise to close to its limit before it gets GC'd; that's how Java GC works. Whether that should happen every 20 seconds or what, I don't know. Another option is setting better JVM garbage collection arguments, so GC

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
You might also want to add the following switches for your GC log. JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime" Also, what JVM version are you using and

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
That depends on your GC settings and generation sizes. And, instead of UseParallelGC you'd better use UseParNewGC in combination with CMS. See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html It's actually, as I understand it, expected JVM behavior to see the heap rise to close to its

RE: Different options for autocomplete/autosuggestion

2011-03-14 Thread Robert Petersen
I like field collapsing because that way my suggestions give phrase results (i.e., the suggestion starts with what the user has typed so far) and thus I limit suggestions to be in the order of the words typed. I think that looks better for our retail-oriented site. I populate the index with

Re: Different options for autocomplete/autosuggestion

2011-03-14 Thread Kai Schlamp
Robert, thanks for your answer. What Solr version do you use? 4.0? As mentioned in my other post here I tried to patch 1.4 for using field collapsing, but couldn't get it to work (it compiled fine, but the collapse parameters seem to be completely ignored). 2011/3/14 Robert Petersen rober...@buy.com:

accessing the analyzers in a component?

2011-03-14 Thread Paul Libbrecht
Hello fellow SOLRers, Within my custom query-component, I wish to obtain an instance of the analyzer for a given named field. Is there a schema object I can access? Thanks in advance, paul

RE: Different options for autocomplete/autosuggestion

2011-03-14 Thread Robert Petersen
I am doing this very differently. We are on solr 1.4.0 and I accomplish the collapsing in my wrapper layer. I have written a layer of code around SOLR, an indexer on one end and a search service wrapping solrs on the other end. I manually collapse the field in my code. I keep both a

RE: Different options for autocomplete/autosuggestion

2011-03-14 Thread Robert Petersen
Note that due to the 'raw' nature of my source data I also have to heavily filter my data before collapsing it. I don't want to suggest garbage phrases just because a lot of people searched on them. We store auxiliary data in the index for filtering on to perform the grouping.
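
For readers wondering what collapsing in a wrapper layer might look like, here is a minimal Python sketch. It is not Robert's implementation, which the thread does not show; it simply keeps the first hit per normalized field value while preserving result order. The field names phrase and count and the normalization rule are hypothetical.

```python
def collapse(docs, field, normalize=lambda value: value.strip().lower()):
    """Keep the first doc for each normalized value of `field`, in input order."""
    seen = set()
    collapsed = []
    for doc in docs:
        key = normalize(doc[field])
        if key not in seen:
            seen.add(key)
            collapsed.append(doc)
    return collapsed


# Results as they might come back from a search, best match first.
results = [
    {"phrase": "iPod nano", "count": 90},
    {"phrase": "ipod nano ", "count": 40},
    {"phrase": "ipod touch", "count": 70},
]
for doc in collapse(results, "phrase"):
    print(doc["phrase"].strip(), doc["count"])
```

Because the input is assumed to be sorted by relevance already, keeping the first document per group also keeps the best-ranked one.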

Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello, 2011/3/14 Markus Jelsma markus.jel...@openindex.io That depends on your GC settings and generation sizes. And, instead of UseParallelGC you'd better use UseParNewGC in combination with CMS. JConsole now shows a different profile output but load is still high and performance is still

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ? I'm not sure; I haven't seen a similar issue in a sharded environment, probably because it was a controlled environment. Hello,

Re: Different options for autocomplete/autosuggestion

2011-03-14 Thread Bill Bell
See how Lucid Enterprise does it... A bit differently. On 3/14/11 12:14 AM, Kai Schlamp kai.schl...@googlemail.com wrote: Hi. There seem to be several options for implementing an autocomplete/autosuggestion feature with Solr. I am trying to summarize those possibilities together with their

Re: Solr 1.4 replication - partial index on slave while indexing master

2011-03-14 Thread Bill Bell
Turn off all autocommitting. On 3/14/11 7:04 AM, lame l...@o2.pl wrote: Hi guys, I have master slave replication enabled. Slave is replicating every 3 minutes and I encounter problems while I'm performing full import command on master (which takes about 7 minutes). Slave replicates partial

Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
2011/3/14 Markus Jelsma markus.jel...@openindex.io Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ? We increased the thread limit (which was 1 before) but it did not help. Anyway,

Is WordDelimiterFilterFactory applicable to non-english language?

2011-03-14 Thread cyang2010
Does it make sense to apply WordDelimiterFilterFactory to non-English languages, such as Spanish? What about Asian languages? The following are the typical use cases for WordDelimiterFilterFactory. Are 1, 2, 3, and 4 applicable to all Western languages (including Spanish)? For Asian languages, is

Re: accessing the analyzers in a component?

2011-03-14 Thread Ahmet Arslan
Within my custom query-component, I wish to obtain an instance of the analyzer for a given named field. Is there a schema object I can access? public void process(ResponseBuilder rb) throws IOException { Map<String,FieldType> map = rb.req.getSchema().getFieldTypes();

Re: Is WordDelimiterFilterFactory applicable to non-english language?

2011-03-14 Thread Ahmet Arslan
Does it make sense to apply WordDelimiterFilterFactory to non-english language, such as spanish? Yes, it makes sense. WDF is especially good for product names, like i-phone, iphone4 etc.
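
To illustrate the kind of splitting meant here, the following Python sketch approximates the basic rules: split on non-alphanumeric delimiters, on letter/digit transitions, and on lower-to-upper case changes. It is only an approximation written for this example, not the filter's actual algorithm, and it ignores the catenate and preserve options. It relies on Python's Unicode character classes, so accented Western letters behave like ASCII ones, while scripts without letter case are never split on case.

```python
def char_class(ch):
    """Classify a character as digit, upper, lower, or delimiter."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "upper" if ch.isupper() else "lower"
    return "delim"


def split_word(token):
    """Split a token into sub-words on delimiters, digits, and case changes."""
    parts, current, prev = [], "", None
    for ch in token:
        cls = char_class(ch)
        if cls == "delim":
            if current:
                parts.append(current)
            current, prev = "", None
            continue
        boundary = prev is not None and (
            (prev == "digit") != (cls == "digit")      # letter <-> digit
            or (prev == "lower" and cls == "upper")    # lower -> Upper
        )
        if boundary:
            parts.append(current)
            current = ""
        current += ch
        prev = cls
    if current:
        parts.append(current)
    return parts


for token in ["i-phone", "iphone4", "PowerShot", "niño-López"]:
    print(token, split_word(token))
```

The Spanish example splits on the hyphen exactly as the English ones do, which is the sense in which the rules carry over to other Western languages.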

Re: Different options for autocomplete/autosuggestion

2011-03-14 Thread Andy
Can you provide more details? Or a link? --- On Mon, 3/14/11, Bill Bell billnb...@gmail.com wrote: See how Lucid Enterprise does it... A bit differently. On 3/14/11 12:14 AM, Kai Schlamp kai.schl...@googlemail.com wrote: Hi. There seem to be several options for implementing an

Re: Different options for autocomplete/autosuggestion

2011-03-14 Thread Bill Bell
http://lucidworks.lucidimagination.com/display/LWEUG/Spell+Checking+and+Automatic+Completion+of+User+Queries For Auto-Complete, find the following section in the solrconfig.xml file for the collection: <!-- Auto-Complete component --> <searchComponent name="autocomplete"

keeping data consistent between Database and Solr

2011-03-14 Thread onlinespend...@gmail.com
Like many people, I don't use Solr as my primary data store. Not all of my data need be searchable, and for simple and fast retrieval I store it in a database (Cassandra in my case). Actually I don't have this all built up yet, but my intention is that whenever new data is entered that it be added to my

Re: Different options for autocomplete/autosuggestion

2011-03-14 Thread Kai Schlamp
@Robert: That sounds interesting and very flexible, but also like a lot of work. This approach also doesn't seem to allow querying Solr directly by using Ajax ... one of the big benefits in my opinion when using Solr. @Bill: There are some things I don't like about the Suggester component. It

Re: keeping data consistent between Database and Solr

2011-03-14 Thread Bill Bell
Look at Solandra. Solr + Cassandra. On 3/14/11 9:38 PM, onlinespend...@gmail.com onlinespend...@gmail.com wrote: Like many people, Solr is not my primary data store. Not all of my data need be searchable and for simple and fast retrieval I store it in a database (Cassandra in my case). Actually