Re: Dismax and Grouping query
Thanks, Hoss. It seems that I should think about other options. Thanks again. On 9/29/07, Chris Hostetter [EMAIL PROTECTED] wrote: : I've tried to use a grouping query on DisMaxRequestHandler without success. : e.g. : When I send a query like +(lucene solr), : I can see the following line in the result page. : <str name="querystring">+\(lucene solr\)</str> the dismax handler does not consider parens to be special characters. if it did, it's not clear what the semantics would be of a query like... q=A +(B C)&qf=X Y Z ...when building the query structure ... what happens if X:B exists and Y:C exists? is that considered a match? Generally, the mm param is used to indicate how many of the query terms (that don't have a + or - prefix) are required, or you can explicitly require/prohibit a term using + or -, but there is no way to require that one of N sub-terms matches (prohibiting any of N sub-terms is easy, just prohibit them all individually) -Hoss
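[A sketch of what Hoss describes, for reference. The handler URL, fields, and terms are placeholders, not from the original thread; mm and qf are standard dismax params.]

```
# require at least 2 of the 3 optional terms (instead of grouping):
/solr/select?qt=dismax&q=lucene solr search&mm=2&qf=title body

# "none of B or C": prohibit each sub-term individually instead of -(B C):
/solr/select?qt=dismax&q=A -B -C&qf=title body
```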
I18N with SOLR
Hello, Is there anyone who has worked on internationalization with SOLR? Apart from using a dynamic field like <dynamicField name="*_eng"> (say, for English), are there any other configurations to be made? Regards Dilip
Re: Solr replication
1) On solr.master:
+ Edit scripts.conf:
    solr_hostname=localhost
    solr_port=8983
    rsyncd_port=18983
+ Enable and start rsync: rsyncd-enable; rsyncd-start
+ Run snapshooter: snapshooter
After running this, you should be able to see a new folder named snapshot.* in the data/index folder. You can configure solrconfig.xml to trigger snapshooter after a commit or optimise.
2) On slave:
+ Edit scripts.conf:
    solr_hostname=solr.master
    solr_port=8986
    rsyncd_port=18986
    data_dir=
    webapp_name=solr
    master_host=localhost
    master_data_dir=$MASTER_SOLR_HOME/data/
    master_status_dir=$MASTER_SOLR_HOME/logs/clients/
+ Run snappuller: snappuller -P 18983
+ Run snapinstaller: snapinstaller
You should set up crontab to run snappuller and snapinstaller periodically. On 10/1/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi! I'm really new to Solr! Could anybody please explain to me, with a short example, how I can set up a simple Solr replication with 3 machines (a master node and 2 slaves)? This is my conf:
* master (linux 2.6.20): hostname solr.master with IP 192.168.1.1
* 2 slaves (linux 2.6.20): hostname solr.slave1 with IP 192.168.1.2, hostname solr.slave2 with IP 192.168.1.3
N.B: sorry if the question was already asked before, but I couldn't find anything better than CollectionDistribution on the Wiki. Regards Y. -- Regards, Cuong Hoang
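[To make the last step concrete, the periodic pull on each slave could look like this in crontab. The 5-minute interval and the $SOLR_HOME path are assumptions; adjust to your install.]

```
# illustrative crontab entries on each slave; SOLR_HOME must be set
# in the crontab or the paths written out in full
*/5 * * * * $SOLR_HOME/bin/snappuller -P 18983 && $SOLR_HOME/bin/snapinstaller
```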
Re: Re: Solr replication
Works like a charm. Thanks very much. cheers Y. ---- Original message ---- Date: Mon, 1 Oct 2007 21:55:30 +1000 From: climbingrose To: solr-user@lucene.apache.org Subject: Re: Solr replication [...]
Searching combined English-Japanese index
Hi, I know there has been quite some discussion about multilanguage searching already, but I am not quite sure it applies to my case. I have an index with fields that contain Japanese and English at the same time. Is this possible? Tokenizing is not the big problem here; the StandardTokenizerFactory is good enough, judging by the Solr admin Field Analysis page. My problem is that searches for Japanese text don't give any results. I get results for the English parts, but not for the Japanese. Using Limo I can see that it is correctly indexed as UTF-8, but using the Solr admin query page I don't get any results. As I understood it, Solr should just match the characters and return something. When I search using an English term, I get results, but the Japanese is not encoded correctly in the response (although it is UTF-8 encoded). I am using Solr 1.2. Any ideas what I might be doing wrong? Best regards, Max -- Maximilian Hütter blue elephant systems GmbH Wollgrasweg 49 D-70599 Stuttgart Tel: (+49) 0711 - 45 10 17 578 Fax: (+49) 0711 - 45 10 17 573 e-mail: [EMAIL PROTECTED] Sitz: Stuttgart, Amtsgericht Stuttgart, HRB 24106 Geschäftsführer: Joachim Hörnle, Thomas Gentsch, Holger Dietrich
Re: Re: Re: Solr replication
One more question about replication. Now that the replication is working, how can I see the changes on the slave nodes? The statistics page http://solr.slave1:8983/solr/admin/stats.jsp doesn't reflect the correct number of indexed documents and still shows numDocs=0. Is there any command to tell Solr (on the slave node) to sync itself with the disk? cheers Y. ---- Original message ---- From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Re: Solr replication Date: Mon, 1 Oct 2007 15:00:46 +0200 Works like a charm. Thanks very much. cheers Y. [...]
Re: Searching combined English-Japanese index
On 10/1/07, Maximilian Hütter [EMAIL PROTECTED] wrote: When I search using an English term, I get results but the Japanese is not encoded correctly in the response. (although it is UTF-8 encoded) One quick thing to try is the python writer (wt=python) to see the actual unicode values of what you are getting back (since the python writer automatically escapes non-ascii). That can help rule out incorrect charset handling by clients. -Yonik
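[A minimal sketch of Yonik's suggestion; the host, field, and term are placeholders:]

```
# returns non-ascii as \uXXXX escapes, bypassing browser/client charset handling
http://localhost:8983/solr/select?q=text:TERM&wt=python
```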
correlation between score and term frequency
Hi! I have a question about the correlation between the score value and the term frequency. Let's assume that we have one index over one set of documents, and that there is only one term in a query. If we now search for the term car and get a certain score value X, and we then search for the term football and also get the score value X, can we be sure that both values X mean the same thing? Could you explain what correlation exists between the score value and the term frequency in my scenario? Thanks for your help! Best regards, alex
Re: Re: Re: Solr replication
sh /bin/commit should trigger a refresh. However, this command should be executed as part of snapinstaller, so you shouldn't have to run it manually. On 10/1/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: One more question about replication. Now that the replication is working, how can I see the changes on slave nodes? The statistics page http://solr.slave1:8983/solr/admin/stats.jsp doesn't reflect the correct number of indexed documents and still shows numDocs=0. Is there any command to tell Solr (on the slave node) to sync itself with the disk? cheers Y. [...] -- Regards, Cuong Hoang
Re: Re: Re: Re: Solr replication
Perfect. Thanks, all. cheers Y. ---- Original message ---- Date: Tue, 2 Oct 2007 01:01:37 +1000 From: climbingrose To: solr-user@lucene.apache.org Subject: Re: Re: Re: Solr replication [...]
Major CPU performance problems under heavy user load with solr 1.2
Hi there, I am having some major CPU performance problems under heavy user load with Solr 1.2. I currently have approximately 4 million documents in the index and I am doing some pretty heavy faceting on multi-valued fields. I know that facets are expensive on multi-valued fields, but the CPU seems to max out (400%) under apache bench with just 5 identical concurrent requests, and I have the potential for a lot more concurrent requests than that given the large number of users that hit our site per day, so I am wondering if there are any workarounds. Currently I am running the out-of-the-box Solr setup (the example Jetty application with my own schema.xml and solrconfig.xml) on a dual Intel Duo Core 64-bit box with 8 gigs of RAM allocated to the start.jar process, dedicated to Solr with no slaves. I have set up some aggressive caching in solrconfig.xml for the filterCache (class="solr.LRUCache" size="300" initialSize="200") and have the HashDocSet set to 1 to help with faceting, but I am still getting some pretty poor performance. I have also tried autowarming the facets by performing a query that hits all my multi-valued facets, with no facet limits, across all the documents in the index. This does seem to reduce my query times by a lot, because the filterCache grows to about 2.1 million lookups, and it finishes the query in about 70 secs. However, I have noticed an issue with this: each time I do an optimize or a commit after prewarming the facets, the cache gets cleared (according to the stats on the admin page) but the RSize of the process does not shrink, and the queries get slow again. So I prewarm the facets again, and the memory usage keeps growing as if the cache is not being recycled; as a result the prewarm query gets slower and slower each time this occurs (after about 5 rounds of prewarm-then-commit the query takes about 30 mins... ugh) and I almost run out of memory. Any thoughts on how to improve this and fix the memory issue?
-- View this message in context: http://www.nabble.com/Major-CPU-performance-problems-under-heavy-user-load-with-solr-1.2-tf4549093.html#a12981540 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Major CPU performance problems under heavy user load with solr 1.2
On 10/1/07, Robert Purdy [EMAIL PROTECTED] wrote: Hi there, I am having some major CPU performance problems under heavy user load with Solr 1.2. I currently have approximately 4 million documents in the index and I am doing some pretty heavy faceting on multi-valued fields. I know that facets are expensive on multi-valued fields but the CPU seems to max out (400%) under apache bench with just 5 identical concurrent requests One can always max out the CPU (unless one is IO bound) with more concurrent requests than the number of CPUs on the system. This isn't a problem by itself and would exist even if Solr were an order of magnitude slower or faster. You should be looking at things like the peak throughput (queries per sec) you need to support and the latency of the requests (look at the 90th percentile, or whatever). [...] I have also tried autowarming the facets by performing a query that hits all my multi-valued facets with no facet limits across all the documents in the index. This does seem to reduce my query times by a lot because the filterCache grows to about 2.1 million lookups and finishes the query in about 70 secs. OK, that's long. So focus on the latency of a single request instead of jumping straight to load testing.
2.1 million is a lot - what's the field with the largest number of unique values that you are faceting on? However I have noticed an issue with this because each time I do an optimize or a commit after prewarming the facets the cache gets cleared, according to the stats on the admin page, but the RSize does not shrink for the process, and the queries get slow again, so I prewarm the facets again and the memory usage keeps growing like the cache is not being recycled The old searcher and cache won't be discarded until all requests using it have completed. and as a result the prewarm query starts to get slower and slower each time this occurs (after about 5 rounds of prewarms and then commit the query takes about 30 mins... ugh) and I almost run out of memory. Any thoughts on how to help improve this and fix the memory issue? You could try the minDf param to reduce the number of facets stored in the cache and reduce memory consumption. -Yonik
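[For illustration: the minDf param Yonik mentions appears in the faceting parameter set as facet.enum.cache.minDf. The field name and threshold below are placeholders, not values from this thread.]

```
# don't cache filters for facet terms that match fewer than 30 docs;
# rare terms are counted without going through the filterCache
/solr/select?q=*:*&facet=true&facet.field=category&facet.enum.cache.minDf=30
```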
Re: Searching combined English-Japanese index
On 10/1/07, Maximilian Hütter [EMAIL PROTECTED] wrote: Yonik Seeley wrote: One quick thing to try is the python writer (wt=python) to see the actual unicode values of what you are getting back (since the python writer automatically escapes non-ascii). That can help rule out incorrect charset handling by clients. -Yonik Thanks for the tip, it turns out that the unicode values are wrong... I mean the browser displays correctly what is sent. But I don't know how Solr gets these values. OK, so they never got into the index correctly. The most likely explanation is that the charset wasn't set correctly when the update message was sent to Solr. -Yonik
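[As a hedged illustration of Yonik's diagnosis: if the update is posted without an explicit charset, many containers decode the body as ISO-8859-1 and the Japanese text is mangled before it ever reaches the index. Declaring UTF-8 on the update request looks like this; the URL and file name are placeholders:]

```
curl http://localhost:8983/solr/update \
  -H 'Content-Type: text/xml; charset=UTF-8' \
  --data-binary @docs.xml
```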
Re: correlation between score and term frequency
Hi Alex, do you mean you would like to know whether both results have the same relevance across the whole indexed content, and whether the two results are directly comparable? [EMAIL PROTECTED] schrieb: I have a question about the correlation between the score value and the term frequency. Let's assume that we have one index about one set of documents. In addition to that, let's assume that there is only one term in a query. If we now search for the term car and get a certain score value X, and if we then search for the term football and get the same score value X. Is it now sure that both values X are the same? Could you explain, what correlation between the score value and the term frequency exists in my scenario?
Re: Letter-number transitions - can this be turned off
On 30-Sep-07, at 12:47 PM, F Knudson wrote: Is there a flag to disable the letter-number transition in the solr.WordDelimiterFilterFactory? We are indexing category codes and thesaurus codes for which this letter-number transition makes no sense. It is bloating the index (which is already large). Have you considered using a different analyzer? If you want to continue using WDF, you could make a quick change around line 320:

  if (splitOnCaseChange == 0 &&
      (lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
    // ALPHA->ALPHA: always ignore if case isn't considered.
  } else if ((lastType & UPPER) != 0 && (type & LOWER) != 0) {
    // UPPER->LOWER: Don't split
  } else {
    ...

by adding a clause that catches ALPHA->NUMERIC (and vice versa) and ignores it. Another approach that I am using locally is to maintain the transitions, but force tokens to be a minimum size (so r2d2 doesn't tokenize to four tokens, but arrrdeee does). There is a patch here: http://issues.apache.org/jira/browse/SOLR-293 If you vote for it, I promise to get it in for 1.3 <g> -Mike
Re: correlation between score and term frequency
On 1-Oct-07, at 7:06 AM, [EMAIL PROTECTED] wrote: [...] Could you explain, what correlation between the score value and the term frequency exists in my scenario? If the field has norms, there is a correlation, but the tf is unrecoverable from the score because of field length normalization. Query normalization also makes it difficult to compare scores from query to query. See http://lucene.apache.org/java/docs/scoring.html to start out, in particular the link to the Similarity class javadocs. -Mike
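[Mike's point can be made concrete with a toy version of the per-term score: a simplified sketch loosely following the Lucene Similarity formula, ignoring queryNorm, coord, and boosts. All numbers below are invented for illustration.]

```python
import math

def score(tf, df, num_docs, field_len):
    # simplified per-term score: sqrt(tf) * idf^2 * lengthNorm,
    # where lengthNorm = 1/sqrt(number of terms in the field)
    idf = 1.0 + math.log(num_docs / (df + 1.0))
    length_norm = 1.0 / math.sqrt(field_len)
    return math.sqrt(tf) * idf * idf * length_norm

# "car" once in a 10-term field scores exactly the same as "football"
# four times in a 40-term field: tf is not recoverable from the score
s_car = score(tf=1, df=10, num_docs=1000, field_len=10)
s_football = score(tf=4, df=10, num_docs=1000, field_len=40)
print(abs(s_car - s_football) < 1e-9)  # True
```

The higher tf is exactly cancelled by the longer field's length norm, which is why two hits with equal scores can have very different term frequencies.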
RE: Searching combined English-Japanese index
Some servlet containers don't do UTF-8 out of the box. There is information about this on the wiki. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, October 01, 2007 9:45 AM To: solr-user@lucene.apache.org Subject: Re: Searching combined English-Japanese index [...]
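[As a hedged example for Tomcat, one of the containers that needs this: the wiki's fix amounts to setting URIEncoding on the HTTP connector in server.xml, so GET query parameters are decoded as UTF-8 instead of the ISO-8859-1 default. The port and protocol values below are ordinary Tomcat defaults, not from this thread.]

```
<!-- server.xml: decode request URIs (and thus query params) as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"/>
```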
Questions about unit test assistant TestHarness
Hi- Is anybody using the unit test assistant class TestHarness in Solr 1.2? I'm trying to use it in Eclipse and found a few problems with classloading. These might be a quirk of using it with Eclipse. I also found a bug in the commit() function where '(Object)' should be '(Object[])'. Are all of these problems fixed in the Solr 1.3 trunk? Should I just grab whatever's there and use them with 1.2? Thanks, Lance Norskog