Re: how to disable segmentation when querying?
The text in a Field must be analyzed before an index can be generated. On Thu, Jun 28, 2012 at 12:02 PM, Sheng LUO sheng.peisi@gmail.com wrote: Hi there, how can I disable segmentation when querying? I tried to delete <analyzer type="query">...</analyzer> from schema.xml, but then the default analyzer is used instead. Any ideas? Thanks
Re: Solr seems to hang
It has now been hanging for 15 hours and nothing changes in the index directory. Any tips for further debugging? On 06/27/2012 03:50 PM, Arkadi Colson wrote: I'm sending files to Solr with the PHP Solr library. I'm doing a commit every 1000 documents: <autoCommit><maxDocs>1000</maxDocs><!-- <maxTime>1000</maxTime> --></autoCommit> Hard to say how long it hangs, at least an hour. After that I restarted Tomcat to continue... I will have a look at the indexes next time it hangs. Thanks for the tip! SOLR: 3.6 TOMCAT: 7.0.28 JAVA: 1.7.0_05-b05 On 06/27/2012 03:13 PM, Erick Erickson wrote: How long is it hanging? And how are you sending files to Tika, and especially how often do you commit? One problem people run into is that they commit too often, causing segments to be merged, and occasionally that just takes a while and people think that Solr is hung. 18G isn't very large as indexes go, so it's unlikely that's your problem, unless merging is going on, in which case you might be copying a bunch of data. So check whether you're getting a bunch of disk activity; you can get a crude idea of what's going on just by looking at the index directory on your Solr server while it's hung. What version of Solr are you using? Details matter. Best, Erick On Wed, Jun 27, 2012 at 7:51 AM, Arkadi Colson ark...@smartbit.be wrote: Anybody an idea?
The thread Dump looks like this: Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode): http-8983-6 daemon prio=10 tid=0x41126000 nid=0x5c1 in Object.wait() [0x7fa0ad197000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker) at java.lang.Object.wait(Object.java:485) at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458) - locked 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484) at java.lang.Thread.run(Thread.java:662) pool-4-thread-1 prio=10 tid=0x7fa0a054d800 nid=0x5be waiting on condition [0x7f9f962f4000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000702598b30 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.DelayQueue.take(DelayQueue.java:160) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:662) http-8983-5 daemon prio=10 tid=0x412d2800 nid=0x5bd runnable [0x7f9f94171000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:735) at 
org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:814) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) http-8983-4 daemon prio=10 tid=0x41036000 nid=0x5b1 in Object.wait() [0x7f9f966c9000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at java.lang.Object.wait(Object.java:485) at org.apache.lucene.index.DocumentsWriter.waitIdle(DocumentsWriter.java:986) - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:524) - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3580) - locked 0x00070b6e4858 (a org.apache.solr.update.SolrIndexWriter) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3545) at
Re: Solr seems to hang
Could you please use jstack to dump the call stacks? On Thu, Jun 28, 2012 at 2:53 PM, Arkadi Colson ark...@smartbit.be wrote: It has now been hanging for 15 hours and nothing changes in the index directory. Any tips for further debugging? On 06/27/2012 03:50 PM, Arkadi Colson wrote: I'm sending files to Solr with the PHP Solr library. I'm doing a commit every 1000 documents: <autoCommit><maxDocs>1000</maxDocs><!-- <maxTime>1000</maxTime> --></autoCommit> Hard to say how long it hangs, at least an hour. After that I restarted Tomcat to continue... I will have a look at the indexes next time it hangs. Thanks for the tip! SOLR: 3.6 TOMCAT: 7.0.28 JAVA: 1.7.0_05-b05 On 06/27/2012 03:13 PM, Erick Erickson wrote: How long is it hanging? And how are you sending files to Tika, and especially how often do you commit? One problem people run into is that they commit too often, causing segments to be merged, and occasionally that just takes a while and people think that Solr is hung. 18G isn't very large as indexes go, so it's unlikely that's your problem, unless merging is going on, in which case you might be copying a bunch of data. So check whether you're getting a bunch of disk activity; you can get a crude idea of what's going on just by looking at the index directory on your Solr server while it's hung. What version of Solr are you using? Details matter. Best, Erick On Wed, Jun 27, 2012 at 7:51 AM, Arkadi Colson ark...@smartbit.be wrote: Anybody an idea?
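Erick's suggestion about commit frequency maps to the autoCommit block in solrconfig.xml. A minimal sketch, with purely illustrative values (not a recommendation for this setup): committing on elapsed time instead of after every 1000 documents lets indexing batch more work per flush.

```xml
<!-- Sketch only: values are illustrative. Committing on time instead of
     every 1000 docs reduces how often segments are flushed and merged. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>300000</maxTime> <!-- at most one auto-commit per 5 minutes -->
  </autoCommit>
</updateHandler>
```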
RE: what is precisionStep and positionIncrementGap
Thanks a lot, but precisionStep is still very vague to me! Could you give me an example? -----Original Message----- From: Li Li [mailto:fancye...@gmail.com] Sent: June 28, 2012 11:25 To: solr-user@lucene.apache.org Subject: Re: what is precisionStep and positionIncrementGap 1. precisionStep is used for range queries on numeric fields. See http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html 2. positionIncrementGap is used for phrase queries on multi-valued fields. E.g. doc1 has two titles. title1: ab cd title2: xy zz If your positionIncrementGap is 0, then the positions of the 4 terms are 0, 1, 2, 3. If you search for the phrase "cd xy", it will hit. But you may think it should not match, so you can adjust positionIncrementGap to a larger value, e.g. 100. The positions are then 0, 1, 100, 101, and the phrase query will not match. On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Hi, in schema.xml there is usually a fieldType definition like this: <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> precisionStep and positionIncrementGap are not very clear to me. Could you please elaborate on these 2? Thanks! Liang
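To make Li Li's first point concrete, here is a hedged schema.xml sketch (the field and type names are made up for illustration). With precisionStep="8", each value is additionally indexed at reduced precisions (low bits zeroed out), so a range query can match whole precomputed buckets instead of enumerating every distinct value; precisionStep="0" indexes only the full-precision value, which keeps the index smaller but makes range queries slower.

```xml
<!-- Hypothetical sketch: "tint" trades a slightly larger index for fast
     range queries like price:[1000 TO 50000]; plain "int" does not. -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="int"  class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<field name="price" type="tint" indexed="true" stored="true"/>
```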
Re: what is precisionStep and positionIncrementGap
Read the "How it works" section of http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html If you can read Chinese, I have a blog explaining the details of the implementation: http://blog.csdn.net/fancyerii/article/details/7256379 On Thu, Jun 28, 2012 at 3:51 PM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Thanks a lot, but precisionStep is still very vague to me! Could you give me an example? -----Original Message----- From: Li Li [mailto:fancye...@gmail.com] Sent: June 28, 2012 11:25 To: solr-user@lucene.apache.org Subject: Re: what is precisionStep and positionIncrementGap 1. precisionStep is used for range queries on numeric fields. See http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html 2. positionIncrementGap is used for phrase queries on multi-valued fields. E.g. doc1 has two titles. title1: ab cd title2: xy zz If your positionIncrementGap is 0, then the positions of the 4 terms are 0, 1, 2, 3. If you search for the phrase "cd xy", it will hit. But you may think it should not match, so you can adjust positionIncrementGap to a larger value, e.g. 100. The positions are then 0, 1, 100, 101, and the phrase query will not match. On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Hi, in schema.xml there is usually a fieldType definition like this: <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> precisionStep and positionIncrementGap are not very clear to me. Could you please elaborate on these 2? Thanks! Liang
RE: what is precisionStep and positionIncrementGap
I read your blog, and it's quite well written! I run a website, www.ecmkit.com, focused on content management. Let's keep in touch! -----Original Message----- From: Li Li [mailto:fancye...@gmail.com] Sent: June 28, 2012 15:54 To: solr-user@lucene.apache.org Subject: Re: what is precisionStep and positionIncrementGap Read the "How it works" section of http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html If you can read Chinese, I have a blog explaining the details of the implementation: http://blog.csdn.net/fancyerii/article/details/7256379 On Thu, Jun 28, 2012 at 3:51 PM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Thanks a lot, but precisionStep is still very vague to me! Could you give me an example? -----Original Message----- From: Li Li [mailto:fancye...@gmail.com] Sent: June 28, 2012 11:25 To: solr-user@lucene.apache.org Subject: Re: what is precisionStep and positionIncrementGap 1. precisionStep is used for range queries on numeric fields. See http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html 2. positionIncrementGap is used for phrase queries on multi-valued fields. E.g. doc1 has two titles. title1: ab cd title2: xy zz If your positionIncrementGap is 0, then the positions of the 4 terms are 0, 1, 2, 3. If you search for the phrase "cd xy", it will hit. But you may think it should not match, so you can adjust positionIncrementGap to a larger value, e.g. 100. The positions are then 0, 1, 100, 101, and the phrase query will not match. On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F liang.f.zh...@alcatel-sbell.com.cn wrote: Hi, in schema.xml there is usually a fieldType definition like this: <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> precisionStep and positionIncrementGap are not very clear to me. Could you please elaborate on these 2? Thanks! Liang
writing unit test for a search component which works only with distributed search
I have written a custom search component, but the code only supports distributed search. Since we don't use non-distributed search and the search works differently in the non-distributed case, we decided to concentrate only on distributed search. I am trying to write a unit test for my custom component. I can see that BaseDistributedSearchTestCase's query method compares the results of a single-sharded control and a multi-sharded test. I cannot use this method because my component only works for multi-sharded search. Ideally I would like to use something like SolrTestCaseJ4's assertQ method, where I can use an XPath expression to validate the results. Does SolrTestCaseJ4 already support distributed search, or do I need to customize it? Or is there any other way to write a unit test that works for distributed-only cases? Thanks for your help! Srini -- View this message in context: http://lucene.472066.n3.nabble.com/writing-unit-test-for-a-search-component-which-works-only-with-distributed-search-tp3991795.html Sent from the Solr - User mailing list archive at Nabble.com.
searching for more than one word
Hi, I indexed the following strings: abcdefg hijklmnop. When searching for abcdefg hijklmnop Solr returns the result, but when searching for abcdefg hijklmnop Solr returns nothing. Any idea how to search for more than one word? [params] = SolrObject Object ( [debugQuery] = true [shards] = solr03-gs.intnet.smartbit.be:8983/solr,solr04-gs.intnet.smartbit.be:8983/solr,solr03-dcg.intnet.smartbit.be:8983/solr,solr04-dcg.intnet.smartbit.be:8983/solr [fl] = id,smsc_module,smsc_modulekey,smsc_userid,smsc_ssid,smsc_description,smsc_content,smsc_courseid,smsc_lastdate,score [indent] = on [start] = 0 [q] = (smsc_content:abcdefg hijklmnop || smsc_description:abcdefg hijklmnop) (smsc_lastdate:[2008-05-28T08:45:50Z TO 2012-06-28T08:45:50Z]) [distrib] = true [wt] = xml [version] = 2.2 [rows] = 50 )
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>
Thanks!
-- Smartbit bvba Hoogstraat 13 B-3670 Meeuwen T: +32 11 64 08 80 F: +32 89 46 81 10 W: http://www.smartbit.be E: ark...@smartbit.be
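One thing worth checking on the analysis page: the schema above uses KeywordTokenizerFactory at query time, which keeps the whole query string as a single token, so a two-word query can never line up with individually indexed words. A hedged sketch of a query analyzer that tokenizes on whitespace instead (whether this matches the index-side analysis still needs verifying against the actual schema):

```xml
<!-- Sketch only: split the query on whitespace so "abcdefg hijklmnop"
     becomes two tokens instead of one keyword token. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
</analyzer>
```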
LineEntityProcessor Usage
Hello, I have a question regarding the configuration of LineEntityProcessor. How do we configure LineEntityProcessor to read a line of text from a file, parse the line, and assign it to specific fields in the schema? How exactly does the text in a line get mapped to fields in the schema? I have searched a lot and didn't find any example of how to do that. Can somebody please give me an example of how to do this? Also please help me understand the concept of LineEntityProcessor. Thanks & Regards, Kiran Bushireddy
Re: SolrJ Response
Use the plain Java URL connection API and print the stream you receive. On Thu, Jun 28, 2012 at 2:24 PM, Shanu Jha shanuu@gmail.com wrote: Hi, I am getting a Solr document after querying Solr using SolrJ. I think SolrJ parses the response to generate its own response object. I want to display the response in JSON/XML as it comes from Solr. Please help. AJ -- Thanks & Regards Sachin Aggarwal 7760502772
Re: SolrJ Response
Hi, > I want to display the response in JSON/XML as it comes from Solr. Why don't you use the JSON QueryResponseWriter from Solr directly? http://wiki.apache.org/solr/SolJSON should give you all you need to get started. Jochen -- Jochen Just Fon: (++49) 711/28 07 57-193 avono AG Mobil: (++49) 172/73 85 387 Breite Straße 2 Mail: jochen.j...@avono.de 70173 Stuttgart WWW: http://www.avono.de
WordBreakSolrSpellChecker ignores MinBreakWordLength?
I set MinBreakWordLength = 3 thinking it would prevent WordBreakSolrSpellChecker from suggesting corrections made up of subwords shorter than 3 characters, but I still get suggestions like this: query: Touch N' Match suggestion: (t o u ch) 'n (m a t ch) Can someone help me understand why? Here is the relevant portion of solrconfig.xml:
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>
Re: Query Logic Question
Jack, thank you, the *:* solution seems to work. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-Logic-Question-tp3991689p3991881.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: LineEntityProcessor Usage
LineEntityProcessor outputs the entire line in a field called rawLine. You then need to write a transformer that will parse out the data. But see https://issues.apache.org/jira/browse/SOLR-2549 for enhancements that will parse the data without needing a transformer, if the data is in fixed-width or delimited format. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: kiran kumar [mailto:kirankumarsm...@gmail.com] Sent: Wednesday, June 27, 2012 11:03 PM To: solr-user@lucene.apache.org Subject: LineEntityProcessor Usage Hello, I have a question regarding the configuration of LineEntityProcessor. How do we configure LineEntityProcessor to read a line of text from a file, parse the line, and assign it to specific fields in the schema? How exactly does the text in a line get mapped to fields in the schema? I have searched a lot and didn't find any example of how to do that. Can somebody please give me an example of how to do this? Also please help me understand the concept of LineEntityProcessor. Thanks & Regards, Kiran Bushireddy
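As a hedged sketch of what James describes, here is a data-config.xml that reads a file line by line and splits rawLine with the stock RegexTransformer (the file path, regex, and field names are invented for illustration; check them against the actual schema):

```xml
<!-- Hypothetical data-config.xml: LineEntityProcessor emits each line in
     "rawLine"; RegexTransformer splits it into schema fields via groupNames. -->
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="lines"
            processor="LineEntityProcessor"
            url="/path/to/data.txt"
            rootEntity="true"
            transformer="RegexTransformer">
      <!-- e.g. a comma-delimited line "42,Some Title,Body text" -->
      <field column="rawLine" regex="^(.*?),(.*?),(.*)$" groupNames="id,title,body"/>
    </entity>
  </document>
</dataConfig>
```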
RE: WordBreakSolrSpellChecker ignores MinBreakWordLength?
Carrie, Try taking the wordbreak parameters out of the request handler configuration and instead put them in the spellchecker configuration. You also need to remove the "spellcheck." prefix. Also, the correct spelling for this parameter is minBreakLength. Here's an example:
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">{your field name here}</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">3</int>
  <int name="minBreakLength">3</int>
</lst>
All of the parameters in the following source file go in the spellchecker configuration like this: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java Descriptions of each of these parameters can be found in this source file: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java Let me know if this works out for you. Any more feedback you can provide on the newer spellcheck features you're using is appreciated. Thanks. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: Carrie Coy [mailto:c...@ssww.com] Sent: Thursday, June 28, 2012 8:20 AM To: solr-user@lucene.apache.org Subject: WordBreakSolrSpellChecker ignores MinBreakWordLength? I set MinBreakWordLength = 3 thinking it would prevent WordBreakSolrSpellChecker from suggesting corrections made up of subwords shorter than 3 characters, but I still get suggestions like this: query: Touch N' Match suggestion: (t o u ch) 'n (m a t ch) Can someone help me understand why?
Here is the relevant portion of solrconfig.xml:
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>
Re: how to disable segmentation when querying?
Thanks for the reply. I understand that. I have already found a way to avoid word segmentation at query time: using solr.WhitespaceTokenizerFactory or solr.KeywordTokenizerFactory does the trick. 2012/6/28 wangjing ppm10...@gmail.com: The text in a Field must be analyzed before an index can be generated. On Thu, Jun 28, 2012 at 12:02 PM, Sheng LUO sheng.peisi@gmail.com wrote: Hi there, how can I disable segmentation when querying? I tried to delete <analyzer type="query">...</analyzer> from schema.xml, but then the default analyzer is used instead. Any ideas? Thanks
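A sketch of the kind of fieldType being described here (the type name is invented): segment the text when indexing, but pass the query string through untokenized.

```xml
<!-- Illustrative only: analyze at index time, keep the query whole. -->
<fieldType name="text_exact_query" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- no segmentation: the whole query string becomes one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```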
edismax parser ignores mm parameter when tokenizer splits tokens (hyphenated words, WDF splitting etc)
Hello, my previous e-mail with a CJK example has received no replies. I verified that this problem also occurs for English. For example, in the case of the word fire-fly, the ICUTokenizer and the WordDelimiterFilter both split it into two tokens, fire and fly. With an edismax query and a must-match of 2, q={!edismax mm=2}, if the words are entered separately as [fire fly], the edismax parser honors the mm parameter and does the equivalent of a Boolean AND query. However, if the words are entered as the hyphenated word [fire-fly], the tokenizer splits them into two tokens, fire and fly, and the edismax parser does the equivalent of a Boolean OR query. I'm not sure I understand the output of debugQuery, but judging by the number of hits returned it appears that edismax is not honoring the mm parameter. Am I missing something, or is this a bug? I'd like to file a JIRA issue, but want to find out if I am missing something here. Details of several queries are appended below. Tom Burton-West
edismax query mm=2 with hyphenated word [fire-fly]:
<lst name="debug">
<str name="rawquerystring">{!edismax mm=2}fire-fly</str>
<str name="querystring">{!edismax mm=2}fire-fly</str>
<str name="parsedquery">+DisjunctionMaxQuery(((ocr:fire ocr:fly)))</str>
<str name="parsedquery_toString">+((ocr:fire ocr:fly))</str>
Entered as separate words [fire fly], numFound=184962, edismax mm=2:
<lst name="debug">
<str name="rawquerystring">{!edismax mm=2}fire fly</str>
<str name="querystring">{!edismax mm=2}fire fly</str>
<str name="parsedquery">+((DisjunctionMaxQuery((ocr:fire)) DisjunctionMaxQuery((ocr:fly)))~2)</str>
Regular Boolean AND query [fire AND fly], numFound=184962:
<str name="rawquerystring">fire AND fly</str>
<str name="querystring">fire AND fly</str>
<str name="parsedquery">+ocr:fire +ocr:fly</str>
<str name="parsedquery_toString">+ocr:fire +ocr:fly</str>
Regular Boolean OR query [fire OR fly], numFound=366047:
<lst name="debug">
<str name="rawquerystring">fire OR fly</str>
<str name="querystring">fire OR fly</str>
<str name="parsedquery">ocr:fire ocr:fly</str>
<str name="parsedquery_toString">ocr:fire ocr:fly</str>
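One related schema knob worth mentioning, as a sketch rather than a confirmed fix for the behavior Tom describes: since Solr 3.1, autoGeneratePhraseQueries on the fieldType controls whether a single query term that the analyzer splits into several tokens (fire-fly into fire, fly) is searched as a phrase or as separate optional clauses.

```xml
<!-- Sketch: with autoGeneratePhraseQueries="true", [fire-fly] is parsed as
     the phrase "fire fly" rather than (fire OR fly). Analyzer is minimal
     and illustrative. -->
<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```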
Re: searching for more than one word
The analysis page is your best friend in these circumstances. Use the analysis page in the Solr admin, turn on verbose output for both index and query, and see what the analysis chain looks like. You may be able to find the culprit. On Thu, Jun 28, 2012 at 10:57 AM, Arkadi Colson ark...@smartbit.be wrote: Hi, I indexed the following strings: abcdefg hijklmnop. When searching for abcdefg hijklmnop Solr returns the result, but when searching for abcdefg hijklmnop Solr returns nothing. Any idea how to search for more than one word? [params] = SolrObject Object ( [debugQuery] = true [shards] = solr03-gs.intnet.smartbit.be:8983/solr,solr04-gs.intnet.smartbit.be:8983/solr,solr03-dcg.intnet.smartbit.be:8983/solr,solr04-dcg.intnet.smartbit.be:8983/solr [fl] = id,smsc_module,smsc_modulekey,smsc_userid,smsc_ssid,smsc_description,smsc_content,smsc_courseid,smsc_lastdate,score [indent] = on [start] = 0 [q] = (smsc_content:abcdefg hijklmnop || smsc_description:abcdefg hijklmnop) (smsc_lastdate:[2008-05-28T08:45:50Z TO 2012-06-28T08:45:50Z]) [distrib] = true [wt] = xml [version] = 2.2 [rows] = 50 )
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt,stopwords_du.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
  </analyzer>
</fieldType>
Thanks! -- Smartbit bvba Hoogstraat 13 B-3670 Meeuwen T: +32 11 64 08 80 F: +32 89 46 81 10 W: http://www.smartbit.be E: ark...@smartbit.be
Re: Solved: WordBreakSolrSpellChecker ignores MinBreakWordLength?
Thanks! The combination of these two suggestions (moving the wordbreak parameters into the spellchecker configuration and correcting the spelling of the parameter to minBreakLength) fixed the problem I was having. On 06/28/2012 10:22 AM, Dyer, James wrote: Carrie, Try taking the wordbreak parameters out of the request handler configuration and instead put them in the spellchecker configuration. You also need to remove the "spellcheck." prefix. Also, the correct spelling for this parameter is minBreakLength. Here's an example:
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">{your field name here}</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">3</int>
  <int name="minBreakLength">3</int>
</lst>
All of the parameters in the following source file go in the spellchecker configuration like this: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java Descriptions of each of these parameters can be found in this source file: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java Let me know if this works out for you. Any more feedback you can provide on the newer spellcheck features you're using is appreciated. Thanks. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: Carrie Coy [mailto:c...@ssww.com] Sent: Thursday, June 28, 2012 8:20 AM To: solr-user@lucene.apache.org Subject: WordBreakSolrSpellChecker ignores MinBreakWordLength? I set MinBreakWordLength = 3 thinking it would prevent WordBreakSolrSpellChecker from suggesting corrections made up of subwords shorter than 3 characters, but I still get suggestions like this: query: Touch N' Match suggestion: (t o u ch) 'n (m a t ch) Can someone help me understand why?
Here is the relevant portion of solrconfig.xml:

<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>
Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
Hi All, I am facing an exception while trying to use the DataImportHandler for indexing. My solrconfig.xml is:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
  <luceneMatchVersion>LUCENE_36</luceneMatchVersion>
  <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <updateHandler class="solr.DirectUpdateHandler2" />
  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="false" />
  </requestDispatcher>
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.JsonUpdateRequestHandler" startup="lazy" />
  <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="qt">search</str>
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
  <admin>
    <defaultQuery>solr</defaultQuery>
  </admin>
</config>

and the jar's name is apache-solr-dataimporthandler-3.6.0.jar. Please revert if someone has the solution to it. Regards Rohit -- View this message in context: http://lucene.472066.n3.nabble.com/Error-loading-class-org-apache-solr-handler-dataimport-DataImportHandler-tp3991940.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: LineEntityProcessor Usage
It creates one field with the line as a string: 'rawLine'. http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/solr/DataImportHandler To make other fields from the contents of the string, you can use the RegexTransformer to pull text out of the string. On Thu, Jun 28, 2012 at 9:55 AM, kiran kumar kirankumarsm...@gmail.com wrote: Hello, I have a question regarding configuration of LineEntityProcessor. How do we configure LineEntityProcessor to read a line of text from a file, parse the line, and assign it to specific fields in the schema? How exactly does the text in a line get mapped to fields in the schema? I have searched a lot and didn't find any example of how to do that. Can somebody please give me an example of how to do this. Also please help me in understanding the concept of LineEntityProcessor. Thanks Regards, Kiran Bushireddy -- Lance Norskog goks...@gmail.com
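For anyone else searching the archives, a concrete sketch of what Lance describes may help. The file path, delimiter, regex, and field names below are hypothetical (not from this thread); the pattern is a LineEntityProcessor entity whose 'rawLine' output is split into schema fields by a RegexTransformer:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- LineEntityProcessor emits one row per line of the file,
         placing the whole line in the implicit 'rawLine' field -->
    <entity name="lines"
            processor="LineEntityProcessor"
            url="/data/products.txt"
            rootEntity="true"
            transformer="RegexTransformer">
      <!-- Split a pipe-delimited line like "42|widget|9.99"
           into three schema fields via regex capture groups -->
      <field column="rawLine"
             regex="^(.*?)\|(.*?)\|(.*)$"
             groupNames="id,name,price"/>
    </entity>
  </document>
</dataConfig>
```

The exact attribute names (regex, groupNames) should be checked against the DataImportHandler wiki page linked above for the Solr version in use.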
Re: Autocomplete using facets
Ugo, I suggest simply manually filtering out red from the facet.prefix results you get back. Not ideal, but it's easy and your problem seems like an infrequent event and a minor nuisance. ~ David Smiley p.s. thanks for buying my book - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Autocomplete-using-facets-tp3991377p3991953.html Sent from the Solr - User mailing list archive at Nabble.com.
index writer in searchComponent
Hi, Is it possible to add a new document to the index in a custom SearchComponent (one that also implements SolrCoreAware)? I can get a reference to the IndexReader via the ResponseBuilder parameter of the process() method using rb.req.getSearcher().getReader(). But is it possible to actually add a new document to the index _after_ searching the index, i.e. accessing the IndexWriter? thank you Peyman
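A pseudocode-level sketch of how this is often approached in 4.x-era Solr: rather than reaching for the IndexWriter directly, route the document through the core's update processor chain. The class and method names below are from memory of the 4.x codebase and should be verified against the version in use; this is an outline, not a tested implementation.

```java
// Inside a custom SearchComponent's process(ResponseBuilder rb) method,
// after the search has run. Names illustrative; error handling omitted.
SolrCore core = rb.req.getCore();

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "generated-1");  // hypothetical document

// null selects the default update chain from solrconfig.xml
UpdateRequestProcessorChain chain = core.getUpdateProcessingChain(null);
UpdateRequestProcessor proc = chain.createProcessor(rb.req, rb.rsp);
try {
    AddUpdateCommand cmd = new AddUpdateCommand(rb.req);
    cmd.solrDoc = doc;
    proc.processAdd(cmd);
} finally {
    proc.finish();
}
// Note: the new document only becomes visible after a (soft) commit and a
// searcher reopen; the searcher serving this request will not see it.
```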
How do we use HTMLStripCharFilterFactory
Hi All, I am new to SOLR. Please help me with the configuration of HTMLStripCharFilterFactory. If some tutorial is there, it will be of great help. Regards Rohit -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-we-use-HTMLStripCharFilterFactory-tp3991955.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: soft commits in EmbeddedSolrServer
Yes, This worked for me:

// Solr server initialization
System.setProperty("solr.solr.home", solrHome);
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
coreContainer = initializer.initialize();
server = new EmbeddedSolrServer(coreContainer, "your_corename");

// Create your SolrInputDocument doc ...

// Soft commit
UpdateRequest req = new UpdateRequest();
req.setAction(ACTION.COMMIT, false, false, true);
req.add(doc);
UpdateResponse rsp = req.process(server);

Regards, Raimon Bosch. 2012/6/26 Mark Miller markrmil...@gmail.com Yes - just pass the param same as you would if not using embedded On Jun 25, 2012, at 4:40 PM, Raimon Bosch wrote: Old question but I'm still wondering if this is possible. I'm using Solr 4.0. Can I use the EmbeddedSolrServer to perform soft commits? 2011/9/16 Raimon Bosch raimon.bo...@gmail.com Hi all, I'm checking how to do soft commits with the new version of Solr. I'm using EmbeddedSolrServer to add documents to my index. How can I perform a soft commit using this class? Is it possible? Or should I use the trunk? http://wiki.apache.org/solr/NearRealtimeSearch http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html Thanks in advance, Raimon Bosch. - Mark Miller lucidimagination.com
avgTimePerRequest JMX M-Bean displays with NaN instead of 0 - when no activity
hello all, environment: solr 3.5, jboss, wily we have been setting up jmx monitoring for our solr installation. while running tests - i noticed that of the 6 JMX M-Beans (avgRequestsPerSecond, avgTimePerRequest, errors, requests, timeouts, totalTime) ... the avgTimePerRequest M-Bean was producing NaN when there was no search activity. all of the other M-Beans displayed a 0 (zero) when there was no search activity. we were able to compensate for this issue with custom scripting in wily on our side. can someone help me understand this inconsistency? is this just WAD (works as designed)? thanks for any help or insight -- View this message in context: http://lucene.472066.n3.nabble.com/avgTimePerRequest-JMX-M-Bean-displays-with-NaN-instead-of-0-when-no-activity-tp3991962.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FileNotFoundException during commit. concurrences process?!
On Jun 26, 2012, at 7:35 AM, stockii wrote: Hello again. this is my Exception. with SolrVersion: 4.0.0.2012.04.26.09.00.41 SEVERE: Exception while solr commit. java.io.FileNotFoundException: _8l.cfs at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:266) at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:216) at org.apache.lucene.index.TieredMergePolicy.size(TieredMergePolicy.java:640) at org.apache.lucene.index.TieredMergePolicy.useCompoundFile(TieredMergePolicy.java:616) at org.apache.lucene.index.IndexWriter.useCompoundFile(IndexWriter.java:2078) at org.apache.lucene.index.IndexWriter.prepareFlushedSegment(IndexWriter.java:1968) at org.apache.lucene.index.DocumentsWriter.publishFlushedSegment(DocumentsWriter.java:497) at org.apache.lucene.index.DocumentsWriter.finishFlush(DocumentsWriter.java:477) at org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:201) at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:119) at org.apache.lucene.index.DocumentsWriterFlushQueue.tryPurge(DocumentsWriterFlushQueue.java:148) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:438) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:553) at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2416) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2548) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2530) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783) at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154) at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107) at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:286) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:246) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:404) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:443) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:422) Jun 26, 2012 4:28:05 PM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties My architecture is: 2 Solr instances. One instance updates the index (updater), and another instance is only for searching (searcher). An update comes in every minute. - The updater runs without problems - After the updater's commit, all changes are available in the updater instance - NOW my searcher comes along and starts a commit=true on each of its cores to refresh the changes. NOW I SOMETIMES get my exception =( Anybody an idea? Here is the relevant part of my solrconfig.xml (updater AND searcher):

<indexConfig>
  <useCompoundFile>true</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>2</mergeFactor>
  <lockType>single</lockType>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <unlockOnStartup>false</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
</indexConfig>
<updateHandler class="solr.DirectUpdateHandler2" />

-- View this message in context: http://lucene.472066.n3.nabble.com/FileNotFoundException-during-commit-concurrences-process-tp3991384.html Sent from the Solr - User mailing list archive at Nabble.com. Regards, Karthik
Strange behaviour with default request handler
Hi, I have a strange behaviour with the default request handler. In the index I have:

<doc>
  <str name="date">2012-06-28T10:22:51Z</str>
  <str name="description"/>
  <str name="firstName">Sophie</str>
  <str name="id">user-6</str>
  <str name="lastName">Michel</str>
  <str name="screenName">Sophie</str>
  <str name="slug">sophie</str>
</doc>
<doc>
  <str name="date">2012-06-28T10:22:51Z</str>
  <str name="description"/>
  <str name="firstName">Sophia</str>
  <str name="id">user-7</str>
  <str name="lastName">Martinez</str>
  <str name="screenName">Sophia</str>
  <str name="slug">sophia</str>
</doc>

And when I search for soph, I only get Sophie in the results and not Sophia. When I search for *:*, I get everything. Why is that? Did I miss a basic configuration option? My schema looks like:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="who" type="text" indexed="true" stored="false" multiValued="true"/>
  <field name="screenName" type="string" indexed="false" stored="true" required="true" />
  <field name="slug" type="string" indexed="false" stored="true" required="true" />
  <field name="firstName" type="string" indexed="false" stored="true" />
  <field name="lastName" type="string" indexed="false" stored="true" />
  <field name="description" type="string" indexed="false" stored="true" />
  <field name="date" type="string" indexed="false" stored="true" required="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>who</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="screenName" dest="who"/>
<copyField source="firstName" dest="who"/>
<copyField source="lastName" dest="who"/>

Any advice ? Thanks ! ;-) Cya, benjamin. -- View this message in context: http://lucene.472066.n3.nabble.com/Strange-behaviour-with-default-request-handler-tp3991976.html Sent from the Solr - User mailing list archive at Nabble.com.
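[Archive note: if the intent of the "soph" query is prefix matching, a plain text field only matches whole tokens. Assuming type-ahead is the goal, one common approach is an index-time EdgeNGram analyzer on the 'who' field; a sketch, with the field type name invented here:]

```xml
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "sophia" is indexed as so, sop, soph, sophi, sophia,
         so the literal query "soph" matches both Sophie and Sophia -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```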
Re: How do we use HTMLStripCharFilterFactory
Hi, specify transformer="HTMLStripTransformer" at the entity level and, for the field you want to strip HTML from, just set stripHTML="true". It should work.. Kiran On Thu, Jun 28, 2012 at 4:09 PM, derohit mailrohi...@gmail.com wrote: Hi All, I am new to SOLR. Please help me with the configuration of HTMLStripCharFilterFactory. If some tutorial is there, it will be of great help. Regards Rohit -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-we-use-HTMLStripCharFilterFactory-tp3991955.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks Regards, Kiran Kumar
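[Archive note: the reply above describes HTMLStripTransformer, which is the DataImportHandler-side option. The HTMLStripCharFilterFactory that was actually asked about is configured in schema.xml, as a charFilter at the front of a field type's analyzer chain. A minimal sketch, with an illustrative field type name:]

```xml
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strips HTML/XML markup from the raw text before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Any field declared with this type would then have HTML stripped at index (and query) time; the stored value is unaffected.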
core isolation
Hi, In Solr 3.x the parameter abortOnConfigurationError=false allows cores to continue to work even if another core fails due to a configuration error. This parameter doesn't exist anymore in Solr 4.0, but after some tests, it looks like cores are isolated from each other. By isolated, I mean that if a core fails due to a configuration error or an error like ClassNotFoundException, the other cores continue to work. On the other hand, I think there are some errors that will make all cores hang: * OutOfMemoryError * OutOfMemoryError : PermGen space * Too many open files * ... I am using Tomcat 6; can somebody confirm this isolation in Solr 4.0? Which errors do not impact other cores and which errors impact other cores? Regards Dominique
Strange spikes in query response times...any ideas where else to look?
Greetings all, We are working on building up a large Solr index for over 300 million records...and this is our first look at Solr. We are currently running a set of unique search queries against a single server (so no replication, no indexing going on at the same time, and no distributed search) with a set number of records (in our case, 10 million records in the index) for about 30 minutes, with nearly all of our searches being unique (I say nearly because our set of queries is unique, but I have not yet confirmed that JMeter is selecting these queries with no replacement). We are striving for a 2 second response time on the average, and indeed we are pretty darned close. In fact, if you look at the average response times, we are well under the 2 seconds per query. Unfortunately, we are seeing that about once every 6 minutes or so (and it is not a regular event...exactly six minutes apart...it is about six minutes but it fluctuates) we get a single query that returns in something like 15 to 20 seconds. We have been trying to identify what is causing this spike every so often and we are completely baffled. What we have done thus far: 1) Looked through the SAR logs and have not seen anything that correlates to this issue 2) Tracked the JVM statistics...especially the garbage collections...no correlations there either 3) Examined the queries...no pattern obvious there 4) Played with the JVM memory settings (heap settings, cache settings, and any other settings we could find) 5) Changed hardware: Brand new 4 processor, 8 gig RAM server with a fresh install of Redhat 5.7 enterprise, tried on a large instance of AWS EC2, tried on a fresh instance of a VMWare based virtual machine from our own data center, and still nothing is giving us a clue as to what is causing these spikes 6) No correlation found between the number of hits returned and the spikes Our data is very simple and so are the queries.
The schema consists of 40 fields, most of which are string fields, 2 of which are location fields, and a small handful of which are integer fields. All fields are indexed and all fields are stored. Our queries are also rather simple. Many of the queries are a simple one-field search. The most complex query we have is a 3-field search. Again, no correlation has been established between the query and these spikes. Also, about 60% of our queries return zero hits (on the assumption that we want to make Solr search its entire index every so often. 60% is more than we intended and we will fix that soon...but that is what is currently happening. Again, no correlation found between spikes and 0-hit returned queries). For some time we were testing with 100 million records in the index and the aggregate data looked quite good. Most queries were returning in under 2 seconds. Unfortunately, it was when we looked at the individual data points that we found spikes every 6-8 minutes or so hitting sometimes as high as 150 seconds! We have been testing with 100 million records in the index, 50 million records in the index, 25 million, 20 million, 15 million, and 10 million records. As I indicated at the start, we are now at 10 million records with 15-20 second spikes. As we have decreased the number of records in the index, the size (but not the frequency) of the spikes has been dropping. My question is: Is this type of behavior normal for Solr when it is being overstressed? I've read of lots of people with far more complicated schemas running MORE than 10 million records in an index who never once complained about these spikes. Since I am new at this, I am not sure what Solr's failure mode looks like when it has too many records to search. I am hoping someone looking at this note can at least give me another direction to look.
10 million records searched in less than 2 seconds most of the time is great...but those 10 and 20 second spikes are not going to go over well with our customers...and I somehow think there is more we should be able to do here. Thanks. Peter S. Lee ProQuest
RE: Strange spikes in query response times...any ideas where else to look?
A few questions... 1) Do you only see these spikes when running JMeter? I.e., do you ever see a spike when you manually run a query? 2) How are you measuring the response time? In my experience there are three different ways to measure query speed. Usually all of them will be approximately equal, but in some situations they can be quite different, and this difference can be a clue as to where the bottleneck is: 1) The response time as seen by the end user (in this case, JMeter) 2) The response time as seen by the container (for example, in Jetty you can get this by enabling logLatency in jetty.xml) 3) The QTime as returned in the Solr response 3) Are you running multiple queries concurrently, or are you just using a single thread in JMeter? -Michael -Original Message- From: s...@isshomefront.com [mailto:s...@isshomefront.com] Sent: Thursday, June 28, 2012 7:56 PM To: solr-user@lucene.apache.org Subject: Strange spikes in query response times...any ideas where else to look? Greetings all, We are working on building up a large Solr index for over 300 million records...and this is our first look at Solr. We are currently running a set of unique search queries against a single server (so no replication, no indexing going on at the same time, and no distributed search) with a set number of records (in our case, 10 million records in the index) for about 30 minutes, with nearly all of our searches being unique (I say nearly because our set of queries is unique, but I have not yet confirmed that JMeter is selecting these queries with no replacement). We are striving for a 2 second response time on the average, and indeed we are pretty darned close. In fact, if you look at the average responses time, we are well under the 2 seconds per query. 
Unfortunately, we are seeing that about once every 6 minutes or so (and it is not a regular event...exactly six minutes apart...it is about six minutes but it fluctuates) we get a single query that returns in something like 15 to 20 seconds We have been trying to identify what is causing this spike every so often and we are completely baffled. What we have done thus far: 1) Looked through the SAR logs and have not seen anything that correlates to this issue 2) Tracked the JVM statistics...especially the garbage collections...no correlations there either 3) Examined the queries...no pattern obvious there 4) Played with the JVM memory settings (heap settings, cache settings, and any other settings we could find) 5) Changed hardware: Brand new 4 processor, 8 gig RAM server with a fresh install of Redhat 5.7 enterprise, tried on a large instance of AWS EC2, tried on a fresh instance of a VMWare based virtual machine from our own data center) an still nothing is giving us a clue as to what is causing these spikes 5) No correlation found between the number of hits returned and the spikes Our data is very simple and so are the queries. The schema consists of 40 fields, most of which are string fields, 2 of which are location fields, and a small handful of which are integer fields. All fields are indexed and all fields are stored. Our queries are also rather simple. Many of the queries are a simple one-field search. The most complex query we have is a 3-field search. Again, no correlation has been established between the query and these spikes. Also, about 60% of our queries return zero hits (on the assumption that we want to make solr search its entire index every so often. 60% is more than we intended and we will fix that soon...but that is what is currently happening. Again, no correlation found between spikes and 0-hit returned queries). For some time we were testing with 100 million records in the index and the aggregate data looked quite good. 
Most queries were returning in under 2 seconds. Unfortunately, it was when we looked at the individual data points that we found spikes every 6-8 minutes or so hitting sometimes as high as 150 seconds! We have been testing with 100 million records in the index, 50 million records in the index, 25 million, 20 million, 15 million, and 10 million records. As I indicated at the start, we are now at 10 million records with 15-20 seconds spikes. As we have decreased the number of records in the index,the size (but not the frequency) of the spikes has been dropping. My question is: Is this type of behavior normal for Solr when it is being overstressed? I've read of lots of people with far more complicated schemas running MORE than 10 million records in an index and never once complained about these spikes. Since I am new at this, I am not sure what Solr's failure mode looks like when it has too many records to search. I am hoping someone looking at this note can at least give me another direction to look. 10 million records searched in less than 2 seconds most of the time is great...but those 10 and 20 seconds
RE: Strange spikes in query response times...any ideas where else to look?
Michael, Thank you for responding...and for the excellent questions. 1) We have never seen this response time spike with a user-interactive search. However, in the span of about 40 minutes, which included about 82,000 queries, we only saw a handful of near-equally distributed spikes. We have tried sending queries from the admin tool while the test was running, but given those odds, I'm not surprised we've never hit on one of those few spikes we are seeing in the test results. 2) Good point and I should have mentioned this. We are using multiple methods to track these response times. a) Looking at the catalina.out file and plotting the response times recorded there (I think this is logging the QTime as seen by Solr). b) Looking at what JMeter is reporting as response times. In general, these are very close if not identical to what is being seen in the Catalina.out file. I have not run a line-by-line comparison, but putting the query response graphs next to each other shows them to be nearly (or possibly exactly) the same. Nothing looked out of the ordinary. 3) We are using multiple threads. Before your email I was looking at the results, doing some math, and double checking the reports from JMeter. I did notice that our throughput is much higher than we meant for it to be. JMeter is set up to run 15 threads from a single test machine...but I noticed that the JMeter report is showing close to 47 queries per second. We are only targeting TWO to FIVE queries per second. This is up next on our list of things to look at and how to control more effectively. We do have three separate machines set up for JMeter testing and we are investigating to see if perhaps all three of these machines are inadvertently being launched during the test at one time and overwhelming the server. This *might* be one facet of the problem. Agreed on that. 
Even as we investigate this last item regarding the number of users/threads, I wouldn't mind any other thoughts you or anyone else had to offer. We are checking on this user/threads issue and for the sake of anyone else who finds this discussion useful I'll note what we find. Thanks again. Peter S. Lee ProQuest Quoting Michael Ryan mr...@moreover.com: A few questions... 1) Do you only see these spikes when running JMeter? I.e., do you ever see a spike when you manually run a query? 2) How are you measuring the response time? In my experience there are three different ways to measure query speed. Usually all of them will be approximately equal, but in some situations they can be quite different, and this difference can be a clue as to where the bottleneck is: 1) The response time as seen by the end user (in this case, JMeter) 2) The response time as seen by the container (for example, in Jetty you can get this by enabling logLatency in jetty.xml) 3) The QTime as returned in the Solr response 3) Are you running multiple queries concurrently, or are you just using a single thread in JMeter? -Michael -----Original Message----- From: s...@isshomefront.com [mailto:s...@isshomefront.com] Sent: Thursday, June 28, 2012 7:56 PM To: solr-user@lucene.apache.org Subject: Strange spikes in query response times...any ideas where else to look? Greetings all, We are working on building up a large Solr index for over 300 million records...and this is our first look at Solr. We are currently running a set of unique search queries against a single server (so no replication, no indexing going on at the same time, and no distributed search) with a set number of records (in our case, 10 million records in the index) for about 30 minutes, with nearly all of our searches being unique (I say nearly because our set of queries is unique, but I have not yet confirmed that JMeter is selecting these queries with no replacement).
We are striving for a 2 second response time on the average, and indeed we are pretty darned close. In fact, if you look at the average responses time, we are well under the 2 seconds per query. Unfortunately, we are seeing that about once every 6 minutes or so (and it is not a regular event...exactly six minutes apart...it is about six minutes but it fluctuates) we get a single query that returns in something like 15 to 20 seconds We have been trying to identify what is causing this spike every so often and we are completely baffled. What we have done thus far: 1) Looked through the SAR logs and have not seen anything that correlates to this issue 2) Tracked the JVM statistics...especially the garbage collections...no correlations there either 3) Examined the queries...no pattern obvious there 4) Played with the JVM memory settings (heap settings, cache settings, and any other settings we could find) 5) Changed hardware: Brand new 4 processor, 8 gig RAM server with a fresh install of Redhat 5.7 enterprise, tried on a large instance of AWS EC2, tried on a fresh instance of a VMWare
Re: Strange spikes in query response times...any ideas where else to look?
Peter, These could be JVM pauses, or it could be index reopening and warmup queries. Grab SPM for Solr - http://sematext.com/spm - in 24-48h we'll release an agent that tracks and graphs errors and timings of each Solr search component, which may reveal interesting stuff. In the mean time, look at the graph with IO as well as the graph with caches. That's where I'd first look for signs. Re users/threads question - if I understand correctly, this is the problem: JMeter is set up to run 15 threads from a single test machine...but I noticed that the JMeter report is showing close to 47 queries per second. It sounds like you're equating # of threads to QPS, which isn't right. Imagine you had 10 threads and each query took 0.1 seconds (processed by a single CPU core) and the server had 10 CPU cores. That would mean that one thread could run 10 queries per second utilizing just 1 CPU core. And 10 threads would utilize all 10 CPU cores and would give you 10x higher throughput - 10x10=100 QPS. So if you need to simulate just 2-5 QPS, just lower the number of threads. What that number should be depends on query complexity and hw resources (cores or IO). Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm From: s...@isshomefront.com s...@isshomefront.com To: solr-user@lucene.apache.org Sent: Thursday, June 28, 2012 9:20 PM Subject: RE: Strange spikes in query response times...any ideas where else to look? Michael, Thank you for responding...and for the excellent questions. 1) We have never seen this response time spike with a user-interactive search. However, in the span of about 40 minutes, which included about 82,000 queries, we only saw a handful of near-equally distributed spikes. We have tried sending queries from the admin tool while the test was running, but given those odds, I'm not surprised we've never hit on one of those few spikes we are seeing in the test results. 2) Good point and I should have mentioned this.
We are using multiple methods to track these response times. a) Looking at the catalina.out file and plotting the response times recorded there (I think this is logging the QTime as seen by Solr). b) Looking at what JMeter is reporting as response times. In general, these are very close if not identical to what is being seen in the Catalina.out file. I have not run a line-by-line comparison, but putting the query response graphs next to each other shows them to be nearly (or possibly exactly) the same. Nothing looked out of the ordinary. 3) We are using multiple threads. Before your email I was looking at the results, doing some math, and double checking the reports from JMeter. I did notice that our throughput is much higher than we meant for it to be. JMeter is set up to run 15 threads from a single test machine...but I noticed that the JMeter report is showing close to 47 queries per second. We are only targeting TWO to FIVE queries per second. This is up next on our list of things to look at and how to control more effectively. We do have three separate machines set up for JMeter testing and we are investigating to see if perhaps all three of these machines are inadvertently being launched during the test at one time and overwhelming the server. This *might* be one facet of the problem. Agreed on that. Even as we investigate this last item regarding the number of users/threads, I wouldn't mind any other thoughts you or anyone else had to offer. We are checking on this user/threads issue and for the sake of anyone else you finds this discussion useful I'll note what we find. Thanks again. Peter S. Lee ProQuest Quoting Michael Ryan mr...@moreover.com: A few questions... 1) Do you only see these spikes when running JMeter? I.e., do you ever see a spike when you manually run a query? 2) How are you measuring the response time? In my experience there are three different ways to measure query speed. 
Usually all of them will be approximately equal, but in some situations they can be quite different, and this difference can be a clue as to where the bottleneck is: 1) The response time as seen by the end user (in this case, JMeter) 2) The response time as seen by the container (for example, in Jetty you can get this by enabling logLatency in jetty.xml) 3) The QTime as returned in the Solr response 3) Are you running multiple queries concurrently, or are you just using a single thread in JMeter? -Michael -Original Message- From: s...@isshomefront.com [mailto:s...@isshomefront.com] Sent: Thursday, June 28, 2012 7:56 PM To: solr-user@lucene.apache.org Subject: Strange spikes in query response times...any ideas where else to look? Greetings all, We are working on building up a large Solr index for over 300 million records...and this is our first look at Solr. We are currently running a set of unique search
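[Archive note: the closed-loop throughput arithmetic Otis describes above - each test thread fires its next query as soon as the previous one returns, so QPS is roughly threads divided by average latency - can be sketched in a few lines. The function names here are invented for illustration:]

```python
def closed_loop_qps(threads: int, avg_latency_s: float) -> float:
    """Observed QPS for a closed-loop load generator: each thread issues
    its next query as soon as the previous one returns."""
    return threads / avg_latency_s

def threads_for_target(target_qps: float, avg_latency_s: float) -> float:
    """Inverse: how many closed-loop threads produce a target QPS."""
    return target_qps * avg_latency_s

# Otis's example: 10 threads, 0.1 s per query
print(closed_loop_qps(10, 0.1))      # 100.0 QPS

# Peter's observation: 15 threads producing ~47 QPS implies the average
# latency was roughly 15/47 s, i.e. about 0.32 s per query
print(15 / 47)

# To simulate ~5 QPS at that latency, far fewer threads are needed
print(threads_for_target(5, 0.3))    # 1.5
```

This also explains why lowering the thread count (or adding per-thread think time in JMeter) is the right lever for hitting a 2-5 QPS target.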
Re: SolrJ Response
hey, one more thing to add: when you query the server you need to specify the response type you want. have a look at this page. http://lucidworks.lucidimagination.com/display/solr/Response+Writers On Thu, Jun 28, 2012 at 6:14 PM, Jochen Just jochen.j...@avono.de wrote: Hi, I want to display the response in json/xml as it comes from solr. Why don't you use the JSON QueryResponseWriter from Solr directly? http://wiki.apache.org/solr/SolJSON should give you all you need to get started. Jochen -- Jochen Just Fon: (++49) 711/28 07 57-193 avono AG Mobil: (++49) 172/73 85 387 Breite Straße 2 Mail: jochen.j...@avono.de 70173 Stuttgart WWW: http://www.avono.de -- Thanks Regards Sachin Aggarwal 7760502772