Solr edismax NOT operator behavior
Hello, I am using the edismax parser, and the query submitted by the application is of the format price:1000 AND ( NOT ( launch_date:[2007-06-07T00:00:00.000Z TO 2009-04-07T23:59:59.999Z] AND product_type:electronic)). Solr gives unexpected results when executing it. I suspect it is because of the AND ( NOT portion of the query. Can anyone explain to me how this structure is handled? I am using Solr 3.6. Any help is appreciated. Thanks, Alok
Re: leaks in solr
In my case, I see only 1 searcher and no field cache - still Old Gen is almost full at 22 GB. Does it have to do with the index or some other configuration? -Saroj On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog goks...@gmail.com wrote: What does the Statistics page in the Solr admin say? There might be several searchers open: org.apache.solr.search.SolrIndexSearcher Each searcher holds open different generations of the index. If obsolete index files are held open, it may be old searchers. How big are the caches? How long does it take to autowarm them? On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Mark, We use Solr 3.6.0 on FreeBSD 9. Over a period of time, it accumulates lots of space! On Thu, Jul 26, 2012 at 8:47 PM, roz dev rozde...@gmail.com wrote: Thanks Mark. We are never calling commit or optimize with openSearcher=false. As per the logs, this is what is happening: openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} -- But we are going to use 4.0 Alpha and see if that helps. -Saroj On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller markrmil...@gmail.com wrote: I'd take a look at this issue: https://issues.apache.org/jira/browse/SOLR-3392 Fixed late April. On Jul 26, 2012, at 7:41 PM, roz dev rozde...@gmail.com wrote: it was from 4/11/12 -Saroj On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller markrmil...@gmail.com wrote: On Jul 26, 2012, at 3:18 PM, roz dev rozde...@gmail.com wrote: Hi Guys, I am also seeing this problem. I am using SOLR 4 from trunk and seeing this issue repeat every day. Any input on how to resolve this would be great. -Saroj Trunk from what date? - Mark - Mark Miller lucidimagination.com -- Lance Norskog goks...@gmail.com
too many instances of org.tartarus.snowball.Among in the heap
Hi All, I am trying to find out the reason for very high memory use, and ran jmap -histo. It is showing that I have too many instances of org.tartarus.snowball.Among. Any idea what this is for, and why am I getting so many of them?

num    #instances   #bytes       Class description
--------------------------------------------------
1:     46728110     1869124400   org.tartarus.snowball.Among
2:     5244210      1840458960   byte[]
3:     526519495    969839368    char[]
4:     10008928     864769280    int[]
5:     10250527     410021080    java.util.LinkedHashMap$Entry
6:     4672811      268474232    org.tartarus.snowball.Among[]
7:     8072312      258313984    java.util.HashMap$Entry
8:     466514       246319392    org.apache.lucene.util.fst.FST$Arc[]
9:     1828542      237600432    java.util.HashMap$Entry[]
10:    3834312      153372480    java.util.TreeMap$Entry
11:    2684700      128865600    org.apache.lucene.util.fst.Builder$UnCompiledNode
12:    4712425      113098200    org.apache.lucene.util.BytesRef
13:    3484836      111514752    java.lang.String
14:    2636045      105441800    org.apache.lucene.index.FieldInfo
15:    1813561      101559416    java.util.LinkedHashMap
16:    6291619      100665904    java.lang.Integer
17:    2684700      85910400     org.apache.lucene.util.fst.Builder$Arc
18:    956998       84215824     org.apache.lucene.index.TermsHashPerField
19:    2892957      69430968     org.apache.lucene.util.AttributeSource$State
20:    2684700      64432800     org.apache.lucene.util.fst.Builder$Arc[]
21:    685595       60332360     org.apache.lucene.util.fst.FST
22:    933451       59210944     java.lang.Object[]
23:    957043       53594408     org.apache.lucene.util.BytesRefHash
24:    591463       42585336     org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
25:    424801       40780896     org.tartarus.snowball.ext.EnglishStemmer
26:    424801       40780896     org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter
27:    1549670      37192080     org.apache.lucene.index.Term
28:    849602       33984080     org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter$WordDelimiterConcatenation
29:    424801       27187264     org.apache.lucene.analysis.core.WhitespaceTokenizer
30:    478499       26795944     org.apache.lucene.index.FreqProxTermsWriterPerField
31:    535521       25705008     org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray
32:    219081       24537072     org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter
33:    478499       22967952     org.apache.lucene.index.FieldInvertState
34:    956998       22967952     org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray
35:    478499       22967952     org.apache.lucene.index.TermVectorsConsumerPerField
36:    478499       22967952     org.apache.lucene.index.NormsConsumerPerField
37:    316582       22793904     org.apache.lucene.store.MMapDirectory$MMapIndexInput
38:    906708       21760992     org.apache.lucene.util.AttributeSource$State[]
39:    906708       21760992     org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl
40:    883588       21206112     java.util.ArrayList
41:    438192       21033216     org.apache.lucene.store.RAMOutputStream
42:    860601       20654424     java.lang.StringBuilder
43:    424801       20390448     org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator
44:    424801       20390448     org.apache.lucene.analysis.core.StopFilter
45:    424801       20390448     org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter
46:    424801       20390448     org.apache.lucene.analysis.snowball.SnowballFilter
47:    839390       20145360     org.apache.lucene.index.DocumentsWriterDeleteQueue$TermNode

-Saroj
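For reference, a histogram like the one above can be produced with the stock JDK tool, e.g. (<pid> is the Solr JVM's process id; the :live option forces a full GC first so only live objects are counted):

    jmap -histo:live <pid> | head -50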
Re: Upgrade solr 1.4.1 to 3.6
Yes, the index. Do you know of any link/documentation about upgrading Solr 1.4.1 to 3.6?
Re: Skip first word
Hi Chantal, if I understand correctly, this implies that I have to populate different fields according to their length. Since I'm not aware of any logical condition you can apply to the copyField directive, it means that this logic has to be implemented by the process that populates the Solr core. Is this assumption correct? That's kind of bad, because I'd like to have this kind of rule in the Solr configuration. Of course, if that's the only way... :) Thank you From: Chantal Ackermann [c.ackerm...@it-agenten.com] Sent: Thursday, July 26, 2012 18:32 To: solr-user@lucene.apache.org Subject: Re: Skip first word Hi, use two fields: 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for inputs of length < 3, 2. the other one tokenized as appropriate, with minsize=3 and longer for all longer inputs. Cheers, Chantal On 26.07.2012 at 09:05, Finotti Simone wrote: Hi Ahmet, business asked me to apply EdgeNGram with minGramSize=1 on the first term and with minGramSize=3 on the latter terms. We are developing a search suggestion mechanism; the idea is that if the user types D, the engine should suggest Dolce Gabbana, but if we type G, it should suggest other brands. Only if the user types Gab should it suggest Dolce Gabbana. Thanks S From: Ahmet Arslan [iori...@yahoo.com] Sent: Wednesday, July 25, 2012 18:10 To: solr-user@lucene.apache.org Subject: Re: Skip first word is there a tokenizer and/or a combination of filters to remove the first term from a field? For example: "The quick brown fox" should be tokenized as: "quick" "brown" "fox" There is no such filter that I know of. Though, you could implement one by modifying the source code of LengthFilterFactory or StopFilterFactory. They both remove tokens. Out of curiosity, what is the use case for this?
R: Skip first word
Could you elaborate on it, please? Thanks S From: in.abdul [in.ab...@gmail.com] Sent: Thursday, July 26, 2012 20:36 To: solr-user@lucene.apache.org Subject: Re: Skip first word That is the best option. I had also used the shingle filter factory. On Jul 26, 2012 10:03 PM, Chantal Ackermann-2 [via Lucene] wrote: Hi, use two fields: 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for inputs of length < 3, 2. the other one tokenized as appropriate, with minsize=3 and longer for all longer inputs. Cheers, Chantal [earlier quoted messages snipped] THANKS AND REGARDS, SYED ABDUL KATHER
Re: Skip first word
Hi Simone, no, I meant that you populate the two fields with the same input - best done via the copyField directive. The first field will contain ngrams of size 1 and 2. The other field will contain ngrams of size 3 and longer (you might want to set a decent maxsize there). The query for the autocomplete list uses the first field when the input (typed in by the user) is one or two characters long. Your example was: D or G, and then Do or Ga. The search would use only the single-token field, which for the input Dolce Gabbana contains only the ngrams D and Do. So, only the input D or Do would result in a hit on Dolce Gabbana. Once the user has typed in the third letter: Dol or Gab, you query the second, more tokenized field, which for Dolce Gabbana contains the ngrams Dol Dolc Dolce Gab Gabb Gabba etc. Both inputs Gab and Dol would then return Dolce Gabbana.

1. First field type:
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2" side="front"/>

2. Second field type:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- maybe add WordDelimiter etc. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>

3. Field declarations:
<field name="short_prefix" type="short_ngram" … />
<field name="long_prefix" type="long_ngram" … />
<copyField source="short_prefix" dest="long_prefix" />

Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote: Hi Chantal, if I understand correctly, this implies that I have to populate different fields according to their length. [earlier quoted messages snipped]
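On the query side, the switch between the two fields can then live in the client; a minimal SolrJ sketch (field names taken from the declarations above, everything else hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

String input = userInput.trim();
// one- and two-character inputs hit the 1-2 char ngram field, longer inputs the 3+ field
String field = input.length() <= 2 ? "short_prefix" : "long_prefix";
SolrQuery q = new SolrQuery(field + ":" + ClientUtils.escapeQueryChars(input));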
dynamic EdgeNGramFilter
Hi, is there a possibility to configure the minGramSize (EdgeNGramFilter) dynamically while searching for a term? All my content is indexed with minGramSize=3 and that is fine, but when I search for a term like *communic*... Solr should not return results like *com*puter, *com*mander, *com*a, ... I know I can avoid this when I use quotes, like "communic", but isn't there a better way? It would be nice if I could tell Solr (for instance with a query parameter) how many characters must be identical with the search term -- a dynamic minGramSize. I hope someone can help me. -- Best regards Alexander Helhorn BA-Student/IT-Service Kommunale Immobilien Jena Paradiesstr. 6 07743 Jena Tel.: 0 36 41 49-55 11 Fax: 0 36 41 49-11 55 11 E-Mail: alexander.helh...@jena.de Internet: www.kij.de
Re: Skip first word
Brilliant! Thank you very much :) From: Chantal Ackermann [c.ackerm...@it-agenten.com] Sent: Friday, July 27, 2012 11:20 To: solr-user@lucene.apache.org Subject: Re: Skip first word [quoted text snipped]
Solr - customize Fragment using hl.fragmenter and hl.regex.pattern
I want Solr highlighting in a specific format. Below is the string format for which I need to provide the highlighting feature:
---
130s: LISTEN! LISTEN!
138s: [THUMP]
143s: WHAT IS THAT?
144s: HEAR THAT?
152s: EVERYBODY, SHH. SHH.
156s: STAY UP THERE.
163s: [BOAT CREAKING]
165s: WHAT IS THAT?
167s: [SCREAMING]
191s: COME ON!
192s: OH, GOD!
193s: AAH!
249s: OK. WE'VE HAD SOME PROBLEMS
253s: AT THE FACILITY.
253s: WHAT WE'RE ATTEMPTING TO ACHIEVE
256s: HERE HAS NEVER BEEN DONE.
256s: WE'RE THIS CLOSE
259s: TO THE REACTIVATION
259s: OF A HUMAN BRAIN CELL.
260s: DOCTOR, THE 200 MILLION
264s: I'VE SUNK INTO THIS COMPANY
264s: IS DUE IN GREAT PART
266s: TO YOUR RESEARCH.
---
After a user search I want to provide the user a fragment in the format: previous line of highlight + line containing highlight + next line of highlight. For example, if the user searched for the term "hear", then one typical highlight fragment should look like:
<str>143s: WHAT IS THAT? 144s: <em>HEAR</em> THAT? 152s: EVERYBODY, SHH. SHH.</str>
The above is my ultimate plan, but right now I am trying to get a fragment which starts with "ns:", where n is a number between 0 and 9999. I use hl.regex.slop=0.6 and hl.fragsize=120, and below is the regex for that:
\b(?=\s*\d{1,4}s:){50,200}
Using the above regular expression, my fragment does not always start with "ns:". Please advise me on how I can achieve the ultimate plan. Thanks
Re: Skip first word
You're welcome :-) C
Re: too many instances of org.tartarus.snowball.Among in the heap
It is something internal to the snowball analyzer (stemmer). To find out more you should take a heap dump and look into it with the Memory Analyzer (MAT): http://www.eclipse.org/mat/ Regards, Bernd On 27.07.2012 09:53, roz dev wrote: Hi All, I am trying to find out the reason for very high memory use, and ran jmap -histo. It is showing that I have too many instances of org.tartarus.snowball.Among. [histogram snipped] -Saroj -- Bernd Fehling, Universitätsbibliothek Bielefeld Dipl.-Inform. (FH), LibTec - Bibliothekstechnologie und Wissensmanagement Universitätsstr. 25, 33615 Bielefeld Tel. +49 521 106-4060 bernd.fehling(at)uni-bielefeld.de BASE - Bielefeld Academic Search Engine - www.base-search.net
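If it helps, a binary heap dump that MAT can open can be taken with the stock JDK tool, e.g. (<pid> is the Solr JVM's process id):

    jmap -dump:live,format=b,file=solr-heap.hprof <pid>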
Re: too many instances of org.tartarus.snowball.Among in the heap
Try taking a couple of thread dumps and see where in the stack the snowball classes show up. That might give you a clue. Did you customize the parameters to the stemmer? If so, maybe it has problems with the file you gave it. Just some generic thoughts that might help. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Jul 27, 2012 at 3:53 AM, roz dev rozde...@gmail.com wrote: Hi All, I am trying to find out the reason for very high memory use, and ran jmap -histo. It is showing that I have too many instances of org.tartarus.snowball.Among. [histogram snipped]
Re: leaks in solr
I have tons of these open:

searcherName : Searcher@24be0446 main
caching : true
numDocs : 1331167
maxDoc : 1338549
reader : SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de,refCnt=1,segments=18}
readerDir : org.apache.lucene.store.NIOFSDirectory@/usr/local/solr/highlander/data/..@2f2d9d89
indexVersion : 1336499508709
openedAt : Fri Jul 27 09:45:16 EDT 2012
registeredAt : Fri Jul 27 09:45:19 EDT 2012
warmupTime : 0

In my custom handler, I have the following implementation (it's not the full code, but it gives an overall idea):

class CustomHandler extends SearchHandler {
    void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
        SolrCore core = req.getCore();
        Vector<SimpleOrderedMap<Object>> requestParams = new Vector<SimpleOrderedMap<Object>>();
        /* parse the params in such a way that requestParams[i] = parameters of the ith request */
        ...
        Vector<LocalSolrQueryRequest> subQueries = new Vector<LocalSolrQueryRequest>();
        try {
            for (int i = 0; i < subQueryCount; i++) {
                subQueries.add(new LocalSolrQueryRequest(core, requestParams.get(i)));
                ResponseBuilder rb = new ResponseBuilder();
                rb.req = req;
                handleRequestBody(req, rsp, rb, comps); // calls SearchHandler's handleRequestBody, whose signature I have modified
            }
        } finally {
            for (int i = 0; i < subQueries.size(); i++)
                subQueries.get(i).close();
        }
    }
}

Search Handler changes:

class SearchHandler {
    void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp,
                           ResponseBuilder rb, ArrayList<Component> comps) {
        // ResponseBuilder rb = new ResponseBuilder();  // removed: rb is now passed in
        ...
    }

    void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
        ResponseBuilder rb = new ResponseBuilder();
        handleRequestBody(req, rsp, rb, comps);
    }
}

I don't see the old index searcher getting closed after warming up the new one... Because I replicate every 5 minutes, it crashes in 2 hours. On Fri, Jul 27, 2012 at 3:36 AM, roz dev rozde...@gmail.com wrote: in my case, I see only 1 searcher, no field cache - still Old Gen is almost full at 22 GB. [earlier quoted messages snipped]
how solr will apply regex fragmenter
I was looking at the regex fragmenter for customizing my highlight fragments, and I was wondering how the regex fragmenter works within Solr; I googled for it but didn't find any results. Can anybody tell me how the regex fragmenter works within Solr? And when the regex fragmenter applies the regex to fragments, do I first get a fragment using the default Solr operation and then apply the regex on it, or does it directly apply the regex to the search term and then return the fragment?
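As far as I understand it, the regex fragmenter first cuts fragments of roughly hl.fragsize characters and then lets the boundaries drift by up to hl.regex.slop * fragsize characters so the fragment matches hl.regex.pattern. The fragmenter is selected per request; a SolrJ sketch of the relevant parameters (values taken from the earlier mail; the pattern here is illustrative only, not tested):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("hear");
q.setHighlight(true);
q.setParam("hl.fragmenter", "regex");           // select the regex fragmenter instead of the default "gap"
q.setParam("hl.fragsize", "120");               // target fragment size in characters
q.setParam("hl.regex.slop", "0.6");             // boundaries may deviate up to 60% of fragsize to fit the pattern
q.setParam("hl.regex.pattern", "\\d{1,4}s:.*"); // illustrative pattern only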
Problem with Solr 4.0-ALPHA and JSON response
Hi all, I'm new to Solr. I have a problem with the JSON format. This is my Java client code:

PrintWriter out = res.getWriter();
res.setContentType("text/plain");
String query = req.getParameter("query");
SolrServer solr = new HttpSolrServer(solrServer);
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/select");
params.set("q", "contenuto:(" + query + ")");
params.set("hl", true);
params.set("hl.fl", "id,contenuto,score");
params.set("wt", "json");
QueryResponse response = solr.query(params);
log.debug(response.toString());
out.print(response.toString());
out.flush();

Now the problem is that I receive the response, but it doesn't trigger the JavaScript callback function. I see wt=javabin in the SolrCore.execute log, even though I set wt=json in the parameters; is this normal? This is the jQuery call to the server:

$.getJSON('solrServer.html',
    {query: escape($('input[name=query]:visible').val())},
    function(data) {
        var view = '';
        for (var i = 0; i < data.response.docs.length; i++) {
            view += '<p>' + data.response.docs[i].contenuto + '</p>';
        }
        $('#placeholder').html(view);
    });

Thanks for reading.
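One thing to check: SolrJ talks javabin on the wire regardless of the wt you put in the params (which would explain the wt=javabin in the SolrCore.execute log), and response.toString() is a Java map dump, not JSON, so $.getJSON will never fire its callback on it. A sketch of a workaround that fetches Solr's real JSON over plain HTTP and writes it through (URL, core and field names are just the ones from the code above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

// inside the servlet method above; also consider res.setContentType("application/json")
String url = "http://localhost:8983/solr/select?wt=json&hl=true&hl.fl=id,contenuto,score"
        + "&q=" + URLEncoder.encode("contenuto:(" + query + ")", "UTF-8");
BufferedReader in = new BufferedReader(
        new InputStreamReader(new URL(url).openStream(), "UTF-8"));
for (String line; (line = in.readLine()) != null; ) {
    out.println(line); // 'out' is the servlet's PrintWriter from the code above
}
in.close();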
Deduplication in SolrCloud
Hi, in my old Solr setup I used the deduplication feature in the update chain with a couple of fields:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">uuid,type,url,content_hash</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

This worked fine. When I now use this in my 2-shard SolrCloud setup and insert 150,000 documents, I always get an error:

INFO: end_commit_flush
Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)

I am inserting the documents via CSV import and the curl command, and I split them into 50k chunks. Without the dedupe chain, the import finishes after 40 secs. The curl command writes to one of my shards. Do you have an idea why this happens? Should I reduce the fields to one? I have read that not using the id as the dedupe field could be an issue. I have searched for deduplication with SolrCloud, and I am wondering if it is already working correctly; see e.g. http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html Thanks, regards Daniel
RE: Deduplication in SolrCloud
This issue doesn't really describe your problem but a more general problem of distributed deduplication: https://issues.apache.org/jira/browse/SOLR-3473 -Original message- From: Daniel Brügge daniel.brue...@googlemail.com Sent: Fri 27-Jul-2012 17:38 To: solr-user@lucene.apache.org Subject: Deduplication in SolrCloud [quoted text snipped]
question(s) re lucene spatial toolkit aka LSP aka spatial4j
Hopefully someone is using the Lucene spatial toolkit aka LSP aka spatial4j, and can answer these questions. We are using this spatial tool for doing searches; overall, it seems to work very well. However, finding documentation is difficult. I have a couple of questions: 1. I have a geohash field in my Solr schema that contains indexed geographic polygon data. I want to find all docs where that polygon intersects a given lat/long. I was experimenting with returning distance in the result set and with sorting by distance, and found that the following query works. However, I don't know what distance means in the query, i.e. is it the distance from the point to the polygon centroid, to the closest outer edge of the polygon, or is it a useless random value, etc.? Does anyone know? http://solrserver:solrport/solr/core0/select?q=*:*&fq={!v=$geoq%20cache=false}&geoq=wkt_search:%22Intersects(Circle(-97.057%2047.924%20d=0.01))%22&sort=query($geoq)+asc&fl=catchment_wkt1_trimmed,school_name,latitude,longitude,dist:query($geoq,-1),loc_city,loc_state 2. Some of the polygons, being geographic representations, are very big (i.e. state/province polygons). When Solr starts processing a spatial query (like the one above), I can see (INFO: Building Cache [xx]) that it fills some sort of in-memory cache (org.apache.lucene.spatial.strategy.util.ShapeFieldCache) of the indexed polygon data. We are encountering Java OOM issues when this occurs (even when we boosted the memory to 7 GB). I know that some of the polygons can have more than 2300 points, but heavy trimming isn't really an option due to level-of-detail issues. Can we control this caching, or the indexing of the polygons, in any way to reduce the memory requirements?
RE: Bulk Indexing
Hi, Previously I asked a similar question and I have not fully implemented it yet. My plan is: 1) use Solr only for search, not for indexing, 2) have a separate Java process to index (calling the Lucene API directly; maybe it can call the Solr API, I need to check more details). As other people pointed out earlier, the problem with the above plan is that Solr does not know when to reload the IndexSearcher (namely the underlying IndexReader) after indexing is done, since the indexer and Solr are two separate processes. My plan is to have Solr not cache any IndexReader (each time when performing a search, just create a new IndexSearcher), because: 1) our app is made of many Lucene indexed data folders (in Solr language, many cores), so caching IndexSearchers would be too expensive, 2) in my experience, search is still quite fast without caching (maybe partially because our indexed data is not large, per folder). This is just my plan (not fully implemented yet). Best regards, Lisheng -----Original Message----- From: Sohail Aboobaker [mailto:sabooba...@gmail.com] Sent: Friday, July 27, 2012 6:56 AM To: solr-user@lucene.apache.org Subject: Bulk Indexing Hi, We have created a search service which is responsible for providing an interface between Solr and the rest of our application. It basically takes one document at a time and updates or adds it to the appropriate index. Now, in the application, we have processes that add products (our documents are based on products) in bulk using a bulk data load process. At this point, we use the same search service to add the documents in a loop. These can be up to 20,000 documents in one load. In a recent solr-user discussion, it seems like this is a no-no strategy with red flags all around it. What are the alternatives? Thanks, Regards, Sohail Aboobaker.
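For what it's worth, the no-caching search side of this plan is only a few lines per request; a sketch against the Lucene 3.x API (the index path is a placeholder, and 'query' is built elsewhere):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Directory dir = FSDirectory.open(new File("/data/indexes/folder1"));
IndexReader reader = IndexReader.open(dir);   // fresh reader per search, nothing cached
IndexSearcher searcher = new IndexSearcher(reader);
try {
    TopDocs hits = searcher.search(query, 10);
    // ... render hits ...
} finally {
    searcher.close(); // release the searcher ...
    reader.close();   // ... the reader ...
    dir.close();      // ... and the directory handles, so no index files stay open
}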
Re: Bulk Indexing
Haven't tried this but: 1) I think SOLR 4 supports on-the-fly core attach/detach/select. Can somebody confirm this? 2) If 1) is true, run everything as two cores. 3) One core is live in production. 4) The second core is detached from SOLR and attached to something like SolrJ, which I believe can index without going over the network. 5) Once SolrJ finishes the bulk import indexing, switch the cores around. Or if you are not live, just use SolrJ to run the indexing and then attach the finished core to SOLR. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Jul 27, 2012 at 9:55 AM, Sohail Aboobaker sabooba...@gmail.com wrote: [quoted text snipped]
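For step 5, the swap itself can be a single CoreAdmin call, e.g. (core names 'live' and 'build' are invented for the example; SWAP exchanges the names of two running cores):

    curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=build"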
Solr not getting OpenText document name and metadata
Hi, I'm currently using ManifoldCF (v.5.1) to crawl OpenText (v10.5), and the output is sent to Solr (4.0 alpha). All I see in the index is an id equal to the OpenText download URL and a version (a big integer value). What I don't see is the document name from OpenText or any of the OpenText metadata. Does anyone know how I can get this data? I can't even search by document name or by document extension! Only a few of the documents actually have a title in the Solr index, but the OpenText name of the document is nowhere to be found. If I know some text within the document, I can search for that. I'm using the default schema with Tika as the extraction handler. I'm also using uprefix = attr to get all of the ignored properties, but most of those are useless. Please advise...
Re: Bulk Indexing
We will be using a Solr 3.x version. I was wondering if we need to worry about this, as we have only 10k index entries at a time. It sounds like a very low number, and we have only one document type at this point. Should we worry about directly using SolrJ for indexing and searching at this low volume with a simple schema?
Re: Solr edismax NOT operator behavior
"Can anyone explain" - add the debugQuery=true option to your request and Solr will give an explanation, including the parsed query and the Lucene scoring of documents. If you think Solr is wrong, show us a sample document that either should appear but doesn't, or shouldn't appear but does. How are the results unexpected? Then do simple queries, each using the id value of the unexplained document and each of the clauses in your expression. -- Jack Krupansky -----Original Message----- From: Alok Bhandari Sent: Friday, July 27, 2012 1:55 AM To: solr-user@lucene.apache.org Subject: Solr edismax NOT operator behavior [quoted text snipped]
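For example, a request like this shows the parsed query (URL encoding omitted for readability). And one guess worth testing once the debug output is visible: in Solr 3.x a parenthesized clause that contains only a NOT matches nothing, so adding an explicit *:* inside the parentheses often fixes such queries:

    http://localhost:8983/solr/select?defType=edismax&debugQuery=true
        &q=price:1000 AND (*:* NOT (launch_date:[2007-06-07T00:00:00.000Z TO 2009-04-07T23:59:59.999Z] AND product_type:electronic))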
RE: Bulk Indexing
I assume you're indexing on the same server that is used to execute search queries. Adding 20K documents in bulk could cause the Solr server to 'stop the world', where the server would stop responding to queries. My suggestions: - Set up master/slave replication to insulate your clients from 'stop the world' events during indexing. - Update in batches, with a commit at the end of each batch.
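A sketch of that batching with SolrJ (3.x class names; the document-building loop is made up):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer master = new CommonsHttpSolrServer("http://master-host:8983/solr");
List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
for (Product p : products) {     // 'products' and toDoc() are placeholders
    batch.add(toDoc(p));
    if (batch.size() == 1000) {  // send in chunks instead of one document at a time
        master.add(batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) master.add(batch);
master.commit();                 // a single commit at the end of the load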
Re: leaks in solr
Hello all, while this runs fine in my Eclipse with a set of queries, when I run it on the test production server the searchers are leaked. Any hint would be appreciated. I have not used CoreContainer. Considering that the stock SearchHandler runs fine, I am not able to think of a reason why my extended version wouldn't work. Does anyone have any idea? On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: I have tons of these open. [quoted text snipped]
Re: leaks in solr
Just to clarify, the leak happens every time a new searcher is opened. On Fri, Jul 27, 2012 at 8:28 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Hello all, while this runs fine in my Eclipse with a set of queries, when I run it on the test production server the searchers are leaked. [quoted text snipped]
Re: leaks in solr
A finally clause can throw exceptions. Can this throw an exception? subQueries.get(i).close(); If so, each close() call should be in a try-catch block. On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Hello all, while this runs fine in my Eclipse with a set of queries, when I run it on the test production server the searchers are leaked. [quoted text snipped]
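That is, something along these lines (a sketch; 'log' stands for whatever logger the handler already uses), so that one failing close() cannot skip the remaining ones:

} finally {
    for (int i = 0; i < subQueries.size(); i++) {
        try {
            subQueries.get(i).close();
        } catch (Exception e) {
            // log and continue so the remaining sub-requests still get closed
            log.error("failed to close sub-request " + i, e);
        }
    }
}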
Re: Deduplication in SolrCloud
Should the old Signature code be removed? Given that the goal is to have everyone use SolrCloud, maybe this kind of landmine should be removed? On Fri, Jul 27, 2012 at 8:43 AM, Markus Jelsma markus.jel...@openindex.io wrote: This issue doesn't really describe your problem but a more general problem of distributed deduplication: https://issues.apache.org/jira/browse/SOLR-3473 [quoted text snipped] -- Lance Norskog goks...@gmail.com
Re: querying using filter query and lots of possible values
: the list of IDs is constant for a longer time. I will take a look at
: these join thematics.
: Maybe another solution would be to really create a whole new
: collection or set of documents containing the aggregated documents (from the
: ids) from scratch and to execute queries on this collection. Then this
: would take some time, but maybe it's worth it because the querying will thank you.

Another avenue to consider... http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/schema/ExternalFileField.html ...would allow you to map values in your source_id to some numeric values (many to many), and these numeric values would then be accessible in functions -- so you could use something like fq={!frange ...} to select all docs with value 67, where your external file field says that value 67 is mapped to the following thousand source_id values. The external file field values can then be modified at any time just by doing a commit on your index. -Hoss
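A sketch of what that could look like (all names invented). In schema.xml:

<fieldType name="extfile" class="solr.ExternalFileField" keyField="id" defVal="0" valType="pfloat"/>
<field name="source_bucket" type="extfile"/>

Then a file named external_source_bucket in the index data directory, one <uniqueKey>=<value> line per document, re-read on commit:

doc1=67
doc2=67

And on the query side, selecting every document mapped to value 67:

fq={!frange l=67 u=67}source_bucket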
Re: leaks in solr
First, no, because I do the following:

  for (int i = 0; i < subQueries.size(); i++) {
      subQueries.get(i).close();
  }

Second, I don't see any exception until the first searcher leak happens.

On Fri, Jul 27, 2012 at 9:04 PM, Lance Norskog goks...@gmail.com wrote: A finally clause can throw exceptions. Can this throw an exception? subQueries.get(i).close(); If so, each close() call should be in a try-catch block.

On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Hello all, when I run a set of queries in Eclipse this works fine, but when I run it on the test production server, searchers are leaked. Any hint would be appreciated. I have not used CoreContainer. Considering that the plain SearchHandler runs fine, I am not able to think of a reason why my extended version wouldn't work. Does anyone have any idea?

On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: I have tons of these open:

  searcherName : Searcher@24be0446 main
  caching : true
  numDocs : 1331167
  maxDoc : 1338549
  reader : SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de,refCnt=1,segments=18}
  readerDir : org.apache.lucene.store.NIOFSDirectory@/usr/local/solr/highlander/data/..@2f2d9d89
  indexVersion : 1336499508709
  openedAt : Fri Jul 27 09:45:16 EDT 2012
  registeredAt : Fri Jul 27 09:45:19 EDT 2012
  warmupTime : 0

In my custom handler, I have the following implementation (it's not the full code, but it gives an overall idea):

  class CustomHandler extends SearchHandler {
      void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
          SolrCore core = req.getCore();
          Vector<SimpleOrderedMap<Object>> requestParams = new Vector<SimpleOrderedMap<Object>>();
          /* parse the params such that requestParams.get(i) = parameters of the i-th request */
          ...
          Vector<LocalSolrQueryRequest> subQueries = new Vector<LocalSolrQueryRequest>();
          try {
              for (int i = 0; i < subQueryCount; i++) {
                  subQueries.add(new LocalSolrQueryRequest(core, requestParams.get(i)));
                  ResponseBuilder rb = new ResponseBuilder();
                  rb.req = req;
                  handleRequestBody(req, rsp, rb); // calls SearchHandler's handleRequestBody, whose signature I have modified
              }
          } finally {
              for (int i = 0; i < subQueries.size(); i++)
                  subQueries.get(i).close();
          }
      }
  }

Search Handler changes:

  class SearchHandler {
      void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp,
                             ResponseBuilder rb, ArrayList<Component> comps) {
          // ResponseBuilder rb = new ResponseBuilder();  -- now passed in instead
          ...
      }
      void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
          ResponseBuilder rb = new ResponseBuilder();
          handleRequestBody(req, rsp, rb, comps);
      }
  }

I don't see the old index searcher getting closed after the new one warms up. Because I replicate every 5 minutes, it crashes within 2 hours.
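[To make Lance's suggestion concrete, here is a minimal sketch of a close loop with each close() guarded individually, so one failing close cannot skip the remaining sub-requests. It assumes subQueries is the Vector<LocalSolrQueryRequest> from the code above; the class and method names are hypothetical.]

  import java.util.Vector;
  import org.apache.solr.request.LocalSolrQueryRequest;

  class SubQueryCloser {
      // Guard each close() so that an exception from one request does not
      // abort the loop and leave the remaining requests, and any searcher
      // references they hold, unreleased.
      static void closeAll(Vector<LocalSolrQueryRequest> subQueries) {
          for (int i = 0; i < subQueries.size(); i++) {
              try {
                  subQueries.get(i).close();
              } catch (Exception e) {
                  // log and continue so the rest still get closed
              }
          }
      }
  }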
Re: leaks in solr
  SimpleOrderedMap<Object> commonRequestParams;            // holds the common request params
  Vector<SimpleOrderedMap<Object>> subQueryRequestParams;  // holds the request params of the sub-queries

I use the above to create multiple LocalSolrQueryRequests. To add a little more information, I create a new ResponseBuilder for each request. I also hold a reference to the query component as a private member in my CustomHandler; considering that the component is initialized only once during start-up, I assume this isn't a cause for concern.

On Fri, Jul 27, 2012 at 9:49 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: First, no, because I do the following: for (int i = 0; i < subQueries.size(); i++) { subQueries.get(i).close(); } Second, I don't see any exception until the first searcher leak happens.

On Fri, Jul 27, 2012 at 9:04 PM, Lance Norskog goks...@gmail.com wrote: A finally clause can throw exceptions. Can this throw an exception? subQueries.get(i).close(); If so, each close() call should be in a try-catch block.
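[For what it's worth, SimpleOrderedMap extends NamedList, so structures like the ones described above can feed LocalSolrQueryRequest's (SolrCore, NamedList) constructor directly. A minimal sketch; the class, method, and variable names are assumed from the description, not actual code from the thread.]

  import java.util.Vector;
  import org.apache.solr.common.util.SimpleOrderedMap;
  import org.apache.solr.core.SolrCore;
  import org.apache.solr.request.LocalSolrQueryRequest;

  class SubQueryFactory {
      // Build one LocalSolrQueryRequest per sub-query parameter set.
      // Every request built here must later be close()d so that any
      // searcher reference it acquires is released.
      static Vector<LocalSolrQueryRequest> build(SolrCore core,
              Vector<SimpleOrderedMap<Object>> subQueryRequestParams) {
          Vector<LocalSolrQueryRequest> subQueries = new Vector<LocalSolrQueryRequest>();
          for (SimpleOrderedMap<Object> params : subQueryRequestParams) {
              subQueries.add(new LocalSolrQueryRequest(core, params));
          }
          return subQueries;
      }
  }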
Re: leaks in solr
subQueries.get(i).close() is nothing but pulling the reference from the vector and closing it, so yes, it wouldn't throw an exception. For reference, subQueries is a Vector<LocalSolrQueryRequest>. Please let me know if you need any more information.

On Fri, Jul 27, 2012 at 10:14 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: SimpleOrderedMap<Object> commonRequestParams; // holds the common request params. Vector<SimpleOrderedMap<Object>> subQueryRequestParams; // holds the request params of the sub-queries. I use the above to create multiple LocalSolrQueryRequests. To add a little more information, I create a new ResponseBuilder for each request. I also hold a reference to the query component as a private member in my CustomHandler; considering that the component is initialized only once during start-up, I assume this isn't a cause for concern.
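[Not a diagnosis of this particular leak, but for anyone debugging similar symptoms: Solr hands out searchers through reference counting, and a reference that is never decremented keeps the old searcher, and the index files it holds open, alive. A minimal sketch of the usual acquire/release pattern; illustrative only, not code from the thread.]

  import org.apache.solr.core.SolrCore;
  import org.apache.solr.search.SolrIndexSearcher;
  import org.apache.solr.util.RefCounted;

  class SearcherUse {
      static void useSearcher(SolrCore core) {
          // getSearcher() increments the reference count; decref() in a
          // finally block guarantees the count drops even on exceptions.
          // A missing decref() keeps the old searcher registered forever.
          RefCounted<SolrIndexSearcher> ref = core.getSearcher();
          try {
              SolrIndexSearcher searcher = ref.get();
              // ... run queries against searcher ...
          } finally {
              ref.decref();
          }
      }
  }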