RE: FW: NRTCachingDirectory threads stuck
Thank you.

Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile + 972-52-6194481
Skype: recanati
More at: www.kmslh.com | LinkedIn | FB

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Sunday, February 22, 2015 6:16 PM
To: solr-user
Subject: Re: FW: NRTCachingDirectory threads stuck

On Sun, Feb 22, 2015 at 1:54 PM, Moshe Recanati mos...@kmslh.com wrote:

> Hi Mikhail,
> Thank you.
> 1. Regarding Jetty threads: how can I reduce them?

https://wiki.eclipse.org/Jetty/Howto/High_Load#Thread_Pool
Note that you'll get a 503 or something similar when the pool size is exceeded.

> 2. Is it related to the fact that we're running Solr 4.0 in parallel on this machine?

Are their index dirs different? Nevertheless, running something else on the same machine leads to resource contention. What does `top` say?

> Thank you
> Regards,
> Moshe

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Sunday, February 22, 2015 11:18 AM
To: solr-user
Subject: Re: FW: NRTCachingDirectory threads stuck

Hello,

I checked 20020.tdump. From the update perspective it's OK: I see a single thread that has committed and is waiting for a searcher to open. There are a few very bad signs:

- There are many threads executing search requests in parallel, roughly a hundred of them. This is a dead end. Consider limiting the number of Jetty threads, starting from the number of cores available.
- The heap is full, which is a no-go for Java. Either increase it, reduce the load, or make sure there is no leak.
- I see many threads executing Luke handler code. It might be a really wrong setup, or it might be the regular approach for Solr replication. I'm not sure here.

On Sun, Feb 22, 2015 at 9:57 AM, Moshe Recanati mos...@kmslh.com wrote:

Hi,
I saw the message was rejected because of the attachment. I uploaded the data to Drive:
https://drive.google.com/file/d/0B0GR0M-lL5QHVDNjZlUwVTR2QTQ/view?usp=sharing

Moshe

From: Moshe Recanati [mailto:mos...@kmslh.com]
Sent: Sunday, February 22, 2015 8:37 AM
To: solr-user@lucene.apache.org
Subject: RE: NRTCachingDirectory threads stuck

From: Moshe Recanati
Sent: Sunday, February 22, 2015 8:34 AM
To: solr-user@lucene.apache.org
Subject: NRTCachingDirectory threads stuck

Hi,

We're running two Solr servers on the same machine. One is Solr 4.0 and the second is Solr 4.7.1. In Solr 4.7.1 we see very strange behavior: while indexing documents, memory spikes from 1 GB to 4 GB within a couple of minutes, and a huge number of threads get stuck in NRTCachingDirectory.openInput. Thread dump and GC log attached.

Are you familiar with this behavior? What can be the trigger for it?

Thank you,
Regards,
Moshe Recanati

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
highlighting the boolean query
Hello!

In Solr 4.3.1 there seems to be some inconsistency in the highlighting of this boolean query:

a OR (b c) OR d

This returns a proper hit, and the score explanation shows that only d was included in the document score calculation. But the highlighter returns both d and c in em tags.

Is this a known issue of the standard highlighter? Can it be mitigated?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
Re: Question on CloudSolrServer API
By default the max connections is set to 128 and max connections per host is 32. You can configure an HttpClient as per your needs and pass it as a parameter to CloudSolrServer's constructor.

On Mon, Feb 23, 2015 at 3:49 PM, Manohar Sripada manohar...@gmail.com wrote:

> Thanks for the response. How to control the number of connections pooled here in SolrJ Client? Also, what will be the default values for maximum Connections and all.

--
Regards,
Shalin Shekhar Mangar.
Re: Question on CloudSolrServer API
Thanks for the response. How can I control the number of connections pooled in the SolrJ client? Also, what are the default values for maximum connections and so on?

- Thanks

On Thu, Feb 19, 2015 at 6:09 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

> No, you should reuse the same CloudSolrServer instance for all requests. It is a thread-safe object. You could also create a static/common HttpClient instance and pass it to the constructor of CloudSolrServer, but even if you don't, it will create one internally and use it for all requests so that connections can be pooled.
>
> On 19-Feb-2015 1:44 pm, Manohar Sripada manohar...@gmail.com wrote:
>
>> Hi All,
>>
>> I am using the CloudSolrServer API of the SolrJ library from my application to query Solr. Here, I am creating a new connection to Solr for every search that I do. Once I get the results, I close the connection.
>>
>> Is this the correct way? How does Solr create connections internally? Does it maintain a pool of connections (and if so, how do I configure it)?
>>
>> Thanks,
>> Manohar
CollationKeyFilterFactory stops suggestions and collations
Hello all,

I am working on collations. Somewhere in the Solr documentation I found that UnicodeCollation makes searching fast. But after adding CollationKeyFilterFactory in schema.xml, both suggestions and collations stop. Please check the configuration and help me.

schema.xml:

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
      </analyzer>
    </fieldType>

solrconfig.xml:

    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="df">gram_ci</str>
        <!-- Solr will use suggestions from both the 'default' spellchecker
             and from the 'wordbreak' spellchecker and combine them.
             collations (re-written queries) can include a combination of
             corrections from both spellcheckers -->
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">25</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.maxResultsForSuggest">10</str>
        <str name="spellcheck.alternativeTermCount">25</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.maxCollations">100</str>
        <str name="spellcheck.maxCollationTries">1000</str>
        <str name="spellcheck.collateExtendedResults">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
        <!--<str>suggest</str>-->
        <!--<str>query</str>-->
      </arr>
    </requestHandler>
Atomic Update while having fields with attribute stored=true in schema
Hi,

I have around 50 fields in my schema; 20 of them have stored="true" and the rest have stored="false".

For partial updates (atomic updates), it is mentioned in many places that the fields in the schema should have stored="true". I have also tried an atomic update on documents that have fields with stored="false" and indexed="true", and it didn't work: my whole document vanished from Solr, or at least I am unable to search for it now, even though I didn't change the existing values of the fields that have stored="false".

Does this mean I have to change all my fields to stored="true" if I want to use atomic updates? Will it affect the performance of Solr? If yes, what is the best practice to reduce the performance degradation as much as possible?

Thanks in advance.

Thanks and Regards,
Rahul Bhooteshwar
Enterprise Software Engineer
HotWax Systems http://www.hotwaxsystems.com - The global leader in innovative enterprise commerce solutions powered by Apache OFBiz.
ApacheCon US 2014 Silver Sponsor
Re: Atomic Update while having fields with attribute stored=true in schema
Fields with stored=true have the downside of disk space: your index will grow in its space requirements.

Maybe updating the whole document can be an option ...

--
/Yago Riveiro

On Mon, Feb 23, 2015 at 1:02 PM, Rahul Bhooteshwar rahul.bhootesh...@hotwaxsystems.com wrote:

> Thanks for your quick reply. I am using Solr for faceted search using SolrJ. I am using facet queries and filter queries. I am new to Solr so I would like to know what is the best practice to handle such scenarios.
Re: Atomic Update while having fields with attribute stored=true in schema
> Which means I have to change all my fields to stored="true" if I want to use atomic update. Right?

Yes, and re-index all your data.

> Will it affect the performance of the Solr?

What type of queries are you doing now?

--
/Yago Riveiro

On Mon, Feb 23, 2015 at 12:05 PM, Rahul Bhooteshwar rahul.bhootesh...@hotwaxsystems.com wrote:

> I have around 50 fields in my schema and having 20 fields are stored="true" and rest of them stored="false" [...]
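For readers following the thread: an atomic update sends only the changed field, and Solr rebuilds the rest of the document from its stored values, which is why values that were indexed but not stored cannot be carried over. A minimal sketch of such a request body in Java, assuming a unique key field named id and a hypothetical field title (neither name appears in the thread):

```java
final class AtomicUpdateSketch {

    // Escape backslashes and double quotes so a value is safe inside a JSON string.
    // Minimal on purpose: control characters are not handled here.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build a JSON body that sets one field on one document via the "set" modifier.
    // The "id" key is an assumption; use the uniqueKey field of your own schema.
    static String setFieldJson(String id, String field, String newValue) {
        return "[{\"id\":\"" + escape(id) + "\",\""
                + escape(field) + "\":{\"set\":\"" + escape(newValue) + "\"}}]";
    }

    public static void main(String[] args) {
        // Hypothetical document id and field, for illustration only.
        System.out.println(setFieldJson("doc-1", "title", "New title"));
    }
}
```

The resulting body would be posted to the core's /update handler as application/json. Any field of that document that is not stored would be missing after the update, which may be what happened to the document described above.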
Re: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException
I think this means you've got an older version of noggit around. You need version 0.6.

Alan Woodward
www.flax.co.uk

On 23 Feb 2015, at 13:00, Clemens Wyss DEV wrote:

> Just about to upgrade to Solr5. My UnitTests fail:
> [...]
> Caused by: org.noggit.JSONParser$ParseException: Expected string: char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
> [...]
> What is happening?
Re: CollationKeyFilterFactory stops suggestions and collations
Hi all,

I have found that to use UnicodeCollation I need lucene-collation-2.9.1.jar. I am using Solr 4.10.2. I have downloaded lucene-collation-2.9.1.jar; where do I have to put it, or is it already built into Solr? If it is already in Solr, then why are suggestions and collations not coming back?

Any help, please?

On Mon, Feb 23, 2015 at 4:43 PM, Nitin Solanki nitinml...@gmail.com wrote:

> I am working on collations. Somewhere in Solr, I found that UnicodeCollation will do searching fast. But after applying CollationKeyFilterFactory in schema.xml, it stops the suggestions and collations both. [...]
Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException
Just about to upgrade to Solr 5. My unit tests fail:

13:50:41.178 [main] ERROR org.apache.solr.core.CoreContainer - Error creating core [1-de_CH]: null
java.lang.ExceptionInInitializerError: null
    at org.apache.solr.core.SolrConfig.getConfigOverlay(SolrConfig.java:359) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.SolrConfig.getOverlay(SolrConfig.java:808) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.SolrConfig.getSubstituteProperties(SolrConfig.java:798) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.Config.<init>(Config.java:152) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.Config.<init>(Config.java:92) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:180) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:158) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:80) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:61) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:511) [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:488) [solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    at ch.mysign.search.solr.EmbeddedSolrMode.prepareCore(EmbeddedSolrMode.java:51) [target/:na]
    ...
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192) [.cp/:na]
Caused by: org.noggit.JSONParser$ParseException: Expected string: char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
    at org.noggit.JSONParser.err(JSONParser.java:223) ~[noggit.jar:na]
    at org.noggit.JSONParser.nextEvent(JSONParser.java:671) ~[noggit.jar:na]
    at org.noggit.ObjectBuilder.getObject(ObjectBuilder.java:123) ~[noggit.jar:na]
    at org.apache.solr.core.ConfigOverlay.<clinit>(ConfigOverlay.java:213) ~[solr-core.jar:5.0.0 1659987 - anshumgupta - 2015-02-15 12:26:10]
    ... 56 common frames omitted

It looks like the exception occurs in the ConfigOverlay static block, line 213:

    editable_prop_map = (Map)new ObjectBuilder(new JSONParser(new StringReader(MAPPING))).getObject();

What is happening?
Re: Atomic Update while having fields with attribute stored=true in schema
Hi Yago Riveiro,

Thanks for your quick reply. I am using Solr for faceted search via SolrJ. I am using facet queries and filter queries. I am new to Solr, so I would like to know the best practice for handling such scenarios.

Thanks and Regards,
Rahul Bhooteshwar

On Mon, Feb 23, 2015 at 5:42 PM, Yago Riveiro yago.rive...@gmail.com wrote:

> Yes, and re-index all your data. [...] What type of queries are you doing now?
Re: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException
This code is executed every time Solr is initialized and it is unlikely that it is a bug. Are you using an older version of noggit.jar by any chance?

On Mon, Feb 23, 2015 at 6:30 PM, Clemens Wyss DEV clemens...@mysign.ch wrote:

> Just about to upgrade to Solr5. My UnitTests fail:
> [...]
> Caused by: org.noggit.JSONParser$ParseException: Expected string: char=u,position=2 BEFORE='{ u' AFTER='pdateHandler : { autoCo'
> [...]
> What is happening?

--
Noble Paul
Stop solr query
Hi,

Recently there were some scenarios in which queries that users sent to Solr got stuck and increased our Solr heap. Is there any way to kill or time out, via an external command, a query that Solr has not yet returned?

Thank you,
Regards,
Moshe Recanati
incorrect Java version reported in solr dashboard
I have upgraded Java from 1.7 to 1.8 on a Linux server. After the upgrade, if I run java -version I can see that it really changed to the new version. But when I run Solr, it still reports the old version in the JVM section of the dashboard.

What could be the reason?

--
View this message in context: http://lucene.472066.n3.nabble.com/incorrect-Java-version-reported-in-solr-dashboard-tp4188236.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: incorrect Java version reported in solr dashboard
You're probably launching Solr using the older version of Java somehow. You should make sure your PATH and JAVA_HOME variables point at your Java 8 install from the point of view of the script or configuration that launches Solr.

Hope that helps.

Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062
appinions inc.
“The Science of Influence Marketing”
18 East 41st Street
New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Mon, Feb 23, 2015 at 9:19 AM, SolrUser1543 osta...@gmail.com wrote:

> I have upgraded Java version from 1.7 to 1.8 on Linux server. After the upgrade, if I run Java -version I can see that it really changed to the new one. But when I run Solr, it is still reporting the old version in dashboard JVM section. What could be the reason?
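One way to check the suggestion above is to ask a JVM directly which installation it is running from. The small Java program below is only a sketch; it is useful when it is started with the same `java` binary and environment that the Solr launch script uses, which is an assumption about your setup:

```java
final class JvmInfoSketch {

    // Report the version and install directory of the JVM that is actually running.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the java.home printed here differs from the one you expect, the launcher is resolving `java` through a different PATH or JAVA_HOME than your interactive shell.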
Re: Used CollationKeyFilterFactory, Seems not to be working
Hi Nitin,

How can you pass an empty value to the language attribute? Is this intentional?

What is your intention in using that filter with the suggestion functionality?

Ahmet

On Monday, February 23, 2015 5:03 PM, Nitin Solanki nitinml...@gmail.com wrote:

> I have integrate CollationKeyFilterFactory in schema.xml and re-index the data again.
> <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
> [...]
Re: Stop solr query
On 2/23/2015 7:23 AM, Moshe Recanati wrote:

> Recently there were some scenarios in which queries that user sent to solr got stuck and increased our solr heap.
>
> Is there any option to kill or timeout query that wasn't returned from solr by external command?

The best thing you can do is examine all user input and stop such queries before they execute, especially if they are the kind of query that will cause your heap to grow out of control.

The timeAllowed parameter can abort a query that takes too long in certain phases of the query. In recent months, Solr has been modified so that timeAllowed will take effect during more query phases. It is not a perfect solution, but it can be better than nothing.

http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed

Be aware that sometimes legitimate queries will be slow, and using timeAllowed may cause those queries to fail.

Thanks,
Shawn
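As an illustration of the timeAllowed suggestion, the parameter is passed with the query and is expressed in milliseconds. The Java sketch below only builds the request URL; the host, port and core name (localhost:8983, collection1) are assumptions for the example, not values from this thread:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class TimeAllowedQuerySketch {

    // Build a /select URL that asks Solr to stop working on the query
    // after timeAllowedMillis milliseconds.
    static String selectUrl(String coreUrl, String query, long timeAllowedMillis) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&timeAllowed=" + timeAllowedMillis;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical core URL; a five second limit.
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", "*:*", 5000));
    }
}
```

The limit could also be set as a default in the request handler configuration, so that clients cannot forget it; whether that suits the slow but legitimate queries mentioned above is a judgment call.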
[ANNOUNCE] Luke 4.10.3 released
Hello,

Luke 4.10.3 has been released. Download it here:

https://github.com/DmitryKey/luke/releases/tag/luke-4.10.3

The release has been tested against the solr-4.10.3 based index.

Issues fixed in this release:

#13 https://github.com/DmitryKey/luke/pull/13
Apache License 2.0 abbreviation changed from ASL 2.0 to ALv2

Thanks to respective contributors!

P.S. waiting for lucene 5.0 artifacts to hit public maven repositories for the next major release of luke.

--
Dmitry Kan
Used CollationKeyFilterFactory, Seems not to be working
Hi,

I have integrated CollationKeyFilterFactory in schema.xml and re-indexed the data:

    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>

I need to use this because I want to build collations fast. Referred link: http://wiki.apache.org/solr/UnicodeCollation

But it stops both suggestions and collations. Why?

I have also tested CollationKeyFilterFactory on the analysis page of the Solr admin UI. There, CKF shows output that looks like Chinese characters.

Any help, please?
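A possible explanation, not confirmed anywhere in this thread: a collation key filter replaces each token with an encoded collation key meant for sorting and range queries, so the indexed terms are no longer readable words. That would match the odd characters on the analysis page, and a spellchecker built from such a field would have no usable words to suggest. The Java sketch below uses the JDK's own Collator to show what a primary-strength key looks like; treating language="" as the root locale is an assumption:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class CollationKeySketch {

    // A collator roughly comparable to language="" strength="primary"
    // (root locale assumed): case and accent differences are ignored.
    static Collator primaryCollator() {
        Collator c = Collator.getInstance(Locale.ROOT);
        c.setStrength(Collator.PRIMARY);
        return c;
    }

    // The binary sort key for a token; this is not human-readable text.
    static byte[] key(String token) {
        return primaryCollator().getCollationKey(token).toByteArray();
    }

    public static void main(String[] args) {
        // Raw key bytes for a plain word.
        System.out.println(Arrays.toString(key("message")));
        // Whether two case variants collapse to the same key.
        System.out.println(Arrays.equals(key("Message"), key("message")));
    }
}
```

If this is indeed the cause, keeping the collated field for sorting and pointing the spellchecker at a separate, plainly analyzed field would be one design to try.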
Re: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException
Bingo! Thanks for the hint.

-----Original Message-----
From: Alan Woodward [mailto:a...@flax.co.uk]
Sent: Monday, 23 February 2015 15:00
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.x to Solr 5 = org.noggit.JSONParser$ParseException

I think this means you've got an older version of noggit around. You need version 0.6.

Alan Woodward
www.flax.co.uk
Re: Strange search behaviour when upgrading to 4.10.3
Thanks Shawn.

I just ran the analysis on 4.6 and 4.10. The only difference between the outputs seems to be that the positionLength value is set in 4.10. Does that mean anything?

Version 4.10
SF  text     raw_bytes               start  end  positionLength  type   position
    message  [6d 65 73 73 61 67 65]  0      7    1               ALNUM  1

Version 4.6
SF  text     raw_bytes               type   start  end  position
    message  [6d 65 73 73 61 67 65]  ALNUM  0      7    1

Thanks,
Rishi.

-----Original Message-----
From: Shawn Heisey apa...@elyograg.org
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Feb 20, 2015 6:51 pm
Subject: Re: Strange search behaviour when upgrading to 4.10.3

On 2/20/2015 4:24 PM, Rishi Easwaran wrote:

Also, the tokenizer we use is very similar to the following.
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalTokenizer.java
ftp://zimbra.imladris.sk/src/HELIX-720.fbsd/ZimbraServer/src/java/com/zimbra/cs/index/analysis/UniversalLexer.jflex
From the looks of it the text is being indexed as a single token and not broken across whitespace.

I can't claim to know how analyzer code works. I did manage to see the code, but it doesn't mean much to me.

I would suggest using the analysis tab in the Solr admin interface. On that page, select the field or fieldType, set the verbose flag and type the actual field contents into the index side of the page. When you click the Analyze Values button, it will show you what Solr does with the input at index time.

Do you still have access to any machines (dev or otherwise) running the old version with the custom component? If so, do the same things on the analysis page for that version that you did on the new version, and see whether it does something different. If it does do something different, then you will need to track down the problem in the code for your custom analyzer.

Thanks,
Shawn
Is Solr best for did you mean functionality just like Google?
Hello,

I am in a difficult situation. I want to implement spell/query correction. I have 49 GB of indexed data on which I have applied the spellchecker. I want to do the same as Google's *did you mean*.

*Example* - If a user types a question/query that is misspelled or mistyped, I need to give them a suggestion like "Did you mean".

Is Solr the best choice for this?

Warm Regards,
Nitin Solanki
Re: Collations are not working fine.
Hi Charles,

How did you patch the suggester to get frequency information in the spellcheck response? It's very good. I want to do that too.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote:

I have been working with collations the last couple days and I kept adding the collation-related parameters until it started working for me. It seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.

But, I am using the Suggester with the WFSTLookupFactory. Also, I needed to patch the suggester to get frequency information in the spellcheck response.

-----Original Message-----
From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
Sent: Friday, February 13, 2015 3:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Nitin,

Can you try the config below? It seems to be working for us.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
</searchComponent>

<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
<str name="spellcheck.alternativeTermCount">15</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxCollations">100</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateParam.q.op">AND</str>
<str name="spellcheck.maxCollationTries">1000</str>

*Rajesh.*

On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com wrote:

Nitin,

Can you post the full spellcheck response when you query:

q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

James Dyer
Ingram Content Group

-----Original Message-----
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Friday, February 13, 2015 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi James Dyer,

I did as you told me and used WordBreakSolrSpellChecker instead of shingles, but collations are still not coming. For instance, I tried to get the collation "gone with the wind" by searching "gone wthh thes wint" on field=gram_ci, but didn't succeed. I do get the suggestions wthh as *with*, thes as *the*, wint as *wind*. I also have documents in which "gone with the wind" occurs 167 times. I don't know whether I am missing something. Please check my Solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
Re: syntax for increasing java memory
That depends on the JVM you are using. For the Oracle JVMs, use this to get a list of extended options:

java -X

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com wrote:

Hi Guys,

I am a newbie on Solr and I am just using it for Dovecot's sake. Could you help advise the correct syntax to increase the Java heap size using the -Xmx option (or advise some easy-to-read literature for configuring)? Much appreciated if you could help. I just need this to sort out the problem with my Dovecot FTS.

Thanks
Kevin
Re: syntax for increasing java memory
Hi Walter,

Got it.

java -Xmx1024m -jar start.jar

Thanks
Kevin

On Tue, Feb 24, 2015 at 1:00 AM, Kevin Laurie superinterstel...@gmail.com wrote:

Hi Walter,

I am running:
Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_65 24.65-b04)

I tried running with this command:

java -jar start.jar -Xmx1024m
WARNING: System properties and/or JVM args set. Consider using --dry-run or --exec
0 [main] INFO org.eclipse.jetty.server.Server ? jetty-8.1.10.v20130312
61 [main] INFO org.eclipse.jetty.deploy.providers.ScanningAppProvider ? Deployment monitor /opt/solr/contexts at interval 0

Still getting 500m. Any advice? Will check java -X out.

On Tue, Feb 24, 2015 at 12:49 AM, Walter Underwood wun...@wunderwood.org wrote:

That depends on the JVM you are using. For the Oracle JVMs, use this to get a list of extended options:

java -X

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com wrote:

Hi Guys,

I am a newbie on Solr and I am just using it for Dovecot's sake. Could you help advise the correct syntax to increase the Java heap size using the -Xmx option (or advise some easy-to-read literature for configuring)? Much appreciated if you could help. I just need this to sort out the problem with my Dovecot FTS.

Thanks
Kevin
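For anyone landing on this thread later: the two commands differ only in where -Xmx1024m sits. The java launcher treats whatever comes before -jar as options for the JVM, and whatever follows the jar file as arguments for the program inside the jar, which is why the first attempt did not change the heap. Here is a small Python sketch of that split. The helper name split_java_command is made up for illustration, and it only handles the simple "java [options] -jar file.jar [args]" form.

```python
def split_java_command(argv):
    """Split a `java [options] -jar file.jar [args]` command line.

    Returns (jvm_options, jar_file, program_args). Hypothetical helper,
    written only to illustrate how the launcher reads the command.
    """
    if "-jar" not in argv:
        raise ValueError("only the `java [options] -jar file.jar [args]` form is handled")
    jar_index = argv.index("-jar")
    if jar_index + 1 >= len(argv):
        raise ValueError("-jar must be followed by a jar file")
    jvm_options = argv[1:jar_index]        # read by the JVM itself (heap size, etc.)
    jar_file = argv[jar_index + 1]
    program_args = argv[jar_index + 2:]    # handed to the application in the jar
    return jvm_options, jar_file, program_args


if __name__ == "__main__":
    for cmd in ("java -jar start.jar -Xmx1024m", "java -Xmx1024m -jar start.jar"):
        options, jar, args = split_java_command(cmd.split())
        print(f"{cmd!r}: JVM options={options}, program args={args}")
```

In the first command -Xmx1024m ends up among the program arguments, which matches the Jetty warning about JVM args being set and suggesting --exec. In the second it is a real JVM option.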
Re: Used CollationKeyFilterFactory, Seems not to be working
Hi Nitin,

I think that token filter factory has nothing to do with collations in the spellchecker domain. The same term is used in two different domains, which causes the confusion. solr.CollationKeyFilterFactory is mainly meant for locale-sensitive sorting.

For example, I used the type below to fix a sorting problem with Turkish strings.

<fieldType name="collatedTURKISH" class="solr.CollationField" language="tr"/>

Ahmet

On Monday, February 23, 2015 6:18 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hi Ahmet,

language="" means that it is used for any language - simply define the language as the empty string for most languages.

*Intention:* I am working on spell/question correction. Just like Google, I want to offer "did you mean". Using the spellchecker, I get both suggestions and collations, but the collations are not what I expected. The reason is spellcheck.maxCollationTries: if I set spellcheck.maxCollationTries=10, it gives around 10 results. Sometimes the expected collation is not among those 10 collations. So I increased the value to 16000 and the results come, but it takes around 15 sec on 49 GB of indexed data. That is the worst case. Then, somewhere in Solr, I found *unicodeCollation*, which says it builds collations fast. Is it fast? Or am I doing something wrong with collations?

On Mon, Feb 23, 2015 at 9:12 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Nitin,

How can you pass an empty value to the language attribute? Is this intentional? What is your intention in using that filter with the suggestion functionality?

Ahmet

On Monday, February 23, 2015 5:03 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hi,

I have integrated CollationKeyFilterFactory in schema.xml and re-indexed the data.

*<filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>*

I need to use this because I want to build collations fast.
Referred link: http://wiki.apache.org/solr/UnicodeCollation
But it stops both suggestions and collations. *Why?*

I have also tested *CollationKeyFilterFactory* in the Solr admin analysis page. There, CKF shows some Chinese-looking output.

*Any help, please?*
Re: Collations are not working fine.
Hi,

We have used the spellcheck component with the configs below to get the best collation (exact collation) when a query has either a single term or multiple terms.

As Charles mentioned above, we do have a check on getOriginalFrequency() for each term in our service before we send the spellcheck response to the client. This may not be the case for you. Hope this helps.

<request-handler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">100</int>
    <str name="df">textSpell</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <int name="spellcheck.count">5</int>
    *<str name="spellcheck.alternativeTermCount">15</str>*
    *<str name="spellcheck.collate">true</str>*
    *<str name="spellcheck.onlyMorePopular">false</str>*
    *<str name="spellcheck.extendedResults">true</str>*
    *<str name="spellcheck.maxCollations">100</str>*
    *<str name="spellcheck.collateParam.mm">100%</str>*
    *<str name="spellcheck.collateParam.q.op">AND</str>*
    *<str name="spellcheck.maxCollationTries">1000</str>*
    <str name="q.op">OR</str>
    . . ..
  </lst>
</request-handler>
. . .
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- <str name="classname">solr.DirectSolrSpellChecker</str> -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str> -->
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
</searchComponent>

*Rajesh.*

On Fri, Feb 20, 2015 at 8:42 AM, Nitin Solanki nitinml...@gmail.com wrote:

How can I get only the best collations, the ones with the most hits, and sort them?

On Wed, Feb 18, 2015 at 3:53 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote:

Hi Nitin,

I was trying many different options for a couple different queries. In fact, I have collations working ok now with the Suggester and WFSTLookup. The problem may have been due to a different dictionary and/or lookup implementation and the specific options I was sending.

In general, we're using spellcheck for search suggestions. The Suggester component (vs. the Suggester spellcheck implementation) doesn't handle all of our cases. But we can get things working using the spellcheck interface. What gives us particular trouble are the cases where a term may be valid by itself, but also be the start of longer words.

The specific terms are acronyms specific to our business. But I'll attempt to show generic examples.

E.g. a partial term like "fo" can expand to "fox", "fog", etc. and a full term like "brown" can also expand to something like "brownstone". And, yes, the collation "brownstone fox" is nonsense. But assume, for the sake of argument, it appears in our documents somewhere.

For a multiple term query with a spelling error (or partially typed term): "brown fo"

We get collations in order of hits, descending, like ... "brown fox", "brown fog", "brownstone fox".

So far, so good.

For a single term query, "brown", we get a single suggestion, "brownstone", and no collations. So, we don't know to keep the term "brown"!

At this point, we need spellcheck.extendedResults=true and look at the origFreq value in the suggested corrections. Unfortunately, the Suggester (spellcheck dictionary) does not populate the original frequency information. And, without this information, the SpellCheckComponent cannot format the extended results.

However, with a simple change to Suggester.java, it was easy to get the needed frequency information and use it to make a sound decision to keep or drop the input term. But I'd be much obliged if there is a better way to go about it.

Configs below.

Thanks,
Charlie

<!-- SpellCheck component -->
<searchComponent class="solr.SpellCheckComponent" name="suggestSC">
  <lst name="spellchecker">
    <str name="name">suggestDictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="field">text_all</str>
    <float name="threshold">0.0001</float>
    <str name="exactMatchFirst">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- Request Handler -->
<requestHandler name="/tcSuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="title">Search
syntax for increasing java memory
Hi Guys,

I am a newbie on Solr and I am just using it for Dovecot's sake. Could you help advise the correct syntax to increase the Java heap size using the -Xmx option (or advise some easy-to-read literature for configuring)? Much appreciated if you could help. I just need this to sort out the problem with my Dovecot FTS.

Thanks
Kevin
Re: highlighting the boolean query
Erick,

nope, we are using the standard Lucene qparser with some customizations that do not affect the boolean query parsing logic.

Should we try some other highlighter?

On Mon, Feb 23, 2015 at 6:57 PM, Erick Erickson erickerick...@gmail.com wrote:

Are you using edismax?

On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan solrexp...@gmail.com wrote:

Hello!

In Solr 4.3.1 there seems to be some inconsistency with the highlighting of the boolean query:

a OR (b c) OR d

This returns a proper hit, which shows that only d was included in the document score calculation. But the highlighter returns both d and c in em tags.

Is this a known issue of the standard highlighter? Can it be mitigated?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
Re: highlighting the boolean query
Are you using edismax?

On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan solrexp...@gmail.com wrote:

Hello!

In Solr 4.3.1 there seems to be some inconsistency with the highlighting of the boolean query:

a OR (b c) OR d

This returns a proper hit, which shows that only d was included in the document score calculation. But the highlighter returns both d and c in em tags.

Is this a known issue of the standard highlighter? Can it be mitigated?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
Re: Used CollationKeyFilterFactory, Seems not to be working
Hi Ahmet,

language="" means that it is used for any language - simply define the language as the empty string for most languages.

*Intention:* I am working on spell/question correction. Just like Google, I want to offer "did you mean". Using the spellchecker, I get both suggestions and collations, but the collations are not what I expected. The reason is spellcheck.maxCollationTries: if I set spellcheck.maxCollationTries=10, it gives around 10 results. Sometimes the expected collation is not among those 10 collations. So I increased the value to 16000 and the results come, but it takes around 15 sec on 49 GB of indexed data. That is the worst case. Then, somewhere in Solr, I found *unicodeCollation*, which says it builds collations fast. Is it fast? Or am I doing something wrong with collations?

On Mon, Feb 23, 2015 at 9:12 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Nitin,

How can you pass an empty value to the language attribute? Is this intentional? What is your intention in using that filter with the suggestion functionality?

Ahmet

On Monday, February 23, 2015 5:03 PM, Nitin Solanki nitinml...@gmail.com wrote:

Hi,

I have integrated CollationKeyFilterFactory in schema.xml and re-indexed the data.

*<filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>*

I need to use this because I want to build collations fast.
Referred link: http://wiki.apache.org/solr/UnicodeCollation
But it stops both suggestions and collations. *Why?*

I have also tested *CollationKeyFilterFactory* in the Solr admin analysis page. There, CKF shows some Chinese-looking output.

*Any help, please?*
Optimize maxSegments=2 not working right with Solr 4.10.2
Hello,

We normally run an optimize with maxSegments=2 after our daily indexing. This has worked without problems on Solr 3.6. We recently moved to Solr 4.10.2, and on several shards the optimize completed with no errors in the logs but left more than 2 segments.

We send this XML to Solr:

<optimize maxSegments="2"/>

I've attached a copy of the indexwriter log for one of the shards where there were 4 segments at the end of the optimize rather than the requested number (i.e. there should have been only 2 segments). It looks like a merge was done down to two segments and then somehow another process flushed some postings to disk, creating two more segments. Then there are messages about 2 of the remaining 4 segments being too big. (See below.)

What we expected is that the remaining 2 small segments (about 40MB each) would get merged with the smaller of the two large segments, i.e. with the 56GB segment, since we gave the argument maxSegments=2. This didn't happen.

Any suggestions about how to troubleshoot this issue would be appreciated.

Tom

---
Excerpt from indexwriter log:

[TMP][http-8091-Processor5]: findForcedMerges maxSegmentCount=2
...
...
[IW][Lucene Merge Thread #0]: merge time 3842310 msec for 65236 docs
...
[TMP][http-8091-Processor5]: findMerges: 4 segments
[TMP][http-8091-Processor5]: seg=_1fzb(4.10.2):C1081559/24089:delGen=9 size=672402.066 MB [skip: too large]
[TMP][http-8091-Processor5]: seg=_1gj2(4.10.2):C65236/2:delGen=1 size=56179.245 MB [skip: too large]
[TMP][http-8091-Processor5]: seg=_1gj0(4.10.2):C16 size=44.280 MB
[TMP][http-8091-Processor5]: seg=_1gj1(4.10.2):C8 size=40.442 MB
[TMP][http-8091-Processor5]: allowedSegmentCount=3 vs count=4 (eligible count=2) tooBigCount=2

build-1.iw.2015-02-23.txt.gz
Description: GNU Zip compressed data
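When comparing several shards, it can help to pull the segment list out of the findMerges lines mechanically instead of reading them by eye. What follows is only a rough Python sketch, assuming the log lines look exactly like the excerpt in this message. The regular expression and the name parse_segments are ad hoc and not part of Lucene or Solr.

```python
import re

# Pattern for the TieredMergePolicy "seg=..." lines as they appear in the excerpt.
SEGMENT_LINE = re.compile(
    r"seg=(?P<name>_\w+)\([^)]*\):C(?P<docs>\d+)\S*\s+"
    r"size=(?P<size_mb>[\d.]+) MB(?P<skip>\s+\[skip: too large\])?"
)

LOG_EXCERPT = """\
[TMP][http-8091-Processor5]: findMerges: 4 segments
[TMP][http-8091-Processor5]: seg=_1fzb(4.10.2):C1081559/24089:delGen=9 size=672402.066 MB [skip: too large]
[TMP][http-8091-Processor5]: seg=_1gj2(4.10.2):C65236/2:delGen=1 size=56179.245 MB [skip: too large]
[TMP][http-8091-Processor5]: seg=_1gj0(4.10.2):C16 size=44.280 MB
[TMP][http-8091-Processor5]: seg=_1gj1(4.10.2):C8 size=40.442 MB
[TMP][http-8091-Processor5]: allowedSegmentCount=3 vs count=4 (eligible count=2) tooBigCount=2
"""


def parse_segments(log_text):
    """Return one dict per segment line: name, doc count, size in MB, skip flag."""
    segments = []
    for match in SEGMENT_LINE.finditer(log_text):
        segments.append({
            "name": match.group("name"),
            "docs": int(match.group("docs")),
            "size_mb": float(match.group("size_mb")),
            "too_large": match.group("skip") is not None,
        })
    return segments


if __name__ == "__main__":
    for seg in parse_segments(LOG_EXCERPT):
        status = "skipped (too large)" if seg["too_large"] else "eligible"
        print(f'{seg["name"]}: {seg["docs"]} docs, {seg["size_mb"]} MB, {status}')
```

On the excerpt this lists the two large segments as skipped and the two small ones as eligible, which is the same picture as the eligible count=2 and tooBigCount=2 figures on the last log line.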
Re: syntax for increasing java memory
Hi Walter,

I am running:
Oracle Corporation OpenJDK 64-Bit Server VM (1.7.0_65 24.65-b04)

I tried running with this command:

java -jar start.jar -Xmx1024m
WARNING: System properties and/or JVM args set. Consider using --dry-run or --exec
0 [main] INFO org.eclipse.jetty.server.Server ? jetty-8.1.10.v20130312
61 [main] INFO org.eclipse.jetty.deploy.providers.ScanningAppProvider ? Deployment monitor /opt/solr/contexts at interval 0

Still getting 500m. Any advice? Will check java -X out.

On Tue, Feb 24, 2015 at 12:49 AM, Walter Underwood wun...@wunderwood.org wrote:

That depends on the JVM you are using. For the Oracle JVMs, use this to get a list of extended options:

java -X

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com wrote:

Hi Guys,

I am a newbie on Solr and I am just using it for Dovecot's sake. Could you help advise the correct syntax to increase the Java heap size using the -Xmx option (or advise some easy-to-read literature for configuring)? Much appreciated if you could help. I just need this to sort out the problem with my Dovecot FTS.

Thanks
Kevin
RE: Collations are not working fine.
I filed issue SOLR-7144 with the patch attached. It's probably best to get some feedback from developers. It may not be the right approach, etc.

Also, spellcheck.maxCollationTries > 0 is the parameter needed to get collation results that respect the current filter queries, etc. Set spellcheck.maxCollations > 1 to get multiple collation results. However, if the original query has only a single term, there will be no collation results.

Thus, for single term queries, you need to look at the original frequency information to determine if the original term is valid or not. There may be spellcheck suggestions even for terms with origFreq > 0.

-----Original Message-----
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Monday, February 23, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Charles,

How did you patch the suggester to get frequency information in the spellcheck response? It's very good. I want to do that too.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote:

I have been working with collations the last couple days and I kept adding the collation-related parameters until it started working for me. It seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.

But, I am using the Suggester with the WFSTLookupFactory. Also, I needed to patch the suggester to get frequency information in the spellcheck response.

-----Original Message-----
From: Rajesh Hazari [mailto:rajeshhaz...@gmail.com]
Sent: Friday, February 13, 2015 3:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Nitin,

Can you try the config below? It seems to be working for us.
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">textSpell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">false</str>
    <int name="maxChanges">5</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">textSpell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.75</str>
    <float name="thresholdTokenFrequency">0.01</float>
    <str name="buildOnCommit">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
  </lst>
</searchComponent>

<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<int name="spellcheck.count">5</int>
<str name="spellcheck.alternativeTermCount">15</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxCollations">100</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateParam.q.op">AND</str>
<str name="spellcheck.maxCollationTries">1000</str>

*Rajesh.*

On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James james.d...@ingramcontent.com wrote:

Nitin,

Can you post the full spellcheck response when you query:

q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

James Dyer
Ingram Content Group

-----Original Message-----
From: Nitin Solanki [mailto:nitinml...@gmail.com]
Sent: Friday, February 13, 2015 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi James Dyer,

I did as you told me and used WordBreakSolrSpellChecker instead of shingles, but collations are still not coming. For instance, I tried to get the collation "gone with the wind" by searching "gone wthh thes wint" on field=gram_ci, but didn't succeed. I do get the suggestions wthh as *with*, thes as *the*, wint as *wind*. I also have documents in which "gone with the wind" occurs 167 times. I don't know whether I am missing something. Please check my Solr configuration below:

*URL:* localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str
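A request like the one discussed in this thread is easier to get right when the parameter separators and escaping are generated rather than typed by hand. The following is a minimal Python sketch, assuming the host, core and /spell handler named in the thread. The function name build_spell_url and the values 5 and 50 are illustrative only; the spellcheck.* parameters are the ones mentioned in the messages above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_spell_url(base_url, query, max_collations=5, max_collation_tries=50):
    """Assemble a spellcheck request URL with collation enabled.

    The defaults for max_collations and max_collation_tries are placeholders,
    not tuned recommendations.
    """
    params = [
        ("q", query),
        ("wt", "json"),
        ("indent", "true"),
        ("shards.qt", "/spell"),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        ("spellcheck.extendedResults", "true"),
        # more than 1, so several collations can be returned
        ("spellcheck.maxCollations", str(max_collations)),
        # more than 0, so candidate collations are tested against the index
        ("spellcheck.maxCollationTries", str(max_collation_tries)),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_spell_url(
        "http://localhost:8983/solr/wikingram/spell",
        "gram_ci:gone wthh thes wint",
    )
    print(url)
    # Round-trip check: the query string decodes back to the original text.
    print(parse_qs(urlsplit(url).query)["q"])
```

urlencode takes care of the spaces and the colon in the query, so the same helper can be reused for other test phrases without editing the URL by hand.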
Re: Suggestion on distinct/ group by for a field ?
Maybe pivot facets will do what you need? See:
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Pivot(DecisionTree)Faceting

Best,
Erick

On Mon, Feb 23, 2015 at 11:31 AM, Vishal Swaroop vishal@gmail.com wrote:

Please suggest how to get the distinct count for a field (name).

Summary: I have data indexed in the following format

category  name  value
Cat1      A     1
Cat1      A     2
Cat1      B     3
Cat1      B     4

I tried getting the distinct name count, but it returns 4 records instead of 2 (i.e. A, B):
http://localhost:8081/solr/core_test/select?q=category:Cat1&fl=category,name&wt=json&indent=true&facet.mincount=1&facet=true

In Oracle I can easily perform the distinct count using group by:

select c.cat, count(distinct i.name)
from category c, itemname i, value v
where v.item_id = i.id and i.cat_id = c.id and c.cat = 'Cat1'
group by c.cat

Result:
Cat1 2

Thanks
Basic Multilingual search capability
Hi All,

For our use case we don't really need to do a lot of manipulation of incoming text at index time. At most, removal of common stop words and tokenizing emails/filenames etc. if possible. We get text documents from our end users, which can be in any language (sometimes a combination), and we cannot determine the language of the incoming text. Language detection at index time is not necessary.

Which analyzer is recommended to achieve basic multilingual search capability for a use case like this? I have read a bunch of posts about using a combination of StandardTokenizer or ICUTokenizer, LowerCaseFilter and ReversedWildcardFilterFactory, but I am looking for ideas, suggestions and best practices.

http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236
http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923
https://issues.apache.org/jira/browse/SOLR-6492

Thanks,
Rishi.
Re: highlighting the boolean query
Highlighting is such a pain...

What does the parsed query look like? If the default operator is OR, then this seems correct, as both 'd' and 'c' appear in the doc. So I'm a bit puzzled by your statement that c didn't contribute to the score.

If the parsed query is, indeed,
a +b +c d
then it does look like something with the highlighter. Whether other highlighters are better for this case... no clue ;(

Best,
Erick

On Mon, Feb 23, 2015 at 9:36 AM, Dmitry Kan solrexp...@gmail.com wrote:

Erick,

nope, we are using the standard Lucene qparser with some customizations that do not affect the boolean query parsing logic.

Should we try some other highlighter?

On Mon, Feb 23, 2015 at 6:57 PM, Erick Erickson erickerick...@gmail.com wrote:

Are you using edismax?

On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan solrexp...@gmail.com wrote:

Hello!

In Solr 4.3.1 there seems to be some inconsistency with the highlighting of the boolean query:

a OR (b c) OR d

This returns a proper hit, which shows that only d was included in the document score calculation. But the highlighter returns both d and c in em tags.

Is this a known issue of the standard highlighter? Can it be mitigated?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
Suggestion on distinct/ group by for a field ?
Please suggest how to get the distinct count for a field (name).

Summary: I have data indexed in the following format

category  name  value
Cat1      A     1
Cat1      A     2
Cat1      B     3
Cat1      B     4

I tried getting the distinct name count, but it returns 4 records instead of 2 (i.e. A, B):
http://localhost:8081/solr/core_test/select?q=category:Cat1&fl=category,name&wt=json&indent=true&facet.mincount=1&facet=true

In Oracle I can easily perform the distinct count using group by:

select c.cat, count(distinct i.name)
from category c, itemname i, value v
where v.item_id = i.id and i.cat_id = c.id and c.cat = 'Cat1'
group by c.cat

Result:
Cat1 2

Thanks
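One plain-facet way to approximate the SQL count(distinct ...) is to facet on the name field and count how many distinct values come back. The request in the question turns faceting on but names no facet.field, so only the 4 matching documents are returned. Below is a minimal Python sketch, assuming the core and field names from the question. The helper names and the sample response are hand-written for illustration and are not output captured from a real server.

```python
from urllib.parse import urlencode


def distinct_count_url(base_url, category, field="name"):
    """Build a facet request that returns one facet entry per distinct field value."""
    params = [
        ("q", f"category:{category}"),
        ("rows", "0"),              # only the facet counts are needed, not the documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "1"),    # drop values that do not occur in the result set
        ("facet.limit", "-1"),      # no cap on the number of facet values returned
    ]
    return base_url + "?" + urlencode(params)


def count_distinct(response, field="name"):
    """Count distinct values in Solr's default flat JSON facet list [v1, n1, v2, n2, ...]."""
    flat = response["facet_counts"]["facet_fields"][field]
    return len(flat) // 2


if __name__ == "__main__":
    print(distinct_count_url("http://localhost:8081/solr/core_test/select", "Cat1"))
    # Hand-written stand-in for the response to the four sample rows above.
    sample = {"facet_counts": {"facet_fields": {"name": ["A", 2, "B", 2]}}}
    print(count_distinct(sample))
```

For fields with very many distinct values, returning every facet value just to count them gets expensive, so treat this as a sketch for small cardinalities.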
SolrCloud 4.10.3 Security
Hello, Does anyone know why Basic authentication has not yet been released for SolrCloud, as described on the wiki page https://wiki.apache.org/solr/SolrSecurity ? Is there any plan in the near future for closing this issue: https://issues.apache.org/jira/browse/SOLR-4470 ? Isn't there already a very basic implementation that can be released? Thanks a lot! Mihaela
more like this and term vectors
Is there a way to configure the more like this query handler and also receive the corresponding term vectors (tf-idf)? I tried creating a "search component" for the term vectors and adding it to the mlt handler, but that did not work. Here is what I tried:

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">filteredText</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">1</str>
    <str name="mlt.interestingTerms">list</str>
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

Now I realize that I could turn on the debug parameter, but that does not contain all of the tf/idf (at least not like the tv component provides). Thanks, SCott
Re: more like this and term vectors
It's never helpful when you merely say that it did not work - detail the symptom, please. Post both the query and the response. As well as the field and type definitions for the fields for which you expected term vectors - no term vectors are enabled by default. -- Jack Krupansky
Re: Basic Multilingual search capability
Which languages are you expecting to deal with? Multilingual support is a complex issue. Even if you think you don't need much, it is usually a lot more complex than expected, especially around relevancy. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 23 February 2015 at 16:19, Rishi Easwaran rishi.easwa...@aol.com wrote: Hi All, For our use case we don't really need to do a lot of manipulation of incoming text during index time. At most, removal of common stop words and tokenizing emails/filenames etc. if possible. We get text documents from our end users, which can be in any language (sometimes a combination), and we cannot determine the language of the incoming text. Language detection at index time is not necessary. Which analyzer is recommended to achieve basic multilingual search capability for a use case like this? I have read a bunch of posts about using a combination of StandardTokenizer or ICUTokenizer, LowerCaseFilter and ReversedWildcardFilterFactory, but am looking for ideas, suggestions, best practices. http://lucene.472066.n3.nabble.com/ICUTokenizer-or-StandardTokenizer-or-for-quot-text-all-quot-type-field-that-might-include-non-whitess-td4142727.html#a4144236 http://lucene.472066.n3.nabble.com/How-to-implement-multilingual-word-components-fields-schema-td4157140.html#a4158923 https://issues.apache.org/jira/browse/SOLR-6492 Thanks, Rishi.
Error instantiating class: 'org.apache.lucene.collation.CollationKeyFilterFactory'
Hi, I am using the Collation Key Filter. After adding it to schema.xml, Solr throws an error.

Schema.xml:

<field name="gram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>
</fieldType>
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
</fieldType>

The error:

Problem accessing /solr/. Reason: {msg=SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Plugin init failure for [schema.xml] fieldType textSpell: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.collation.CollationKeyFilterFactory'. Schema file is /configs/myconf/schema.xml,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Could not load conf for core collection1: Plugin init failure for [schema.xml] fieldType textSpell: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.collation.CollationKeyFilterFactory'. Schema file is /configs/myconf/schema.xml
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:745)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
Geo Aggregations and Search Alerts in Solr
Hi There, I am in the process of choosing a search technology for one of my projects and I am looking into Solr and Elasticsearch. The two features I am most interested in are geo aggregations (for map clustering) and search alerts. Elasticsearch seems to have these two features built in. http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/geo-aggs.html http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html I couldn't find relevant documentation for Solr and am therefore not sure whether these features are readily available in Solr. Can you please let me know whether these features are available in Solr? If not, are there solutions to achieve the same with Solr? Thank you.
Query: no result returned if use AND OR operators
Hi, My Solr is 4.10.2. When I use the web UI to run a simple query, 1+AND+2:

1) From the log, I can see hits=8:

7629109 [qtp1702388274-16] INFO org.apache.solr.core.SolrCore – [infocast] webapp=/solr path=/clustering params={q=1+AND+2&wt=velocity&v.template=cluster_results} hits=8 status=0 QTime=21

2) However, the query page returns:

0 results found in 5 ms Page 0 of 0 0 results found. Page 0 of 0

3) If I use the Admin page to run the query, I get 3 back:

{ "responseHeader": { "status": 0, "QTime": 5, "params": { "indent": "true", "q": "\"1\" AND \"2\"", "_": "1424761089223", "wt": "json" } }, "response": { "numFound": 3, "start": 0, "docs": [ { "title": [ ….

Very strange to me, please help! Regards
Re: Basic Multilingual search capability
It isn’t just complicated, it can be impossible. Do you have content in Chinese or Japanese? Those languages (and some others) do not separate words with spaces. You cannot even do word search without a language-specific, dictionary-based parser. German is space separated, except many noun compounds are not space-separated. Do you have Finnish content? Entire prepositional phrases turn into word endings. Do you have Arabic content? That is even harder. If all your content is in space-separated languages that are not heavily inflected, you can kind of do OK with a language-insensitive approach. But it hits the wall pretty fast. One thing that does work pretty well is trademarked names (LaserJet, Coke, etc). Those are spelled the same in all languages and usually not inflected. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
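Walter's point about languages that do not separate words with spaces can be seen with a plain whitespace split. The sketch below is a toy illustration in plain Python, not Solr's analysis chain; the function name and the sample sentences are made up for this example.

```python
def whitespace_tokens(text):
    """Split on whitespace only, with no language-specific analysis."""
    return text.split()


english = "the quick brown fox"
japanese = "東京都に住んでいます"  # "I live in Tokyo", written without spaces

# English yields one token per word.
print(whitespace_tokens(english))
# The Japanese sentence comes back as a single token, so a search for
# an individual word inside it would not match on exact tokens.
print(whitespace_tokens(japanese))
```

A language-aware tokenizer would be needed to split the second sentence into searchable words.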
Re: Special character and wildcard matching
Is it really a string field - as opposed to a text field? Show us the field and field type. Besides, if it really were a raw name, wouldn't that be a capital B? -- Jack Krupansky
Re: Special character and wildcard matching
But how is that lowercasing occurring? I mean, solr.StrField doesn't do that. Some containers default to automatically mapping accented characters, so that the accented e would then get indexed as a normal e, and then your wildcard would match it, and an accented e in a query would get mapped as well and then match the normal e in the index. What does your query response look like? This blog post explains that problem: http://bensch.be/tomcat-solr-and-special-characters Note that you could make your string field a text field with the keyword tokenizer and then filter it for lower case, such as when the user query might have a capital B. String field is most appropriate when the field really is 100% raw. -- Jack Krupansky
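To make the "accented e indexed as a normal e" hypothesis concrete, the sketch below shows how a prefix match behaves before and after accent folding. It is plain Python for illustration only; it does not reproduce what Solr or the servlet container actually does, and whether any such mapping happens in this setup is unconfirmed in the thread. The function name is made up for this example.

```python
import unicodedata


def fold_accents(text):
    """Drop combining marks after canonical decomposition (e.g. é becomes e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


raw_value = "beyonc\u00e9"      # "beyoncé" with a precomposed é
wildcard_prefix = "beyonce"     # prefix taken from the query beyonce*

# Without folding, the ASCII prefix does not match the raw value.
print(raw_value.startswith(wildcard_prefix))
# If something folds the accent before indexing, the same prefix matches.
print(fold_accents(raw_value).startswith(wildcard_prefix))
```

If the document matches even though the field type is solr.StrField, something upstream of the field is probably changing the characters, which is why the query response and the container's encoding settings are worth checking.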
Re: Basic Multilingual search capability
Hi Alex, There is no specific language list. For example: the documents that need to be indexed are emails or any messages for a global customer base. The messages back and forth could be in any language or mix of languages. I understand relevancy, stemming etc. become extremely complicated with multilingual support, but our first goal is to be able to tokenize and provide basic search capability for any language. Ex: when the document contains hello or здравствуйте, the analyzer creates tokens and provides exact-match search results. Now it would be great if it had the capability to tokenize email addresses (ex: he...@aol.com; I think StandardTokenizer already does this) and filenames (здравствуйте.pdf), but maybe we can use filters to accomplish that. Thanks, Rishi.
Special character and wildcard matching
I have a string field raw_name like this in my document: {"raw_name": "beyoncé"} (Notice that the last character is a special character.) When I issue this wildcard query: q=raw_name:beyonce* i.e. with the last character simply being the ASCII 'e', Solr returns the above document. How do I prevent this?
Re: Special character and wildcard matching
Yes, it is a string field and not a text field.

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<field name="raw_name" type="string" indexed="true" stored="true" />

The lower-casing is done to allow case-insensitive matching.
apache solr - dovecot - some search fields works some dont
Hi, I finally understand how Solr works (somewhat). It's a bit complicated as I am new to the whole concept, but I understand it as a search engine. I am using Solr with dovecot, and I found out that some search fields from the inbox work and others don't. For example, if I search To and From, Apache Solr processes the query in its log and gives me output; however, if I search for something in the Body, it stalls and gives no output. I am guessing this is some schema.xml problem. Could you advise? Oh, I already addressed the Java heap size problem. I have underlined the log line that shows it. I am guessing it's only the body search that fails, and it might be schema.xml related.

*3374412 [qtp1728413448-16] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={sort=uid+asc&fl=uid,score&q=subject:dave+OR+from:dave+OR+to:dave&fq=%2Bbox:ac553604f7314b54e6233555fc1a+%2Buser:b...@email.net&rows=107161} hits=571 status=0 QTime=706*
3379438 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714397078&since=1424711021771&wt=json} status=0 QTime=0
3389791 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714407453&since=1424711021771&wt=json} status=0 QTime=1
3400172 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714417834&since=1424711021771&wt=json} status=0 QTime=1
3410544 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714428205&since=1424711021771&wt=json} status=0 QTime=0
3420895 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714438558&since=1424711021771&wt=json} status=0 QTime=0
3431247 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714448908&since=1424711021771&wt=json} status=0 QTime=1
3441671 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714459334&since=1424711021771&wt=json} status=0 QTime=1
3452017 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714469679&since=1424711021771&wt=json} status=0 QTime=1
3462363 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714480026&since=1424711021771&wt=json} status=0 QTime=0
3472707 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714490369&since=1424711021771&wt=json} status=0 QTime=0
3483139 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714500802&since=1424711021771&wt=json} status=0 QTime=1
3493590 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714511246&since=1424711021771&wt=json} status=0 QTime=0
3504027 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714521691&since=1424711021771&wt=json} status=0 QTime=0
3514477 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714532137&since=1424711021771&wt=json} status=0 QTime=1
3524933 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714542598&since=1424711021771&wt=json} status=0 QTime=0
3535288 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714552951&since=1424711021771&wt=json} status=0 QTime=0
3545634 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714563290&since=1424711021771&wt=json} status=0 QTime=0
3556077 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714573714&since=1424711021771&wt=json} status=0 QTime=0
3566496 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714584157&since=1424711021771&wt=json} status=0 QTime=1
3576937 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714594601&since=1424711021771&wt=json} status=0 QTime=0
3587273 [qtp1728413448-18] INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null path=/admin/info/logging params={_=1424714604939&since=1424711021771&wt=json} status=0
snapinstaller does not start newSearcher
Hello, I am using the latest Solr (trunk). I run snapinstaller and see that it copies the snapshot to the index folder, but the changes are not picked up. The logs on the slave after running snapinstaller are:

44302 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
44303 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – No uncommitted changes. Skipping IW.commit.
44304 [qtp1312571113-14] INFO org.apache.solr.core.SolrCore – SolrIndexSearcher has not changed - not re-opening: org.apache.solr.search.SolrIndexSearcher
44305 [qtp1312571113-14] INFO org.apache.solr.update.UpdateHandler – end_commit_flush
44305 [qtp1312571113-14] INFO org.apache.solr.update.processor.LogUpdateProcessor – [product] webapp=/solr path=/update params={} {commit=} 0 57

Restarting Solr gives:

Error creating core [product]: Error opening new searcher
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:873)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:646)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:491)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
    ... 9 more

Any idea what causes this issue? Thanks in advance. Alex.
Re: Basic Multilingual search capability
Hi Wunder, Yes, we do expect incoming documents to contain Chinese/Japanese/Arabic languages. From what you have mentioned, it looks like we need to auto-detect the incoming content language and tokenize/filter after that. But I thought the ICU tokenizer had the capability to do that (https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-ICUTokenizer): "This tokenizer processes multilingual text and tokenizes it appropriately based on its script attribute." Or am I missing something? Thanks, Rishi.
Setting Up an External ZooKeeper Ensemble
Hi, I followed all the steps in https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble but I am still getting this error:

Waiting to see Solr listening on port 8983 [-] Still not seeing Solr listening on 8983 after 30 seconds!

WARN - 2015-02-24 05:50:19.161; org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
WARN - 2015-02-24 05:50:20.262; org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

Where am I going wrong? -- ckreddybh. chaitu...@gmail.com
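A "Connection refused" from the ZooKeeper client usually means that nothing is accepting connections at the host and port Solr was given. A quick first check, sketched below in plain Python, is whether each ZooKeeper address is reachable at all from the Solr machine. The host list is an assumption for illustration (localhost with ZooKeeper's default client port 2181); substitute the addresses from your own zkHost setting. The function name is made up for this example.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Hypothetical ensemble addresses; replace with the entries from your zkHost string.
zk_hosts = [("localhost", 2181)]

for host, port in zk_hosts:
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If a port is not reachable, the ZooKeeper process on that host is likely not running, is listening on a different clientPort, or is blocked by a firewall; a reachable port only shows that something is listening there, not that the ensemble is healthy.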
Re: Basic Multilingual search capability
Hi Rishi, I don't generally recommend a language-insensitive approach except for really simple multilingual use cases (for most of the reasons Walter mentioned), but the ICUTokenizer is probably the best bet you're going to have if you really want to go that route and only need exact-match on the tokens that are parsed. It won't work that well for all languages (CJK languages, for example), but it will work fine for many. It is also possible to handle multi-lingual content in a more intelligent (i.e. per-language configuration) way in your search index, of course. There are three primary strategies (i.e. ways that actually work in the real world) to do this: 1) create a separate field for each language and search across all of them at query time 2) create a separate core per language-combination and search across all of them at query time 3) invoke multiple language-specific analyzers within a single field's analyzer and index/query using one or more of those language's analyzers for each document/query. These are listed in ascending order of complexity, and each can be valid based upon your use case. For at least the first and third cases, you can use index-time language detection to map to the appropriate fields/analyzers if you are otherwise unaware of the languages of the content from your application layer. The third option requires custom code (included in the large Multilingual Search chapter of Solr in Action http://solrinaction.com and soon to be contributed back to Solr via SOLR-6492 https://issues.apache.org/jira/browse/SOLR-6492), but it enables you to index an arbitrarily large number of languages into the same field if needed, while preserving language-specific analysis for each language. 
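As a rough illustration of the first strategy (a separate field per language, searched together at query time), the sketch below builds request parameters that spread one user query across several per-language fields using the edismax parser's qf parameter. The field names text_en, text_fr and text_de are hypothetical and would have to exist in the schema with suitable per-language analyzers; the function name is made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical per-language fields; each would need its own analyzer in schema.xml.
LANGUAGE_FIELDS = ["text_en", "text_fr", "text_de"]


def build_multilingual_query(user_query, fields=LANGUAGE_FIELDS):
    """Return URL-encoded parameters that search every language field at once."""
    params = {
        "q": user_query,
        "defType": "edismax",     # edismax accepts a list of fields in qf
        "qf": " ".join(fields),
        "wt": "json",
    }
    return urlencode(params)


print(build_multilingual_query("hello"))
```

Each additional language adds another field to qf, which is one reason this approach scales less well as the number of languages grows.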
I presented in detail on the above strategies at Lucene/Solr Revolution last November, so you may consider checking out the presentation and/or slides to assess whether one of these strategies will work for your use case: http://www.treygrainger.com/posts/presentations/semantic-multilingual-strategies-in-lucenesolr/

For the record, I'd highly recommend going with the first strategy (a separate field per language) if you can, as it is certainly the simplest of the approaches (albeit the one that scales the least well after you add more than a few languages to your queries). If you want to stay simple and stick with the ICUTokenizer then it will work to a point, but some of the problems Walter mentioned may eventually bite you if you are supporting certain groups of languages.

All the best,
Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search Recommendations @ CareerBuilder

On Mon, Feb 23, 2015 at 11:14 PM, Walter Underwood wun...@wunderwood.org wrote:

It isn’t just complicated, it can be impossible. Do you have content in Chinese or Japanese? Those languages (and some others) do not separate words with spaces. You cannot even do word search without a language-specific, dictionary-based parser. German is space-separated, except that many noun compounds are not. Do you have Finnish content? Entire prepositional phrases turn into word endings. Do you have Arabic content? That is even harder.

If all your content is in space-separated languages that are not heavily inflected, you can kind of do OK with a language-insensitive approach. But it hits the wall pretty fast. One thing that does work pretty well is trademarked names (LaserJet, Coke, etc). Those are spelled the same in all languages and usually not inflected.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Feb 23, 2015, at 8:00 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

Hi Alex,

There is no specific language list.
For example: the documents that need to be indexed are emails or any messages for a global customer base. The messages back and forth could be in any language or mix of languages. I understand that relevancy, stemming, etc. become extremely complicated with multilingual support, but our first goal is to be able to tokenize and provide basic search capability for any language. Ex: when the document contains hello or здравствуйте, the analyzer creates tokens and provides exact-match search results.

Now it would be great if it had the capability to tokenize email addresses (ex: he...@aol.com - I think StandardTokenizer already does this) and filenames (здравствуйте.pdf), but maybe we can use filters to accomplish that.

Thanks,
Rishi.

-Original Message-
From: Alexandre Rafalovitch arafa...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Feb 23, 2015 5:49 pm
Subject: Re: Basic Multilingual search capability

Which languages are you expecting to deal with? Multilingual support is a complex issue. Even if you think you don't need much, it is usually a lot more complex than expected, especially around relevancy.

Regards,
Alex.
Sign up for my Solr resources