Re: Spellcheck compounded words
Hi guys, did anyone solve this issue? I am having it too; it took me 3 days to figure out that it's coming from spellcheck.maxCollationTries. Even with <str name="spellcheck.maxCollationTries">1</str> it hangs forever. The only way to restart is to stop Solr, delete the data folder and then start Solr again (i.e. the index is lost!). Regards, Raheel -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Spellcheck compounded words
Which version of Solr are you running? (The post you replied to was about Solr 3.3, but the latest version now is 4.4.) Please provide configuration details and the query you are running that causes the problem. Also explain exactly what the problem is (the query never returns?), and why you have to delete the data dir when you restart. With a little background information, maybe someone can help. James Dyer Ingram Content Group (615) 213-4311 -----Original Message----- From: Rah1x [mailto:raheel_itst...@yahoo.com] Sent: Monday, September 16, 2013 5:47 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck compounded words
Re: Spellcheck compounded words
Hi, I'm running 4.3. I have posted all the details in another thread; do you want me to copy them here, or can you see it? The subject is *spellcheck causing Core Reload to hang*. On Mon, Sep 16, 2013 at 5:50 PM, Dyer, James <james.d...@ingramcontent.com> wrote: [...] -- Regards, Raheel Hasan
RE: Spellcheck compounded words
I would investigate Hoss's suggestion and look at warming queries. In some cases I've seen maxCollationTries in warming queries cause a hang. Unless you're trying to build your spellcheck dictionary during warming, you can safely turn spellcheck off for all warming queries. James Dyer Ingram Content Group (615) 213-4311 -----Original Message----- From: Raheel Hasan [mailto:raheelhasan@gmail.com] Sent: Monday, September 16, 2013 8:29 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck compounded words
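To make "turn spellcheck off for all warming queries" concrete, here is a minimal Python sketch that generates a solrconfig.xml warming listener in which every warming query carries spellcheck=false. The helper name and the sample queries are hypothetical; it assumes the spellcheck settings live in the request handler's "defaults" section, because request parameters override defaults but do not override "invariants".

```python
import xml.etree.ElementTree as ET


def warming_listener(queries, event="newSearcher"):
    """Build a solrconfig.xml <listener> fragment whose warming queries all
    switch spellcheck off explicitly (hypothetical helper, not part of Solr)."""
    listener = ET.Element(
        "listener", {"event": event, "class": "solr.QuerySenderListener"}
    )
    arr = ET.SubElement(listener, "arr", {"name": "queries"})
    for q in queries:
        lst = ET.SubElement(arr, "lst")
        ET.SubElement(lst, "str", {"name": "q"}).text = q
        # A request parameter overrides a handler "defaults" entry, so this
        # keeps collation (and maxCollationTries) out of warming.
        ET.SubElement(lst, "str", {"name": "spellcheck"}).text = "false"
    return ET.tostring(listener, encoding="unicode")


print(warming_listener(["solr", "sail boat"]))
```

The same fragment can be written by hand; generating it just makes it hard to forget the override on one of the queries.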
Re: Spellcheck compounded words
I am building it on commit: <str name="buildOnCommit">true</str>. Please see my other thread for all the logs and the schema + solrconfig settings. On Mon, Sep 16, 2013 at 7:03 PM, Dyer, James <james.d...@ingramcontent.com> wrote: [...] -- Regards, Raheel Hasan
Re: Spellcheck compounded words
O. Klein wrote: Anyway, I was testing on 3.3 and found that when I added spellcheck.maxCollations=2&spellcheck.maxCollationTries=2 as parameters to the URL there was no problem at all. Adding <str name="spellcheck.maxCollations">2</str> <str name="spellcheck.maxCollationTries">2</str> to the default requestHandler in solrconfig.xml caused the request to hang. Can someone verify if this is a bug? I see the same behaviour on a different machine, with a different Solr build (trunk). I tried
<str name="spellcheck">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.collate">true</str>
<int name="spellcheck.count">10</int>
<int name="spellcheck.maxCollations">2</int>
<int name="spellcheck.maxCollationTries">3</int>
using DirectSolrSpellChecker, but it only works when the parameters are in the HTTP request, not in solrconfig.xml. Looks like a bug to me. -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3310851.html Sent from the Solr - User mailing list archive at Nabble.com.
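The reported workaround is to send the collation settings as request parameters rather than as handler defaults. A minimal Python sketch of building such a request URL follows; the endpoint address is a hypothetical local install, and in a real URL the two settings are separated by `&`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and core to your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def spellcheck_url(q, max_collations=2, max_collation_tries=2):
    """Pass the collation settings as HTTP request parameters instead of
    relying on <str> defaults in solrconfig.xml."""
    params = [
        ("q", q),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        ("spellcheck.maxCollations", str(max_collations)),
        ("spellcheck.maxCollationTries", str(max_collation_tries)),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(spellcheck_url("sail booat"))
```

This only reproduces the working variant described above; it does not explain why the solrconfig.xml variant hangs.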
RE: Spellcheck compounded words
I could not reproduce the problem, even with the two parameters you show below added to the default handler. I tried using this default handler with different queries containing correct and incorrect terms. I made sure it would sometimes successfully create collations and other times try to create collations but not find any good ones. In all cases everything worked as expected. I also checked the code to see whether it could create an infinite loop, where the queries that run to check a collation's validity themselves get spell corrections back. But this doesn't look like a possibility. If you are able to figure anything more out on this yourself, then please post. If this is a real bug, then we ought to get it fixed. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: O. Klein [mailto:kl...@octoweb.nl] Sent: Wednesday, July 27, 2011 9:15 AM To: solr-user@lucene.apache.org Subject: Re: Spellcheck compounded words All the talk about logging derailed the thread. So can someone test whether adding <str name="spellcheck.maxCollations">2</str> <str name="spellcheck.maxCollationTries">2</str> to the default requestHandler in solrconfig.xml, using collations, causes the system to hang? -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3203569.html Sent from the Solr - User mailing list archive at Nabble.com.
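To make the loop James checked for easier to picture, here is a simplified Python model of collation (not Solr's actual code): per-term corrections are combined into candidate queries, up to maxCollationTries of them are tested with a query that runs with spellcheck off, and collecting stops at maxCollations. The function and its arguments are hypothetical, and maxCollationEvaluations is not modelled.

```python
from itertools import product


def collate(terms, corrections, run_query, max_collation_tries, max_collations):
    """Simplified model of spellcheck collation (not Solr's actual code).

    terms:       the original query terms
    corrections: term -> suggested replacements, best first
    run_query:   returns the hit count for a candidate query; assumed to run
                 with spellcheck OFF, so it can never recurse into collation
    """
    # Terms without suggestions are kept as-is.
    options = [corrections.get(t) or [t] for t in terms]
    collations, tries = [], 0
    for combo in product(*options):
        candidate = " ".join(combo)
        if max_collation_tries > 0:
            if tries == max_collation_tries:
                break  # test budget used up
            tries += 1
            if run_query(candidate) == 0:
                continue  # would return no hits, so discard it
        # With max_collation_tries == 0 candidates are returned untested.
        collations.append(candidate)
        if len(collations) == max_collations:
            break
    return collations


index = {"sail boat": 3}
print(collate(["sail", "booat"], {"booat": ["boot", "boat"]},
              lambda q: index.get(q, 0), max_collation_tries=2, max_collations=1))
# ['sail boat']
```

Because the test query in this model never calls `collate` again, the loop is bounded by the number of candidate combinations and by the tries budget.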
RE: Spellcheck compounded words
Using ShingleFilterFactory and PositionFilterFactory I get some results, but never as a useful collation. So I tried to see what the results with spellcheck.maxCollations=2 would be, but I never got this to work, neither on 3.3 nor on 4.0. Even lowering maxCollationEvaluations had no effect. I either never get a response from Solr, or I get an OOM exception. Is anyone else experiencing this? -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3200418.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Spellcheck compounded words
If you're getting OOMs, double-check that you're on 3.3. There was a nasty bug in 3.0 - 3.2 that could cause an OOM in conjunction with spellcheck collations in some cases. The same goes if Solr hangs, as you might be in a garbage-collection loop. If you run your JVM with verbose GC output, you'll see for sure in the server logs whether this is happening. With that said, collations shouldn't cause memory problems with 3.3. Also, maxCollationEvaluations really is just there to make sure the query doesn't run too long looking for spell-correction possibilities. It shouldn't affect memory usage, which will be low in any case on 3.3. (Although if you are getting OOMs on 3.3 and you're pretty sure your heap is big enough, please post a stack trace!) You might want to test some queries with all of these parameters enabled:

spellcheck=true
spellcheck.count=10
spellcheck.extendedResults=true
spellcheck.collate=true
spellcheck.collateExtendedResults=true
spellcheck.maxCollationTries=10
spellcheck.maxCollations=1

...then run some test queries and check the spelling response. This will show you all of the individual word possibilities, and below that you'll get a collation if it could find a combination that returns hits. Then note:

- If you get nothing from spellcheck, be sure you did a spellcheck.build since the last restart (or since you committed your data).
- If the correct version of one of your misspelled words isn't in the lists in the first section, try a higher spellcheck.count. However, if the misspelled word itself is in the index, there is no hope, because Solr won't suggest a word for something that is in the index (but see https://issues.apache.org/jira/browse/SOLR-2585).
- If you see all the corrections in the individual lists, but not in a collation, try increasing maxCollationTries and/or maxCollations and see if it suggests it. If all else fails, set maxCollationTries to zero and maxCollations to something higher. Just keep in mind that with maxCollationTries at zero, the collations aren't guaranteed to return any hits.
- I'm not so sure shingles will work with the collation feature at all.
- I've heard that when using shingles, you have to put the query in spellcheck.q to get it to work. But I've never used shingles with spellcheck before, so I'm not sure.

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: O. Klein [mailto:kl...@octoweb.nl] Sent: Tuesday, July 26, 2011 9:07 AM To: solr-user@lucene.apache.org Subject: RE: Spellcheck compounded words
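The diagnostic parameter set James lists can be kept in one place and tweaked per experiment. A minimal Python sketch follows; the helper name is hypothetical, and the resulting query string would be appended to your own select URL (for example a local `http://localhost:8983/solr/select?`, also an assumption).

```python
from urllib.parse import urlencode

# The parameter set suggested above for inspecting spellcheck output.
DIAGNOSTIC_PARAMS = {
    "spellcheck": "true",
    "spellcheck.count": "10",
    "spellcheck.extendedResults": "true",
    "spellcheck.collate": "true",
    "spellcheck.collateExtendedResults": "true",
    "spellcheck.maxCollationTries": "10",
    "spellcheck.maxCollations": "1",
}


def diagnostic_query(q, spellcheck_q=None, overrides=None):
    """Return a URL query string for one test query.

    spellcheck_q: optional value for spellcheck.q (reportedly needed with
                  shingles; unverified).
    overrides:    dict of parameters to change, e.g. a higher count.
    """
    params = {"q": q, **DIAGNOSTIC_PARAMS}
    if spellcheck_q is not None:
        params["spellcheck.q"] = spellcheck_q
    if overrides:
        params.update(overrides)
    return urlencode(params)


# "If all else fails": untested collations, more of them.
print(diagnostic_query("sail booat", overrides={
    "spellcheck.maxCollationTries": "0",
    "spellcheck.maxCollations": "5",
}))
```

Passing overrides as a dict rather than keyword arguments is deliberate, since parameter names such as `spellcheck.count` are not valid Python identifiers.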
RE: Spellcheck compounded words
I'm using 4.0 for testing this. I'm not sure what to expect, but as soon as I increase maxCollationTries to 1 or more, it just hangs, even with maxCollationEvaluations set to a low value like 10. With maxCollationTries set to 0 it works just fine. -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3200846.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Spellcheck compounded words
It sounds like that could be a bug. Could you provide some details on how you're building your dictionary (config snippets), what parameters you're using to query, etc.? Your JVM settings and a rough estimate of how big your index is would be helpful too. It would be nice to figure out whether this is a bug and, if so, try to fix it. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: O. Klein [mailto:kl...@octoweb.nl] Sent: Tuesday, July 26, 2011 11:37 AM To: solr-user@lucene.apache.org Subject: RE: Spellcheck compounded words
RE: Spellcheck compounded words
I will try to duplicate the behavior in 3.3, as I can't get logging to file working in 4.0 the way it works in other releases (see http://globalgateway.wordpress.com/2010/01/06/configuring-solr-1-4-logging-with-log4j-in-tomcat/ on Solr logging; maybe you know how to fix this?). The config is pretty normal, I think:

<searchComponent class="solr.SpellCheckComponent" name="spellcheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="field">text_spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <float name="thresholdTokenFrequency">.001</float>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwordsSpell.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3200945.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellcheck compounded words
You're most likely caught by the upgrade of slf4j. Check catalina.out; it'll tell you your versions are out of date or complain about a static logger binding.
Re: Spellcheck compounded words
Adding log4j-1.2.16.jar and deleting slf4j-jdk14-1.6.1.jar does not fix logging for 4.0 for me. Anyway, I tried it on 3.3 and Solr just hangs there too. No logging, no exceptions. I'll let you know if I manage to find the source of the problem. -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3201202.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellcheck compounded words
FWIW, here is the process I follow to create a log4j-aware version of the Apache Solr WAR file, and the corresponding log4j.properties file. Have fun :) François

##
#
# Log4J configuration for SOLR
#
#   http://wiki.apache.org/solr/SolrLogging
#
#
# 1) Download SLF4J:
#   http://www.slf4j.org/
#   http://www.slf4j.org/download.html
#   http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz
#
# 2) Unpack Solr:
#   jar xvf apache-solr-3.3.0.war
#
# 3) Delete:
#   WEB-INF/lib/log4j-over-slf4j-1.6.1.jar
#   WEB-INF/lib/slf4j-jdk14-1.6.1.jar
#
# 4) Copy:
#   slf4j-1.6.1/slf4j-log4j12-1.6.1.jar -> WEB-INF/lib
#   log4j.properties (this file)        -> WEB-INF/classes/ (needs to be created)
#
# 5) Pack Solr:
#   jar cvf apache-solr-3.3.0.war admin favicon.ico index.jsp META-INF WEB-INF
#
#
# Author:   Francois Schiettecatte
# Version:  1.0
#
##

##
#
# Logging levels (helpful reminder)
#
# DEBUG INFO WARN ERROR FATAL
#
##

#
# Logging setup
#
log4j.rootLogger=ERROR, SOLR

# Daily Rolling File Appender (SOLR)
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.File=${catalina.base}/logs/solr.log
log4j.appender.SOLR.Append=true
log4j.appender.SOLR.Encoding=UTF-8
log4j.appender.SOLR.DatePattern='-'-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

##
#
# Logging levels for SOLR
#

# Default logging level
log4j.logger.org.apache.solr=ERROR

##

On Jul 26, 2011, at 2:49 PM, O. Klein wrote: [...]
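Steps 2 to 5 can also be done without unpacking to disk, since a WAR is an ordinary zip archive. The Python sketch below uses the standard zipfile module as a stand-in for `jar xvf` / `jar cvf`; the function names are hypothetical, and the real jar contents you pass in (slf4j-log4j12-1.6.1.jar, plus log4j-1.2.16.jar, which François says in a later reply is also needed, and log4j.properties) are up to you. The demo uses placeholder bytes only.

```python
import io
import zipfile

# Bindings to drop, per step 3 above.
REMOVE = {
    "WEB-INF/lib/log4j-over-slf4j-1.6.1.jar",
    "WEB-INF/lib/slf4j-jdk14-1.6.1.jar",
}


def repack_war(src, dst, additions):
    """Copy the WAR at src (path or file object) to dst, leaving out the REMOVE
    entries and adding `additions` (archive path -> bytes). A WAR is a zip
    archive, so zipfile stands in for `jar xvf` / `jar cvf` here."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if item.filename in REMOVE or item.filename in additions:
                continue
            zout.writestr(item, zin.read(item.filename))
        for name, data in additions.items():
            zout.writestr(name, data)


def demo():
    """Repack a fake in-memory WAR and return its sorted entry names."""
    src = io.BytesIO()
    with zipfile.ZipFile(src, "w") as z:
        z.writestr("index.jsp", "<html/>")
        for name in REMOVE:
            z.writestr(name, b"old binding")
    src.seek(0)
    dst = io.BytesIO()
    repack_war(src, dst, {
        "WEB-INF/lib/slf4j-log4j12-1.6.1.jar": b"placeholder",
        "WEB-INF/classes/log4j.properties": b"log4j.rootLogger=ERROR, SOLR\n",
    })
    dst.seek(0)
    with zipfile.ZipFile(dst) as z:
        return sorted(z.namelist())


print(demo())
```

For a real WAR you would call `repack_war("apache-solr-3.3.0.war", "apache-solr-3.3.0-log4j.war", {...})` with the jar bytes read from disk; the output filename here is only an example.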
Re: Spellcheck compounded words
François Schiettecatte wrote:
# 4) Copy:
#   slf4j-1.6.1/slf4j-log4j12-1.6.1.jar -> WEB-INF/lib
#   log4j.properties (this file)        -> WEB-INF/classes/ (needs to be created)
Don't you mean log4j-1.2.16/slf4j-log4j12-1.6.1.jar? Anyway, I was testing on 3.3 and found that when I added spellcheck.maxCollations=2&spellcheck.maxCollationTries=2 as parameters to the URL there was no problem at all. Adding <str name="spellcheck.maxCollations">2</str> <str name="spellcheck.maxCollationTries">2</str> to the default requestHandler in solrconfig.xml caused the request to hang. Can someone verify if this is a bug? -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3201332.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellcheck compounded words
I get slf4j-log4j12-1.6.1.jar from http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz; it is what interfaces slf4j to log4j. You will also need to add log4j-1.2.16.jar to WEB-INF/lib. François On Jul 26, 2011, at 3:40 PM, O. Klein wrote: [...]
RE: Spellcheck compounded words
I'm afraid there currently isn't much support for correcting misplaced whitespace. Solr is going to look at each word individually and won't even try to combine adjacent words (or split a word into 2 or more). So there is no good way to get these kinds of suggestions. One thing that might work in some cases is to create a spelling dictionary composed of shingles (2+ words indexed together as 1 token). This approach is described in Smiley & Pugh's Solr book (1st ed.), p. 180ff, under the heading "An alternative approach". I haven't tried this, but it might be your best hope if this is a feature you've absolutely got to have. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: O. Klein [mailto:kl...@octoweb.nl] Sent: Friday, July 22, 2011 8:11 PM To: solr-user@lucene.apache.org Subject: Spellcheck compounded words How do I get the spellchecker to suggest compounded words? For example, for q=sail booat the suggestion/collation should be "sailboat" and "sail boat". -- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3192748.html Sent from the Solr - User mailing list archive at Nabble.com.
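To illustrate what "2+ words indexed together as 1 token" means, here is a toy Python function that produces word shingles from a token list. It is a hypothetical sketch that only roughly mirrors what a shingle filter emits (optional unigram, then longer shingles, at each position); it is not Lucene's ShingleFilter.

```python
def shingles(tokens, min_size=2, max_size=2, sep=" ", unigrams=False):
    """Return word n-grams ("shingles"): at each start position, the unigram
    (optional) followed by shingles of min_size..max_size words."""
    out = []
    for i in range(len(tokens)):
        if unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


print(shingles(["sail", "boat", "rental"]))
# ['sail boat', 'boat rental']
```

A spelling dictionary built from such tokens contains "sail boat" as a single entry, which is what gives the spellchecker a chance to suggest a multi-word correction.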
Re: Spellcheck compounded words
This will indeed work for misspelled compounds, but not when the compound word is actually queried as two separate, correctly spelled words. Most likely both "sail" and "boat" exist in the index as single tokens. There is a workaround, but it's limited to a scenario where users never use more than one query term (or two in the case of misspelled compounds). When your index has shingles and you replace the whitespace with a non-whitespace character, you get a proper suggestion returned. The compound is then found as a suggestion, but not in the collation. When queries contain more than two terms, it will most likely never work this way; the results get really strange. On Monday 25 July 2011 16:49:18 Dyer, James wrote: [...] -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
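The core idea of Markus's workaround, joining adjacent query terms with a non-whitespace character and looking the joined form up in a dictionary, can be sketched in a few lines of Python. The function, the empty-string joiner, and the sample vocabulary are all hypothetical; a real setup would rely on the spellchecker's own dictionary rather than a Python set.

```python
def compound_candidates(query, vocabulary, joiner=""):
    """For each pair of adjacent query terms, join them with a non-whitespace
    character and report the joined form if it exists in the vocabulary."""
    terms = query.split()
    found = []
    for left, right in zip(terms, terms[1:]):
        joined = left + joiner + right
        if joined in vocabulary:
            found.append(joined)
    return found


vocab = {"sail", "boat", "sailboat", "rental"}
print(compound_candidates("sail boat", vocab))
# ['sailboat']
```

Note that this only yields individual suggestions; as Markus points out, getting the compound into a collation is a separate problem.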
RE: Spellcheck compounded words
Related to this is this JIRA issue: https://issues.apache.org/jira/browse/SOLR-2585. With this patch, Solr will consider alternatives in cases where a word is misspelled in its context but nevertheless exists in the index and/or dictionary. This is a work in progress and is for trunk only, but it would make for another nice incremental improvement in the spellchecker. This patch won't solve the problem at hand, but it may make the shingle workaround function in a few more cases. Of course, actually developing word-break analysis into the spellchecker would be the right solution... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -----Original Message----- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, July 25, 2011 10:13 AM To: solr-user@lucene.apache.org Cc: Dyer, James Subject: Re: Spellcheck compounded words
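As a rough picture of the "word-break analysis" James mentions, here is a toy Python function covering the splitting half: it lists every way to break one term into two dictionary words. It is a hypothetical illustration of the idea, not a Solr feature or patch, and it ignores scoring, frequencies and multi-way splits.

```python
def word_breaks(term, vocabulary, min_len=2):
    """Return every way to split term into two in-vocabulary words,
    each at least min_len characters long."""
    return [
        (term[:i], term[i:])
        for i in range(min_len, len(term) - min_len + 1)
        if term[:i] in vocabulary and term[i:] in vocabulary
    ]


vocab = {"sail", "boat", "sailboat"}
print(word_breaks("sailboat", vocab))
# [('sail', 'boat')]
```

A complete word-break spellchecker would combine this with the opposite direction (joining adjacent terms) and then feed the results into collation.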