Re: Spellcheck compounded words

2013-09-16 Thread Rah1x
Hi guyz,

Did anyone solve this issue?

I am having it too; it took me 3 days to figure out that it is
coming from spellcheck.maxCollationTries...

Even with <str name="spellcheck.maxCollationTries">1</str> it hangs
forever. The only way to restart is to stop Solr, delete the data folder, and
then start Solr again (i.e. the index is lost!).

Regards,
Raheel



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p4090320.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spellcheck compounded words

2013-09-16 Thread Dyer, James
Which version of Solr are you running? (the post you replied to was about Solr 
3.3, but the latest version now is 4.4.)  Please provide configuration details 
and the query you are running that causes the problem.  Also explain exactly 
what the problem is (query never returns?).  Also explain why you have to 
delete the data dir when you restart.  With a little background information, 
maybe someone can help.

James Dyer
Ingram Content Group
(615) 213-4311



Re: Spellcheck compounded words

2013-09-16 Thread Raheel Hasan
Hi,

I'm running 4.3.

I have posted all the details in another thread... do you want me to copy
them here, or could you look at that one? The subject is *spellcheck causing Core
Reload to hang*.





-- 
Regards,
Raheel Hasan


RE: Spellcheck compounded words

2013-09-16 Thread Dyer, James
I would investigate Hoss's suggestion and look at warming queries.  In some 
cases I've seen maxCollationTries in warming queries cause a hang.  Unless 
you're trying to build your spellcheck dictionary during warming, you can 
safely turn spellcheck off for all warming queries.
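
As a rough illustration of turning spellcheck off for warming queries, a
firstSearcher listener in solrconfig.xml might look like the sketch below.
The query text is a placeholder, and the underlying assumption (not confirmed
in this thread) is that warming queries otherwise inherit the spellcheck
defaults configured on the request handler:

```xml
<!-- Sketch only: the q value is a placeholder, not taken from this thread. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some warming query</str>
      <!-- Explicitly disable spellcheck so collation tries never run during warming. -->
      <str name="spellcheck">false</str>
    </lst>
  </arr>
</listener>
```

The same override would apply to a newSearcher listener if one is configured.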

James Dyer
Ingram Content Group
(615) 213-4311




Re: Spellcheck compounded words

2013-09-16 Thread Raheel Hasan
I am building it on commit:
<str name="buildOnCommit">true</str>

Please see my other thread for all Logs and Schema + Solrconfig settings.



-- 
Regards,
Raheel Hasan


Re: Spellcheck compounded words

2011-09-05 Thread O. Klein

O. Klein wrote:
> 
> Anyways. I was testing on 3.3 and found that when I added
> spellcheck.maxCollations=2&spellcheck.maxCollationTries=2 as parameters
> to the URL there was no problem at all.
> 
> Adding 
> 
>   <str name="spellcheck.maxCollations">2</str>
>   <str name="spellcheck.maxCollationTries">2</str>
> 
> to the default requestHandler in solrconfig.xml caused request to hang.
> 
> Can someone verify if this is a bug?
> 

I have the same behaviour on a different machine, with a different Solr build
(trunk).

Tried
<str name="spellcheck">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.collate">true</str>
<int name="spellcheck.count">10</int>
<int name="spellcheck.maxCollations">2</int>
<int name="spellcheck.maxCollationTries">3</int>

using DirectSolrSpellChecker, but it only works when the parameters are in the HTTP
request, not in solrconfig.xml.

Looks like a bug to me.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3310851.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spellcheck compounded words

2011-07-27 Thread Dyer, James
I could not reproduce the problem even with the two parameters you show below 
added to the default handler.  I tried using this default handler with 
different queries with correct & incorrect terms.  I made sure it would 
sometimes successfully create collations and other times try to create 
collations but not find any good ones.  In all cases everything worked as 
expected.

I also checked the code to see if it could possibly create an infinite loop 
in which the queries that run to check a collation's validity were themselves 
getting spell corrections back.  But this doesn't look like a possibility.  

If you are able to figure anything more out on this yourself, then please post. 
 If this is a real bug, then we ought to get it fixed.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Wednesday, July 27, 2011 9:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck compounded words

All the talk about logging derailed the thread. So can someone test if adding 

  <str name="spellcheck.maxCollations">2</str>
  <str name="spellcheck.maxCollationTries">2</str>

to the default requestHandler in solrconfig.xml using collations causes
the system to hang?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3203569.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spellcheck compounded words

2011-07-26 Thread O. Klein
Using ShingleFilterFactory and PositionFilterFactory I get some results, but
never as a useful collation.

So I tried to see what the results with spellcheck.maxCollations=2 would be, but
I never got this to work, neither on 3.3 nor on 4.0. Even lowering
maxCollationEvaluations had no effect. Either I never get a response from Solr or
I get an OOM exception.

Anyone else experiencing this?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3200418.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spellcheck compounded words

2011-07-26 Thread Dyer, James
If you're getting OOM's, double-check that you're on 3.3.  There was a nasty 
bug in 3.0 - 3.2 that would cause OOM in conjunction with spellcheck collations 
in some cases.  Ditto if Solr hangs as you might be in a Garbage Collection 
loop.  If you have your jvm running with verbose gc's you'll see for sure in 
the server logs if this is happening.

With that said, collations shouldn't cause memory problems with 3.3.  Also, 
maxCollationEvaluations really is just to be sure the query doesn't run too 
long looking for spell correction possibilities.  It shouldn't affect memory 
usage, which will be low in any case (on 3.3).  

(although if you are getting OOMs on 3.3 and if you're pretty sure your heap is 
big enough, please post a stack trace!)

You might want to test some queries with all of these parameters enabled:

spellcheck=true
spellcheck.count=10
spellcheck.extendedResults=true
spellcheck.collate=true
spellcheck.collateExtendedResults=true
spellcheck.maxCollationTries=10
spellcheck.maxCollations=1

...then run some test queries and check the spelling response.  This will 
show you all of the individual word possibilities, and below that you'll get 
a collation if it could find a combination that can return hits.  Then note:

- If you get nothing from spellcheck, be sure you did a spellcheck.build 
since the last restart (or since you committed your data).

- If the correct version of one of your misspelled words isn't in the lists 
in the first section, try a higher spellcheck.count.  However, if the 
misspelled word is itself in the index, there is no hope because Solr won't 
suggest a correction for something in the index (but see 
https://issues.apache.org/jira/browse/SOLR-2585).

- If you see all the corrections in the individual lists, but not in a 
collation, try increasing maxCollationTries and/or maxCollations and see if 
it suggests it.  If all else fails, set maxCollationTries to zero and 
maxCollations to something higher.  Just keep in mind that with 
maxCollationTries at zero, the collations aren't guaranteed to return any 
hits.

- I'm not so sure shingles will work with the collation feature at all.

- I've heard that when using shingles, you have to put the query in 
spellcheck.q to get it to work.  But I've never used shingles with spellcheck 
before so I'm not sure.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




RE: Spellcheck compounded words

2011-07-26 Thread O. Klein
I'm using 4.0 for testing this.

I'm not sure what to expect, but as soon as I increase maxCollationTries to 1
or more, even with maxCollationEvaluations set to a low value like 10, it just
hangs.

With maxCollationTries set to 0 it works just fine.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3200846.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spellcheck compounded words

2011-07-26 Thread Dyer, James
It sounds like that could be a bug.  Could you provide some details on how 
you're building your dictionary (config snippets), and what parameters you're 
using to query, etc. ?  Your jvm settings and a rough estimate of how big your 
index is would be helpful too.  It would be nice to try and figure out if this 
is a bug and if so, then try and fix it.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




RE: Spellcheck compounded words

2011-07-26 Thread O. Klein
I will try to duplicate the behavior in 3.3, as I can't get logging to file
working in 4.0 like in other releases (Solr logging:
http://globalgateway.wordpress.com/2010/01/06/configuring-solr-1-4-logging-with-log4j-in-tomcat/).
Maybe you know how to fix this?

Config is pretty normal I think:

<searchComponent class="solr.SpellCheckComponent" name="spellcheckComponent">

  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="name">default</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="field">text_spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <float name="thresholdTokenFrequency">.001</float>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>

<fieldType name="textSpell" class="solr.TextField"
    positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwordsSpell.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3200945.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellcheck compounded words

2011-07-26 Thread Markus Jelsma

> I will try to duplicate the behavior in 3.3 as I cant get logging to file
> working in 4.0 like in other releases
> http://globalgateway.wordpress.com/2010/01/06/configuring-solr-1-4-logging-with-log4j-in-tomcat/
> Solr logging  (maybe you know how to fix this?)

You're most likely caught by the upgrade of slf4j. Check catalina.out; it'll 
tell you your versions are out of date or complain about a static logger 
binding.

 


Re: Spellcheck compounded words

2011-07-26 Thread O. Klein
Adding log4j-1.2.16.jar and deleting slf4j-jdk14-1.6.1.jar does not fix
logging for 4.0 for me.

Anyways, tried it on 3.3 and Solr just hangs here also. No logging, no
exceptions.

I'll let you know if I manage to find source of problem.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3201202.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellcheck compounded words

2011-07-26 Thread François Schiettecatte
FWIW, here is the process I follow to create a log4j-aware version of the 
Apache Solr war file and the corresponding log4j.properties file.

Have fun :)

François


##
#
# Log4J configuration for SOLR
#
#   http://wiki.apache.org/solr/SolrLogging
#
#
# 1) Download SLF4J:
#   http://www.slf4j.org/
#   http://www.slf4j.org/download.html
#   http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz
#
# 2) Unpack Solr:
#   jar xvf apache-solr-3.3.0.war
#
# 3) Delete:
#   WEB-INF/lib/log4j-over-slf4j-1.6.1.jar
#   WEB-INF/lib/slf4j-jdk14-1.6.1.jar
#
# 4) Copy:
#   slf4j-1.6.1/slf4j-log4j12-1.6.1.jar  ->  WEB-INF/lib
#   log4j.properties (this file)         ->  WEB-INF/classes/ (needs to be created)
#
# 5) Pack Solr:
#   jar cvf apache-solr-3.3.0.war admin favicon.ico index.jsp META-INF WEB-INF
#
#
#   Author: Francois Schiettecatte
#   Version:1.0
#
##



##
#
# Logging levels (helpful reminder)
#
# DEBUG < INFO < WARN < ERROR < FATAL
#



##
#
# Logging setup
#

log4j.rootLogger=ERROR, SOLR


# Daily Rolling File Appender (SOLR)
log4j.appender.SOLR=org.apache.log4j.DailyRollingFileAppender
log4j.appender.SOLR.File=${catalina.base}/logs/solr.log
log4j.appender.SOLR.Append=true
log4j.appender.SOLR.Encoding=UTF-8
log4j.appender.SOLR.DatePattern='-'-MM-dd
log4j.appender.SOLR.layout=org.apache.log4j.PatternLayout
log4j.appender.SOLR.layout.ConversionPattern=%d [%t] %-5p %c - %m%n



##
#
# Logging levels for SOLR
#

# Default logging level
log4j.logger.org.apache.solr=ERROR



##







Re: Spellcheck compounded words

2011-07-26 Thread O. Klein

François Schiettecatte wrote:
> 
> #
> # 4) Copy:
> #   slf4j-1.6.1/slf4j-log4j12-1.6.1.jar  ->  WEB-INF/lib
> #   log4j.properties (this file)         ->  WEB-INF/classes/ (needs to be created)
> #
> 

Don't you mean log4j-1.2.16/slf4j-log4j12-1.6.1.jar ?

Anyways. I was testing on 3.3 and found that when I added
spellcheck.maxCollations=2&spellcheck.maxCollationTries=2 as parameters to
the URL there was no problem at all.

Adding 

  <str name="spellcheck.maxCollations">2</str>
  <str name="spellcheck.maxCollationTries">2</str>

to the default requestHandler in solrconfig.xml caused the request to hang.

Can someone verify if this is a bug?
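
For reference, the two setups being compared can be sketched as follows. The
handler name, the last-components entry, and the example URL are assumptions
based on the stock example solrconfig.xml and the spellcheckComponent name
used earlier in this thread, not a configuration posted here:

```xml
<!-- Case 1 (reported to work): parameters passed on the URL, e.g.
     /solr/select?q=sail+booat&spellcheck=true&spellcheck.collate=true
         &spellcheck.maxCollations=2&spellcheck.maxCollationTries=2 -->

<!-- Case 2 (reported to hang): the same parameters as handler defaults. -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">2</str>
    <str name="spellcheck.maxCollationTries">2</str>
  </lst>
  <arr name="last-components">
    <str>spellcheckComponent</str>
  </arr>
</requestHandler>
```

Defaults apply to every request that goes through the handler, whereas URL
parameters affect only the request that carries them.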



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3201332.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellcheck compounded words

2011-07-26 Thread François Schiettecatte
I get slf4j-log4j12-1.6.1.jar from 
http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz, it is what interfaces  slf4j to 
log4j, you will also need to add log4j-1.2.16.jar to WEB-INF/lib.


François 





RE: Spellcheck compounded words

2011-07-25 Thread Dyer, James
I'm afraid there currently isn't much support for correcting misplaced 
whitespace.  Solr is going to look at each word individually and won't even try 
to combine adjacent words (or split a word into 2 or more).  So there is no good 
way to get these kinds of suggestions.

One thing that might work in some cases is to create a spelling dictionary 
composed of shingles (2+ words indexed together as 1 token).  This approach is 
described in Smiley & Pugh's Solr book (1st ed.), p. 180ff, under the heading "An 
alternative approach".  I haven't tried this, but it might be your best hope if 
this is a feature you've absolutely got to have.
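
A minimal sketch of what such a shingle-based spelling field could look like
in schema.xml follows. The type and field names are made up for illustration,
and the filter settings are an assumption rather than the configuration from
the book:

```xml
<!-- Hypothetical names; adjust to your schema. -->
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit single words plus two-word shingles, e.g. "sail", "boat", "sail boat". -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>

<field name="spell_shingle" type="textSpellShingle" indexed="true" stored="false" multiValued="true"/>
```

The spellchecker's field setting would then point at spell_shingle so that
the dictionary is built from these tokens.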

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Friday, July 22, 2011 8:11 PM
To: solr-user@lucene.apache.org
Subject: Spellcheck compounded words

How do I get spellchecker to suggest compounded words?

Like. q=sail booat

and suggestion/collate is sailboat and sail boat

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-compounded-words-tp3192748p3192748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellcheck compounded words

2011-07-25 Thread Markus Jelsma
This will indeed work for misspelled compounds, but not when the compound word 
is actually queried as two separate, correctly spelled words. Most likely both 
sail and boat exist in the index as single tokens.

There is a workaround, but it is limited to a scenario where users never use 
more than 1 query term (or two in the case of misspelled compounds). When your 
index has shingles and you replace the whitespace with a non-whitespace 
character, you get a proper suggestion returned. The compound is then found as a 
suggestion but not in the collation.

When queries contain more than two terms, it most likely will never work this 
way. The results get really strange.


-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


RE: Spellcheck compounded words

2011-07-25 Thread Dyer, James
Related to this is this JIRA issue: 
https://issues.apache.org/jira/browse/SOLR-2585 . With this patch, Solr will 
consider alternatives in cases where a word is misspelled in its context but 
nevertheless exists in the index and/or dictionary.  This is a work in progress 
and is for trunk only, but would make for another nice incremental improvement 
in the spellchecker.

This patch won't solve the problem at hand, but it may make the shingle 
workaround function in a few more cases.  Of course, actually developing 
word-break analysis into the spellchecker would be the right solution...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
