RE: Word Break Spell Checker Implementation algorithm

2014-10-21 Thread Dyer, James
David,

I do not know of a published algorithm for this.  For terms with 0 frequency, 
it simply checks the document frequency of the various parts that can be made 
from those terms by breaking them and/or by combining adjacent terms.  There 
are tuning parameters available that let you limit how much work it will do to 
find a suitable replacement.  See:
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/WordBreakSpellChecker.html

This of course is slower than indexing shingles as the work is done at query 
time vs index time.  But it saves the added index size and indexing time 
required to index the shingles separately.
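
To make the breaking side of that description concrete, here is a minimal 
sketch. It is not the Lucene source: a plain Map stands in for the index's 
document frequencies, and the names WordBreakSketch, breakTerm and 
minPartLength are made up for illustration. It only tries two-way splits and 
does no ranking or work limiting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: docFreq is a hypothetical stand-in for index lookups.
class WordBreakSketch {

    /** Returns every two-way split of term whose parts both occur in docFreq. */
    static List<String[]> breakTerm(String term, Map<String, Integer> docFreq, int minPartLength) {
        List<String[]> suggestions = new ArrayList<>();
        // Only a term with zero frequency is a candidate for breaking.
        if (docFreq.getOrDefault(term, 0) > 0) {
            return suggestions;
        }
        // Try each break position and keep splits where both parts have a nonzero frequency.
        for (int i = minPartLength; i <= term.length() - minPartLength; i++) {
            String left = term.substring(0, i);
            String right = term.substring(i);
            if (docFreq.getOrDefault(left, 0) > 0 && docFreq.getOrDefault(right, 0) > 0) {
                suggestions.add(new String[] {left, right});
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("word", 40);
        docFreq.put("break", 15);
        for (String[] parts : breakTerm("wordbreak", docFreq, 1)) {
            System.out.println(parts[0] + " " + parts[1]); // prints: word break
        }
    }
}
```

Every candidate split costs two frequency lookups, which is the query-time 
work mentioned above.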

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: David Philip [mailto:davidphilipshe...@gmail.com] 
Sent: Monday, October 20, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: Word Break Spell Checker Implementation algorithm

Hi,

Could you please point me to the link where I can learn about the
theory behind the implementation of word break spell checker?
As we know, Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what algorithm does the word break spell checker
component use? How does it detect the space that is needed if it
doesn't use shingles?


Thanks - David


Word Break Spell Checker Implementation algorithm

2014-10-20 Thread David Philip
Hi,

Could you please point me to the link where I can learn about the
theory behind the implementation of word break spell checker?
As we know, Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what algorithm does the word break spell checker
component use? How does it detect the space that is needed if it
doesn't use shingles?


Thanks - David


Re: Word Break Spell Checker Implementation algorithm

2014-10-20 Thread Ramzi Alqrainy
WordBreakSolrSpellChecker offers suggestions by combining adjacent query
terms and/or breaking terms into multiple words. It is a SpellCheckComponent
enhancement, leveraging Lucene's WordBreakSpellChecker. It can detect
spelling errors resulting from misplaced whitespace without the use of
shingle-based dictionaries and provides collation support for word-break
errors, including cases where the user has a mix of single-word spelling
errors and word-break errors in the same query. It also provides shard
support.
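
As a rough illustration of the "combining adjacent query terms" half, the
sketch below joins each neighbouring pair of terms and keeps the joins that
exist in a toy dictionary. It is an assumption-laden toy, not the Solr or
Lucene code: the Map stands in for index document frequencies, and
WordCombineSketch and combineAdjacent are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: docFreq is a hypothetical stand-in for index lookups.
class WordCombineSketch {

    /** Joins each pair of adjacent terms and keeps the joins found in docFreq. */
    static List<String> combineAdjacent(List<String> terms, Map<String, Integer> docFreq) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            String joined = terms.get(i) + terms.get(i + 1);
            // A join is only suggested if the combined word occurs in the index.
            if (docFreq.getOrDefault(joined, 0) > 0) {
                suggestions.add(joined);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("spellchecker", 3);
        docFreq.put("solr", 9);
        List<String> query = Arrays.asList("spell", "checker", "solr");
        System.out.println(combineAdjacent(query, docFreq)); // prints: [spellchecker]
    }
}
```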


Here is how it might be configured in solrconfig.xml:

http://lucene.472066.n3.nabble.com/file/n4164997/Screen_Shot_2014-10-20_at_9.png
 


Some of the parameters will be familiar from the discussion of the other
spell checkers, such as name, classname, and field. New for this spell
checker is combineWords, which defines whether words should be combined in a
dictionary search (default is true); breakWords, which defines if words
should be broken during a dictionary search (default is true); and
maxChanges, an integer which defines how many times the spell checker should
check collation possibilities against the index (default is 10).

This spell checker can be configured alongside a traditional checker (e.g.,
DirectSolrSpellChecker). The results are combined, and collations can contain
a mix of corrections from both spell checkers.

Add It to a Request Handler

Queries will be sent to a RequestHandler. If every request should generate a
suggestion, then you would add the following to the requestHandler that you
are using:

http://lucene.472066.n3.nabble.com/file/n4164997/2.png 

For more details, see the tutorial below:

https://cwiki.apache.org/confluence/display/solr/Spell+Checking


