David,

I do not know of a published algorithm for this.  For terms with zero 
frequency, it checks the document frequency of the various candidates that can 
be made from the terms by breaking them into parts and/or by combining 
adjacent terms.  There are tuning parameters available that let you limit how 
much work it will do to try to find a suitable replacement.  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/WordBreakSpellChecker.html
 .
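
To make the idea concrete, here is a rough sketch in Python (not the actual 
Lucene Java implementation; the doc-frequency map and function names are 
hypothetical): for a zero-frequency term, enumerate split points and keep 
splits whose parts all occur in the index, and for adjacent terms, check 
whether the concatenation occurs in the index.

```python
# Toy document-frequency map standing in for the index (hypothetical data).
DOC_FREQ = {"word": 120, "break": 80, "spell": 95, "checker": 60,
            "spellchecker": 15}

def df(term):
    """Document frequency of a term in the toy index."""
    return DOC_FREQ.get(term, 0)

def suggest_breaks(term, min_part_len=2):
    """For a zero-frequency term, suggest splits whose parts all occur
    in the index, ranked by the rarer part's frequency."""
    if df(term) > 0:
        return []  # only zero-frequency terms are considered
    candidates = []
    for i in range(min_part_len, len(term) - min_part_len + 1):
        left, right = term[:i], term[i:]
        if df(left) > 0 and df(right) > 0:
            candidates.append(((left, right), min(df(left), df(right))))
    candidates.sort(key=lambda c: -c[1])
    return [parts for parts, _ in candidates]

def suggest_combine(left_term, right_term):
    """Suggest joining two adjacent terms if the concatenation occurs
    in the index; otherwise return None."""
    combined = left_term + right_term
    return combined if df(combined) > 0 else None

print(suggest_breaks("wordbreak"))          # [('word', 'break')]
print(suggest_combine("spell", "checker"))  # spellchecker
```

The real component bounds this search with its tuning parameters (e.g. 
maximum changes and evaluations) so query time stays reasonable.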

This of course is slower than indexing shingles, as the work is done at query 
time rather than index time.  But it saves the added index size and indexing 
time required to index the shingles separately.

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: David Philip [mailto:davidphilipshe...@gmail.com] 
Sent: Monday, October 20, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: Word Break Spell Checker Implementation algorithm

Hi,

    Could you please point me to a link where I can learn about the
theory behind the implementation of the word break spell checker?
We know that Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what algorithm is used behind the word break spell
checker component? How does it detect the space that is needed if it
doesn't use shingles?


Thanks - David
