David, I do not know of a published algorithm for this. In the case of terms with zero frequency, it checks the document frequency of the various parts that can be made from the terms by breaking them and/or by combining adjacent terms. Tuning parameters are available that let you limit how much work it will do to try to find a suitable replacement. See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/WordBreakSpellChecker.html .
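To make the idea concrete, here is a minimal Python sketch of the candidate-generation approach described above. This is not Lucene's actual implementation; the toy `DOC_FREQ` table, the helper names, and the `min_part_len` cutoff are all illustrative assumptions.

```python
# Toy document-frequency table standing in for the index.
DOC_FREQ = {
    "word": 42,
    "break": 17,
    "spellchecker": 3,
}

def df(term):
    """Document frequency of a term in the toy index (0 if absent)."""
    return DOC_FREQ.get(term, 0)

def break_candidates(term, min_part_len=2):
    """For a zero-frequency term, try every split point and keep the
    splits whose two halves both occur in the index."""
    out = []
    for i in range(min_part_len, len(term) - min_part_len + 1):
        left, right = term[:i], term[i:]
        if df(left) > 0 and df(right) > 0:
            out.append((left, right, min(df(left), df(right))))
    return out

def combine_candidates(terms):
    """For adjacent zero-frequency terms, try joining each pair and keep
    the combinations that occur in the index."""
    out = []
    for a, b in zip(terms, terms[1:]):
        if df(a) == 0 and df(b) == 0 and df(a + b) > 0:
            out.append((a + b, df(a + b)))
    return out

print(break_candidates("wordbreak"))            # [('word', 'break', 17)]
print(combine_candidates(["spell", "checker"])) # [('spellchecker', 3)]
```

The real checker ranks candidates and bounds the search with parameters such as the maximum number of changes and evaluations, which is where the query-time cost limiting mentioned above comes in.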
This of course is slower than indexing shingles, as the work is done at query time rather than index time. But it saves the added index size and indexing time required to index the shingles separately.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: David Philip [mailto:davidphilipshe...@gmail.com]
Sent: Monday, October 20, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: Word Break Spell Checker Implementation algorithm

Hi,

Could you please point me to a link where I can learn about the theory behind the implementation of the word break spell checker? We know that Solr's DirectSolrSpellChecker component uses the Levenshtein distance algorithm; what is the algorithm used behind the word break spell checker component? How does it detect the space that is needed if it doesn't use shingles?

Thanks - David