On Wed, Oct 10, 2012 at 9:02 AM, O. Klein <kl...@octoweb.nl> wrote:
> I don't want to tweak the threshold. For majority of cases it works fine.
>
> It's for cases where term has low frequency but is spelled correctly.
>
> If you lower the threshold you would also get incorrect spelled terms as
> suggestions.

Yeah, there is no real magic here when the corpus contains typos. The existing docFreq heuristic was just borrowed from the old index-based spellchecker. I do wonder whether using the number of occurrences (totalTermFreq) instead of the number of documents containing the term (docFreq) would improve the heuristic. In any case, if you also want to integrate a dictionary or something similar, it seems like this could somehow be done with the file-based spellchecker?
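
To make the docFreq vs. totalTermFreq distinction concrete, here is a toy sketch in Python. It is not Lucene code: the corpus, the `term_stats` helper, and the whitespace tokenization are all made up for illustration. It only shows how the two counts can rank a correctly spelled but rarely mentioned term differently from a typo.

```python
from collections import Counter


def term_stats(docs):
    """Return (doc_freq, total_term_freq) counters for a list of text documents.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, nothing like a real analyzer.
    """
    doc_freq = Counter()         # number of documents containing the term
    total_term_freq = Counter()  # number of occurrences across all documents
    for doc in docs:
        tokens = doc.lower().split()
        total_term_freq.update(tokens)   # count every occurrence
        doc_freq.update(set(tokens))     # count each term once per document
    return doc_freq, total_term_freq


# Made-up corpus: "lucene" is spelled correctly but lives in a single document,
# while the typo "serach" appears once in each of two documents.
DOCS = [
    "lucene lucene lucene lucene search",
    "serach index",
    "serach query",
]

DOC_FREQ, TOTAL_TERM_FREQ = term_stats(DOCS)

for term in ("lucene", "serach"):
    print(term, "docFreq =", DOC_FREQ[term], "totalTermFreq =", TOTAL_TERM_FREQ[term])
```

In this toy corpus the typo has the higher docFreq, while the correctly spelled term has the higher totalTermFreq. Whether that pattern holds on a real index is exactly the open question.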