On Wed, Oct 10, 2012 at 9:02 AM, O. Klein <kl...@octoweb.nl> wrote:
> I don't want to tweak the threshold. For the majority of cases it works fine.
>
> It's for cases where a term has low frequency but is spelled correctly.
>
> If you lower the threshold you would also get incorrectly spelled terms as
> suggestions.
>

Yeah, there is no real magic here when the corpus contains typos. The
existing docFreq heuristic was just borrowed from the old index-based
spellchecker.

I do wonder if using # of occurrences (totalTermFreq) instead of # of
documents with the term (docFreq) would improve the heuristic.
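
To make the difference concrete, here is a toy sketch in plain Python (not
Lucene code; the corpus and the term_stats helper are made up for
illustration). It counts both statistics over an in-memory corpus in which a
rare but correct term is repeated inside a single document, while a typo
occurs only once:

```python
from collections import Counter


def term_stats(docs):
    """Return (doc_freq, total_term_freq) for a list of tokenized documents.

    doc_freq[t]        = number of documents that contain t at least once
    total_term_freq[t] = number of occurrences of t across all documents
    """
    doc_freq = Counter()
    total_term_freq = Counter()
    for tokens in docs:
        total_term_freq.update(tokens)   # every occurrence counts
        doc_freq.update(set(tokens))     # each document counts once per term
    return doc_freq, total_term_freq


# Hypothetical corpus: "levenshtein" is rare but correct, "lucenne" is a typo.
docs = [
    ["lucene", "spellchecker", "lucene", "lucene"],
    ["lucene", "index"],
    ["lucenne", "index"],
    ["levenshtein", "levenshtein", "levenshtein", "automaton"],
]

doc_freq, total_term_freq = term_stats(docs)

for term in ("lucene", "levenshtein", "lucenne"):
    print(term, doc_freq[term], total_term_freq[term])
```

Here docFreq is 1 for both "levenshtein" and "lucenne", so it cannot tell them
apart, while totalTermFreq is 3 versus 1. That only helps when correct terms
tend to be repeated within a document; a typo repeated in one document would
benefit just the same.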

In any case, if you also want to integrate a dictionary or something
similar, it seems like this could somehow be done with the
File-based spellchecker?
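
One possible shape for that combination, again as a toy Python sketch and not
an existing Lucene or Solr API: a term is trusted either because it is
frequent enough in the index or because it appears in a word list of the
one-word-per-line kind. The names load_dictionary and is_trusted_term, the
threshold value, and the sample counts are all assumptions made up for
illustration:

```python
def load_dictionary(lines):
    """Build a set of known-good words from one-word-per-line input."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_trusted_term(term, doc_freq, num_docs, dictionary, threshold=0.01):
    """Return True if the term should be treated as correctly spelled.

    A dictionary hit wins outright; otherwise fall back to the
    docFreq-based heuristic (fraction of documents containing the term).
    """
    if term in dictionary:
        return True
    return doc_freq.get(term, 0) / num_docs >= threshold


# Hypothetical index statistics and word list.
doc_freq = {"lucene": 50, "lucenne": 1, "levenshtein": 1}
num_docs = 1000
dictionary = load_dictionary(["Levenshtein\n", "\n", "automaton\n"])

for term in ("lucene", "lucenne", "levenshtein"):
    print(term, is_trusted_term(term, doc_freq, num_docs, dictionary))
```

With these numbers the threshold alone would reject both "lucenne" and
"levenshtein"; the word list rescues only the correctly spelled one, without
lowering the threshold for everything else.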
