Hi,

I have a very large index, and I'm trying to add a spell checker for it. I don't want to copy all the text in the index into an extra spell field, since that would be prohibitively big and the index is already close to as big as it can reasonably be. So I just want to extract word frequencies as I index, for offline processing.
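
To illustrate the kind of extraction I mean, here is a minimal sketch. It is not my actual indexing code: the class `WordFrequencyCounter` and its methods are hypothetical names, the tokenization is a naive lowercase split on non-letters rather than whatever analyzer the real index uses, and it assumes the whole vocabulary fits in memory.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: accumulates word frequencies while documents are indexed.
public class WordFrequencyCounter {
    // TreeMap keeps words sorted, so the dump comes out in alphabetical order.
    private final Map<String, Long> counts = new TreeMap<String, Long>();

    // Call once per document text. Tokenization here is deliberately naive
    // (ASCII letters only); a real setup would reuse the index analyzer.
    public void addText(String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            Long old = counts.get(token);
            counts.put(token, old == null ? 1L : old + 1L);
        }
    }

    public Map<String, Long> counts() {
        return counts;
    }

    // One "word frequency" pair per line, ready for offline processing.
    public String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

The output of `dump()` is then filtered offline (for example, by a minimum frequency).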
After some filtering I get something like this (word, frequency):

a 122958495
aa 834203
aaa 175206
aaaa 22389
aaab 1522
aaai 1050
aaas 6384
aab 8109
aabb 1906
aac 35100
aacc 1692
aachen 11723

I wanted to use FileBasedSpellChecker, but it doesn't support frequencies, so its recommendations are consistently horrible. Increasing the frequency cutoff won't really help much: it will still suggest less frequent words over equally similar, more frequent ones.

What's the easiest way to get this working? Presumably I'd need to create a separate index with just these words. How do I get the frequencies in there without actually creating 11723 records with "aachen" in them, and so on for every other word?

I can do some small Java coding if need be. I'm already using the 3.x branch (mostly for edismax, plus some unrelated minor patches).

Thanks,
Tomasz