One problem is that your profanity list never stops growing, and with each new word you will want to re-scrub the index.
We had a thousand-word NOT clause in every query (a filter query that matched 99% of the index) until we switched to another arrangement. Another small problem was that I knew of many more perversions than my co-workers, but did not wish to display my vast erudition in the seamier side of life :)

On Fri, Feb 12, 2010 at 4:26 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : Otherwise, I'd do it via copy fields. Your first field is your main
> : field and is analyzed as before. Your second field does the profanity
> : detection and simply outputs a single token at the end, safe/unsafe.
>
> you don't even need custom code for this ... copyField all your text into
> a 'has_profanity' field where you use a suitable Tokenizer followed by the
> KeepWordsTokenFilter that only keeps profane words and then a
> PatternReplaceTokenFilter that matches .* and replaces it with "HELL_YEA"
> ... now a search for "has_profanity:HELL_YEA" finds all profane docs, with
> the added bonus that the scores are based on how many profane words occur
> in the doc.
>
> it could be used as a filter query (probably negated) as needed.
>
>
> -Hoss

--
Lance Norskog
goks...@gmail.com
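[Editor's note: the analysis chain Hoss describes can be sketched in schema.xml roughly as follows, assuming Solr's stock KeepWordFilterFactory and PatternReplaceFilterFactory; the field type name, word-list file, and source field here are illustrative, not from the thread.]

```xml
<!-- Illustrative sketch, not the poster's exact schema. -->
<fieldType name="profanity_flag" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keep only tokens that appear in the profanity word list -->
    <filter class="solr.KeepWordFilterFactory" words="profanity.txt" ignoreCase="true"/>
    <!-- collapse every surviving token to a single marker value -->
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*" replacement="HELL_YEA"/>
  </analyzer>
</fieldType>

<field name="has_profanity" type="profanity_flag" indexed="true" stored="false"/>
<copyField source="text" dest="has_profanity"/>
```

With that in place, `q=has_profanity:HELL_YEA` finds profane documents (scored by how many profane tokens survived), and a negated filter query such as `fq=-has_profanity:HELL_YEA` excludes them without the thousand-word NOT clause.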