Re: implementing profanity detector

2010-02-16 Thread Lance Norskog
A problem is that your profanity list will not stop growing, and with each new word you will want to rescrub the index. We had a thousand-word NOT clause in every query (a filter query would be true for 99% of the index) until we switched to another arrangement. Another small problem was that I
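
To make the trade-off above concrete, here is a minimal sketch of the two query shapes: a negative clause rebuilt from the whole word list on every query, and a single filter on a flag computed at index time. The field names `text` and `safety` and both helper functions are hypothetical, and the sketch assumes the words are plain alphanumerics that need no query escaping.

```python
def build_not_clause(field, words):
    """Return a negative clause such as -text:(w1 OR w2) for the given words.

    Hypothetical helper: assumes each word is plain alphanumeric text,
    so no query-syntax escaping is attempted.
    """
    # Normalise, drop blanks and duplicates, and sort for a stable query string.
    cleaned = sorted({w.strip().lower() for w in words if w.strip()})
    if not cleaned:
        return ""
    return "-{}:({})".format(field, " OR ".join(cleaned))


def safe_filter(field="safety", value="safe"):
    """Return a filter on a flag computed at index time (hypothetical field)."""
    return "{}:{}".format(field, value)


if __name__ == "__main__":
    print(build_not_clause("text", ["darn", "Heck", "darn"]))
    print(safe_filter())
```

The negative clause grows with the word list, while the flag filter stays constant in size; the cost of the flag is that documents have to be rescrubbed whenever a word is added.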

Re: implementing profanity detector

2010-02-12 Thread Mike Perham
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll gsing...@apache.org wrote: Otherwise, I'd do it via copy fields. Your first field is your main field and is analyzed as before. Your second field does the profanity detection and simply outputs a single token at the end, safe/unsafe. How
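
As a rough illustration of the second field described above, the following sketch reduces a document's text to a single safe/unsafe token. It is not Solr code: in Solr this logic would sit in the second field's analysis chain. The function names, the word-file format (one word per line, `#` for comments) and the simple tokenizer are all assumptions made for the example.

```python
import re

# Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def load_profanity(lines):
    """Build a lowercase word set from lines, skipping blanks and # comments."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def safety_token(text, profane_words):
    """Return 'unsafe' if any whole token is in the word set, else 'safe'."""
    tokens = TOKEN_RE.findall(text.lower())
    return "unsafe" if any(t in profane_words for t in tokens) else "safe"


if __name__ == "__main__":
    words = load_profanity(["darn", "", "# comment", "Heck"])
    print(safety_token("Well, DARN it", words))
    print(safety_token("A perfectly polite sentence", words))
```

Matching is on whole tokens, so a listed word does not flag a longer word that merely contains it; whether that is the desired behaviour depends on the word list.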

Re: implementing profanity detector

2010-02-11 Thread Alexey Serba
- A TokenFilter would allow me to tap into the existing analysis pipeline so I get the tokens for free but I can't access the document. https://issues.apache.org/jira/browse/SOLR-1536 On Fri, Jan 29, 2010 at 12:46 AM, Mike Perham mper...@onespot.com wrote: We'd like to implement a profanity

Re: implementing profanity detector

2010-02-11 Thread Grant Ingersoll
On Jan 28, 2010, at 4:46 PM, Mike Perham wrote: We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe if it contains any of those words so that we can have something

implementing profanity detector

2010-02-10 Thread Mike Perham
To: solr-user@lucene.apache.org
Sent: Thu, January 28, 2010 4:46:54 PM
Subject: implementing profanity detector

We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe

implementing profanity detector

2010-01-28 Thread Mike Perham
We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe if it contains any of those words so that we can have something similar to Google's safe search. I'm trying to figure out

Re: implementing profanity detector

2010-01-28 Thread Otis Gospodnetic
-user@lucene.apache.org
Sent: Thu, January 28, 2010 4:46:54 PM
Subject: implementing profanity detector

We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document as safe or not safe if it contains any

Re: implementing profanity detector

2010-01-28 Thread Lance Norskog
mper...@onespot.com
To: solr-user@lucene.apache.org
Sent: Thu, January 28, 2010 4:46:54 PM
Subject: implementing profanity detector

We'd like to implement a profanity detector for documents during indexing. That is, given a file of profane words, we'd like to be able to mark a document