A problem is that your profanity list will not stop growing, and with
each new word you will want to rescrub the index.
We had a thousand-word NOT clause in every query (a filter query would
be true for 99% of the index) until we switched to another
arrangement.
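To make the query-time approach concrete, here is a minimal sketch of how such a NOT clause might be assembled. It is illustrative only: `build_exclusion_clause` and the field name `body` are hypothetical, the output uses Lucene-style query syntax, and the words are assumed to contain no characters that need escaping.

```python
def build_exclusion_clause(field, profane_words):
    """Return a Lucene-style negative clause such as -body:(w1 OR w2).

    Assumes the words contain no query-syntax special characters.
    """
    # De-duplicate and normalize case so the clause stays as small as possible.
    words = sorted({w.lower() for w in profane_words})
    if not words:
        return ""
    return "-{}:({})".format(field, " OR ".join(words))
```

Every word added to the list lengthens the clause sent with each query, which is the growth problem described above.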
Another small problem was that I
On Thu, Feb 11, 2010 at 10:49 AM, Grant Ingersoll gsing...@apache.org wrote:
Otherwise, I'd do it via copy fields. Your first field is your main field
and is analyzed as before. Your second field does the profanity detection
and simply outputs a single token at the end, safe/unsafe.
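The second-field idea can be illustrated outside Solr. What follows is a rough Python sketch, not Solr code: `tokenize`, `safety_token`, `build_document`, and the field names are hypothetical, and the simple lowercase/split tokenizer is an assumption standing in for whatever analysis the main field uses.

```python
import re

def tokenize(text):
    """Lowercase and split on non-word characters (an assumed, simplistic analyzer)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def safety_token(text, profane_words):
    """Collapse the whole token stream into a single token: 'unsafe' or 'safe'."""
    profane = {w.lower() for w in profane_words}
    return "unsafe" if any(tok in profane for tok in tokenize(text)) else "safe"

def build_document(doc_id, body, profane_words):
    """Mimic the copy: the main field keeps the text, the second field holds one token."""
    return {
        "id": doc_id,
        "body": body,
        "safety": safety_token(body, profane_words),
    }
```

A query could then filter on the single-token field rather than carrying the word list, although documents indexed before a word was added would still need to be reindexed.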
How
- A TokenFilter would allow me to tap into the existing analysis pipeline, so
I get the tokens for free, but I can't access the document.
https://issues.apache.org/jira/browse/SOLR-1536
On Fri, Jan 29, 2010 at 12:46 AM, Mike Perham mper...@onespot.com wrote:
We'd like to implement a profanity
On Jan 28, 2010, at 4:46 PM, Mike Perham wrote:
We'd like to implement a profanity detector for documents during indexing.
That is, given a file of profane words, we'd like to be able to mark a
document as safe or not safe if it contains any of those words so that we
can have something
To: solr-user@lucene.apache.org
Sent: Thu, January 28, 2010 4:46:54 PM
Subject: implementing profanity detector
We'd like to implement a profanity detector for documents during indexing.
That is, given a file of profane words, we'd like to be able to mark a
document as safe or not safe if it contains any of those words so that we
can have something similar to Google's safe search.
I'm trying to figure out