FYI, this does not work. The update appears to run on a different thread from the analysis, perhaps because the update is only applied when the commit happens? I'm sending the document XML with commitWithin="60000".
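For reference, a minimal sketch of such an update message, assuming Solr's standard XML update format posted to /update (the field names here are placeholders, not from the original thread):

```xml
<!-- commitWithin asks Solr to commit within 60 seconds of receiving the add -->
<add commitWithin="60000">
  <doc>
    <field name="id">doc-1</field>
    <field name="content">... document body to be scanned for profanity ...</field>
  </doc>
</add>
```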
I would appreciate any other ideas. I'm drawing a blank on how to implement this efficiently with Lucene/Solr.

mike

On Thu, Jan 28, 2010 at 4:31 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
> How about this crazy idea - a custom TokenFilter that stores the safe flag in
> ThreadLocal?
>
> ----- Original Message ----
> > From: Mike Perham <mper...@onespot.com>
> > To: solr-user@lucene.apache.org
> > Sent: Thu, January 28, 2010 4:46:54 PM
> > Subject: implementing profanity detector
> >
> > We'd like to implement a profanity detector for documents during indexing.
> > That is, given a file of profane words, we'd like to be able to mark a
> > document as safe or not safe if it contains any of those words, so that we
> > can have something similar to Google's safe search.
> >
> > I'm trying to figure out how best to implement this with Solr 1.4:
> >
> > - An UpdateRequestProcessor would allow me to dynamically populate a "safe"
> > boolean field, but it requires me to pull out the content, tokenize it, and
> > run each token through my set of profanities, essentially running the
> > analysis pipeline again. That's a lot of overhead AFAIK.
> >
> > - A TokenFilter would allow me to tap into the existing analysis pipeline,
> > so I get the tokens for free, but I can't access the document.
> >
> > Any suggestions on how to best implement this?
> >
> > Thanks in advance,
> > mike
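To make the trade-off concrete, here is a minimal, self-contained sketch of the check an UpdateRequestProcessor would perform on a field's content. The class and method names are hypothetical, the word list is inlined, and a crude regex tokenizer stands in for re-running the field's analyzer; the actual Solr wiring (UpdateRequestProcessorFactory, SolrInputDocument) is omitted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ProfanityCheck {
    // Hypothetical word list; in practice this would be loaded from a file.
    private final Set<String> profanities;

    public ProfanityCheck(Set<String> profanities) {
        this.profanities = profanities;
    }

    /** Returns true if no token in the text matches the profanity set. */
    public boolean isSafe(String text) {
        // Crude lowercase/split tokenization; a real implementation would
        // reuse the field's analyzer so matching stays consistent with
        // what actually gets indexed.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (profanities.contains(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ProfanityCheck check = new ProfanityCheck(
                new HashSet<>(Arrays.asList("darn", "heck")));
        System.out.println(check.isSafe("a perfectly clean document")); // true
        System.out.println(check.isSafe("well, darn it"));              // false
    }
}
```

If the TokenFilter route were taken instead, the same set lookup would run inside incrementToken() as tokens stream by, with the per-document result carried out-of-band (e.g. Otis's ThreadLocal suggestion), which is exactly where the threading caveat at the top of the thread bites.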