Hi Julien, thanks a lot for the hint, already looking into it ;-).
Cheers,
Milan

On Wed, 2007-11-07 at 14:36 +0000, DigitalPebble wrote:
> Hi Milan,
>
> We have developed a Nutch plugin which could be used for that and uses our
> text classification library. The plugin consists of a Nutch indexer, which
> creates a special field for the documents, and a searcher, which allows you
> to switch the filter on.
> We have used it for classifying spam on forums, but I am sure that this
> should work on porn just as well. You can find more details on our Text
> Classification API at http://www.digitalpebble.com/solutionsTC.html. The
> Nutch plugin is just a wrapper for that library.
>
> Best,
>
> Julien
>
> -------- Original Message --------
> Subject: SaveSearch or Adult Filter
> Date: Wed, 07 Nov 2007 14:24:37 +0000
> From: Milan Krendzelak <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> To: [email protected]
>
> Hi,
>
> does anybody have an idea how to implement safe search in Nutch? I think
> it would be cool to use a Bayesian technique to classify a web site as
> adult (porno) and store a flag in the index. Of course, other techniques
> could be used as well: regex, blacklist, etc.
>
> Cheers,
> Milan Krendzelak
> Senior Software Developer
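
For anyone following the thread, the idea in the original message (a Bayesian classifier plus a blacklist, with the result stored as a flag in the index) could be sketched roughly as below. This is only an illustration in plain Python, not the DigitalPebble library and not Nutch plugin code. The names NaiveBayesFilter and index_fields, the "adult"/"clean" labels, and the "adult" field name are all made up for the example.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class naive Bayes filter with add-one smoothing."""

    LABELS = ("adult", "clean")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Both labels need at least one training document before scoring.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, tokens, label):
        vocab = set(self.word_counts["adult"]) | set(self.word_counts["clean"])
        total = sum(self.word_counts[label].values())
        # Log prior: share of training documents carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def is_adult(self, text):
        tokens = tokenize(text)
        return self.log_score(tokens, "adult") > self.log_score(tokens, "clean")


def index_fields(url, text, clf, blacklist=frozenset()):
    """Build the fields an indexer might store; 'adult' is an assumed field name."""
    host = url.split("/")[2] if "://" in url else url
    flagged = host in blacklist or clf.is_adult(text)
    return {"url": url, "adult": "true" if flagged else "false"}
```

At search time, the filter would then amount to excluding documents whose flag field is "true". A real classifier would need far more training data and better tokenization than this toy version.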
