Is there an easy way to categorize content at parse time?
I have an extensive list of adult terms, and I would
like to update the page's meta info to flag it as adult
content when a combination of those terms occurs, so I
can exclude it from the search results unless users
opt in.
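A minimal sketch of the term-combination check, assuming each "combination" is a set of single-word, lower-case terms that must all appear somewhere in the page text. The class name `AdultTermMatcher` and the placeholder terms are hypothetical; it is plain Java with no Nutch dependencies. In Nutch this check would presumably run inside a parse-filter plugin that writes the resulting flag into the parse metadata, but that wiring is not shown here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: flags text when every term of at least one rule appears in it. */
public class AdultTermMatcher {
    private final List<Set<String>> rules;

    /** Each rule is a set of lower-case terms that must all occur together. */
    public AdultTermMatcher(List<Set<String>> rules) {
        this.rules = rules;
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns true if all terms of any one rule occur in the text. */
    public boolean isAdult(String text) {
        Set<String> tokens = tokenize(text);
        for (Set<String> rule : rules) {
            if (tokens.containsAll(rule)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Placeholder terms; substitute the real term combinations.
        AdultTermMatcher m = new AdultTermMatcher(List.of(
                Set.of("foo", "bar"),
                Set.of("baz")));
        System.out.println(m.isAdult("Some Foo and some BAR here")); // true
        System.out.println(m.isAdult("only foo here"));             // false
    }
}
```

Requiring whole combinations rather than single hits keeps pages that merely mention one ambiguous term from being flagged; multi-word phrases would need a phrase or n-gram match instead of the token set used here.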

I'd also like to look at Bayesian filtering during the
parse phase to detect hidden text (text the same color
as the background), spammy pages, or sites with three
or more AdSense ads or other telltale features, and
score them accordingly.
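A rough sketch of two of the page features mentioned above, computed from raw HTML with regular expressions. This is a simple rule-based score, not a Bayesian filter; the counts could instead be fed to a trained classifier as extra features. The class `SpamHeuristics` and its methods are hypothetical. It assumes each classic AdSense unit loads `show_ads.js` once, and the hidden-text check only compares `<font color>` against `<body bgcolor>`, so CSS-based hiding is not detected.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical page-level spam heuristics computed from raw HTML. */
public class SpamHeuristics {
    // Assumption: each classic AdSense ad unit references show_ads.js once.
    private static final Pattern ADSENSE =
            Pattern.compile("show_ads\\.js", Pattern.CASE_INSENSITIVE);
    // Naive: only HTML attributes are inspected, not stylesheets or inline CSS.
    private static final Pattern BODY_BG =
            Pattern.compile("<body[^>]*\\bbgcolor\\s*=\\s*[\"']?([#\\w]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern FONT_COLOR =
            Pattern.compile("<font[^>]*\\bcolor\\s*=\\s*[\"']?([#\\w]+)", Pattern.CASE_INSENSITIVE);

    /** Counts occurrences of the AdSense loader script. */
    public static int countAdSenseUnits(String html) {
        Matcher m = ADSENSE.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** True if any <font color> equals the <body bgcolor> (case-insensitive). */
    public static boolean hasHiddenFontText(String html) {
        Matcher bg = BODY_BG.matcher(html);
        if (!bg.find()) {
            return false;
        }
        String background = bg.group(1).toLowerCase(Locale.ROOT);
        Matcher font = FONT_COLOR.matcher(html);
        while (font.find()) {
            if (font.group(1).toLowerCase(Locale.ROOT).equals(background)) {
                return true;
            }
        }
        return false;
    }

    /** One point per triggered rule; the threshold and weights are arbitrary placeholders. */
    public static int score(String html) {
        int s = 0;
        if (countAdSenseUnits(html) >= 3) {
            s++;
        }
        if (hasHiddenFontText(html)) {
            s++;
        }
        return s;
    }

    public static void main(String[] args) {
        String html = "<html><body bgcolor=\"#FFFFFF\">"
                + "<font color=\"#ffffff\">hidden keywords</font>"
                + "<script src=\"http://pagead2.googlesyndication.com/pagead/show_ads.js\"></script>".repeat(3)
                + "</body></html>";
        System.out.println(countAdSenseUnits(html)); // 3
        System.out.println(hasHiddenFontText(html)); // true
        System.out.println(score(html));             // 2
    }
}
```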

Has anyone experimented with this?


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
