Is there an easy way to categorize content at parse time? I have an extensive list of adult terms, and I would like to update the page's meta info when a combination of those terms is present, flagging the page as adult content so that I can exclude it from search results unless people opt in.
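To make the idea concrete, here is a minimal sketch in Python rather than actual Nutch code. Everything in it is an assumption for illustration: the term groups, the metadata keys `adult` and `adult.groups`, and the helper names `matched_groups`, `classify`, and `visible` are all hypothetical. It flags a page only when terms from at least two different groups appear, so a single ambiguous word does not trigger the flag.

```python
import re

# Hypothetical term groups (placeholders, not a real list). A page is flagged
# only when terms from at least `min_groups` different groups are present.
TERM_GROUPS = {
    "group_a": {"adult", "xxx"},
    "group_b": {"explicit", "uncensored"},
    "group_c": {"webcam", "escort"},
}

# Single-word tokens only; multi-word phrases would need separate handling.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def matched_groups(text, term_groups=TERM_GROUPS):
    """Return the sorted names of the groups that have a term in the text."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return sorted(name for name, terms in term_groups.items() if tokens & terms)


def classify(text, metadata, min_groups=2):
    """Return a copy of the metadata with the hypothetical adult flag set."""
    groups = matched_groups(text)
    flagged = dict(metadata)
    flagged["adult"] = "true" if len(groups) >= min_groups else "false"
    flagged["adult.groups"] = ",".join(groups)
    return flagged


def visible(metadata, opted_in=False):
    """Search-time check: hide flagged pages unless the user opted in."""
    return opted_in or metadata.get("adult") != "true"
```

The same check would presumably run inside whatever hook the parser exposes for adding metadata, with the `visible` test applied when results are filtered.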
I'd also like to look at Bayesian filtering during the parse phase to look for hidden text (text the same color as the background), spammy pages, sites with 3+ AdSense ads, or other particulars, and score pages accordingly. Has anyone experimented with this?
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
