Just like Shane, I have also considered developing a
filtered search engine - one that is child safe.

Please let me know if this is possible:

1) add all sites appearing in the Open Directory adult
categories to a "do not index list"

2) use filter/stop words to remove most profanity from
the index 

(I think there is a workaround: people can put quotes
around words to search past filter words in Nutch.)
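
To make point 1 concrete, here is a rough sketch of a "do not index" check
done by host name. It is not Nutch code and does not use any Nutch API; the
host names, DO_NOT_INDEX_HOSTS and is_indexable are made up for illustration,
and the real list would have to be built from the Open Directory adult
categories.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; in practice it would be built from the sites
# listed under the Open Directory adult categories.
DO_NOT_INDEX_HOSTS = {"adult-example.com", "another-adult-example.net"}


def normalize_host(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_indexable(url, blocked_hosts=DO_NOT_INDEX_HOSTS):
    """Return True if neither the host nor any parent domain is blocked."""
    parts = normalize_host(url).split(".")
    # Check "a.b.example.com", then "b.example.com", then "example.com", ...
    for i in range(len(parts)):
        if ".".join(parts[i:]) in blocked_hosts:
            return False
    return True
```

Checking parent domains as well means a subdomain of a listed site is
skipped too, which seems like the safer default for a child-safe index.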

One final question: Is stemming available in Nutch?

There are instances where stemming can be a good thing or
a problem. An example is the common last name
"Sexton": if "sex" were a filter word, would that name
be filtered out of the index?
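
The "Sexton" case comes down to whether the filter matches substrings or
whole tokens. The sketch below only illustrates that difference; it says
nothing about how Nutch actually tokenizes or stems, and FILTER_WORDS and
both function names are assumptions made for the example.

```python
import re

FILTER_WORDS = {"sex"}  # illustrative filter list, not a real one


def tokens(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def blocked_by_substring(text):
    """Naive filter: blocks if a filter word appears anywhere in the text."""
    lowered = text.lower()
    return any(word in lowered for word in FILTER_WORDS)


def blocked_by_token(text):
    """Whole-word filter: blocks only if a complete token is a filter word."""
    return any(token in FILTER_WORDS for token in tokens(text))
```

With a substring match, "John Sexton" is blocked; with a whole-token match
it is not, because "sexton" is a different token from "sex". Whether a
stemmer would ever reduce such a name to a filter word depends on the
stemmer and is not shown here.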

Just curious. I would rather develop an algorithm for
scoring the content of a webpage. I know that not all
use of the word "sex" is pornographic.
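
One very simple way to score page content instead of filtering on single
words is sketched below. It is only an assumption about what such an
algorithm could look like: the term weights, the context words, and the
threshold are invented for illustration and would need tuning on real pages.

```python
import re

# Hypothetical weights: higher means more likely to be adult content.
TERM_WEIGHTS = {"sex": 1.0, "xxx": 3.0, "porn": 3.0}
# Hypothetical context words that suggest a non-pornographic use.
CONTEXT_DISCOUNTS = {"education": 2.0, "health": 2.0, "gender": 1.0}


def content_score(text):
    """Return a per-word score; 0.0 means nothing suspicious was found."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    raw = sum(TERM_WEIGHTS.get(w, 0.0) for w in words)
    discount = sum(CONTEXT_DISCOUNTS.get(w, 0.0) for w in words)
    # Normalize by length so long pages are not penalized for one mention.
    return max(raw - discount, 0.0) / len(words)


def is_child_safe(text, threshold=0.05):
    """Treat a page as safe when its score stays below the threshold."""
    return content_score(text) < threshold
```

A page about "sex education and health" scores zero here because the
context words outweigh the single mention, while a page that is mostly
weighted terms scores well above the threshold.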

Thanks,
Barry Bowen
580-916-0339 


