Hi, I am trying to use Nutch to build a search engine for kids and teens (using
dmoz for kids and teens), and I would appreciate a few tips since I have just begun
looking at Nutch.

I added +^http://([a-z0-9]*\.)*domain.com/ for each domain in the kids and teens dmoz
file to regex-urlfilter.txt. When I fetch, however, I get hardly any pages (only 23
in the 2nd iteration of fetching). Am I filtering wrong? What would anyone recommend
if I want to include all the pages within only the kids/teens domains (about 10,000)
and exclude everything else? Thanks in advance.
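
To make the setup concrete, here is a minimal sketch of how the per-domain rules could be generated rather than typed by hand. It assumes the filter tries rules from top to bottom and lets the first match decide, so a final "-." line rejects everything that no "+" rule accepted. The helper names (build_filter_rules, is_accepted) and the example domains are hypothetical, and the check uses Python's re module only as a rough stand-in for the Java regex matching Nutch performs.

```python
import re


def build_filter_rules(domains):
    """Return filter lines: one '+' rule per domain, then a catch-all '-.'."""
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    rules = []
    for domain in cleaned:
        # Escape the dots so "." matches a literal dot, not any character.
        escaped = domain.replace(".", "\\.")
        rules.append("+^http://([a-z0-9]*\\.)*" + escaped + "/")
    # Assumption: first matching rule wins, so this rejects all other URLs.
    rules.append("-.")
    return rules


def is_accepted(url, rules):
    """Approximate the filter: the first rule whose pattern is found decides."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched at all


if __name__ == "__main__":
    # Hypothetical domains standing in for the dmoz kids and teens list.
    rules = build_filter_rules(["example.com", "kids.example.org"])
    print("\n".join(rules))
    for url in ("http://www.example.com/games.html", "http://other.net/"):
        print(url, "accepted" if is_accepted(url, rules) else "rejected")
```

Printing the rules and pasting them into the filter file, with no accept-everything line left at the end, is one way to check that each domain line is well formed before starting a fetch.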


ryu




_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
