What is the output printed by the inject and fetchlist generation?
If you're using the 'crawl' command then this will be the first few hundred lines in the output. Please submit a bug report and attach your log file to it.
Thanks,
Doug
Ryu Ishimoto wrote:
Hi, I am attempting to run Nutch to create a search engine for kids and teens (using dmoz for kids and teens), and I would appreciate a few tips since I have just begun looking at Nutch.
I added +^http://([a-z0-9]*\.)*domain.com/ for each domain in the kids and teens dmoz file to regex_urlfilter.txt. When I fetch, however, I'm getting hardly any pages (only 23 in the 2nd iteration of fetching). Am I filtering wrong? What would anyone recommend if I want to include all the pages within only the kids/teens domains (about 10,000) and exclude everything else? Thanks in advance.
ryu
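For anyone following the thread, here is a rough Python sketch of how a per-domain include list like the one described above might be checked against sample URLs before a crawl. It is not Nutch code: the function names, the sample domains, and the "first matching rule decides, no match means skip" behaviour are assumptions made for illustration. Unlike the pattern quoted above, it also escapes the dots in each domain.

```python
import re

def build_rules(domains):
    """Build (sign, pattern) rules: one '+' rule per domain, then a catch-all '-'.

    Hypothetical helper, not part of Nutch.
    """
    rules = []
    for domain in domains:
        # Optional subdomain labels, then the domain itself with dots escaped.
        pattern = r"^http://([a-z0-9]*\.)*" + re.escape(domain) + "/"
        rules.append(("+", re.compile(pattern)))
    # Final rule: reject anything not matched by a '+' rule above.
    rules.append(("-", re.compile(".")))
    return rules

def accepts(rules, url):
    """Return True if the first rule that matches the URL is a '+' rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False  # assumed default when no rule matches

# Placeholder domains; the real list would come from the dmoz file.
rules = build_rules(["example.org", "kids.example.com"])
for url in ["http://www.example.org/page.html", "http://other.net/"]:
    print(url, accepts(rules, url))
```

Running a handful of known-good and known-bad URLs through a check like this is one way to see whether the patterns themselves are at fault or whether the problem lies elsewhere in the crawl.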
------------------------------------------------------- This SF.Net email is sponsored by OSTG. Have you noticed the changes on Linux.com, ITManagersJournal and NewsForge in the past few weeks? Now, one more big change to announce. We are now OSTG- Open Source Technology Group. Come see the changes on the new OSTG site. www.ostg.com _______________________________________________ Nutch-general mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/nutch-general
