Filtering on keywords in the URL is quite easy and doesn't
require source changes. Just edit the conf/regex-urlfilter.txt
file. If you want to keep only URLs containing "puppies", you
would have entries like:

+.*puppies

# Skip everything else
-.
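
To make the rule order concrete, here is a rough Java sketch of how such a file is evaluated. It is not Nutch's own filter class; the class and method names are made up for illustration. It assumes the first matching rule decides, that a pattern may match anywhere in the URL, and that a URL matching no rule is rejected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; UrlFilterSketch is a hypothetical name,
// not a class from the Nutch sources.
class UrlFilterSketch {

    // Each rule is a sign ('+' accept, '-' reject) followed by a regex.
    // Blank lines and lines starting with '#' are skipped.
    static boolean accepts(List<String> rules, String url) {
        for (String rule : rules) {
            if (rule.isEmpty() || rule.startsWith("#")) {
                continue;
            }
            boolean accept = rule.charAt(0) == '+';
            Pattern pattern = Pattern.compile(rule.substring(1));
            // find() looks for the pattern anywhere in the URL.
            if (pattern.matcher(url).find()) {
                return accept; // assumption: the first matching rule wins
            }
        }
        // Assumption: a URL that matches no rule is rejected.
        return false;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList(
                "+.*puppies",
                "# Skip everything else",
                "-.");
        System.out.println(accepts(rules, "http://example.com/puppies/index.html"));
        System.out.println(accepts(rules, "http://example.com/kittens/index.html"));
    }
}
```

With the two rules above, the first URL is accepted by the "puppies" rule and the second falls through to "-." and is rejected, which is why the catch-all rule has to come last.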

Boosting a URL so that it gets fetched ahead of others, rather
than filtering the rest out, would probably require source changes
though. The place to start would probably be FetchListTool.java,
which contains the main for the "bin/nutch generate" step.
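
As a rough idea of what "boosting" means, independent of the Nutch sources: the sketch below just reorders a list of candidate URLs so that those containing a keyword come first. The class, the boost() helper and the 0/1 score are all hypothetical; how FetchListTool actually orders its fetch list would have to be checked in the code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; none of these names come from Nutch.
class KeywordBoostSketch {

    // Hypothetical score: 1 if the URL contains the keyword, otherwise 0.
    static int boost(String url, String keyword) {
        return url.contains(keyword) ? 1 : 0;
    }

    // Returns a copy of the candidates with boosted URLs first.
    // List.sort is stable, so URLs with equal boost keep their order.
    static List<String> prioritize(List<String> candidates, String keyword) {
        List<String> ordered = new ArrayList<>(candidates);
        ordered.sort(Comparator.comparingInt((String url) -> -boost(url, keyword)));
        return ordered;
    }
}
```

Unlike the URL filter, nothing is dropped here; the non-matching URLs are still fetched, just later.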

Howie


I have been looking at the Nutch sources to see how to modify them so
that Nutch will crawl only links with certain keywords, but to no avail.

Can anyone please help me out? Even just pointing me to the right
classes to start with would be of great help.

I was told there is a nutch book that I can buy. Would this book be of
help? Would you recommend it for my purpose?

Thanks
Jason

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
