Filtering based on keywords in the URL is quite easy and doesn't require source changes. Just edit the conf/regex-urlfilter.txt file. If you want to keep only URLs containing "puppies", you would have entries like:

    +.*puppies
    # Skip everything else
    -.

Boosting a URL so that it gets fetched preferentially would probably require source changes, though. The place to start would probably be FetchListTool.java, which contains the main for the "bin/nutch generate" step.

Howie
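For anyone who wants to check such rules before running a crawl, here is a minimal standalone sketch of how the +/- entries behave. It is not Nutch's own code: the class KeywordUrlFilterSketch and its methods are made up for illustration, and it assumes the rules are tried top to bottom, that the first pattern found anywhere in the URL decides, and that a URL matching no rule is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: mimics first-match-wins +/- rules.
public class KeywordUrlFilterSketch {

    // One rule: a sign (+ accept, - reject) and a regular expression.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, String regex) {
            this.accept = accept;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Parse rule text, skipping blank lines and lines starting with '#'.
    static List<Rule> parse(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', line.substring(1)));
        }
        return rules;
    }

    // Assumption: the first rule whose pattern is found in the URL decides;
    // a URL that matches no rule at all is rejected.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("+.*puppies\n# Skip everything else\n-.\n");
        // example.com URLs are placeholders, not from the original thread.
        System.out.println(accepts(rules, "http://example.com/puppies/index.html"));
        System.out.println(accepts(rules, "http://example.com/kittens/index.html"));
    }
}
```

Under those assumptions the first URL is accepted and the second is rejected. Order matters: if the "-." line came first, it would reject everything before the "puppies" rule was ever tried.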
I have been looking at the Nutch sources to see how to modify them so that Nutch will crawl only links with certain keywords, but to no avail. Can anyone please help me out? Even just pointing me to the right classes to start with would be a great help. I was told there is a Nutch book that I can buy. Would this book be of help? Would you recommend it for my purpose?

Thanks
Jason
