Hi all,

I started experimenting with Nutch using the NutchTutorial. I got a successful crawl working with the command 'bin/nutch crawl urls -dir crawl' (no limits on depth or number of documents). I noticed that Nutch finishes quite fast. When I looked at the HTML source of the main page being crawled, I noticed that Nutch never followed links that look like these:
<a href="content.jsp?objectid=22619">Route</a> <br/>
<a href="content.jsp?objectid=5931">Openingstijden</a> <br/>

Surely these links look ordinary enough to be seen and followed by Nutch? Could someone please tell me what might be causing these links not to be followed?

Thanks for any help,
Jeroen

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
