Hi all,

I started experimenting with Nutch using the NutchTutorial. I got a
successful crawl working with the command 'bin/nutch crawl urls -dir
crawl' (no limits on depth or number of documents). I noticed that
Nutch finishes quite fast. When I looked at the HTML source of the
main page being crawled, I noticed that Nutch never followed links
like these:

<a href="content.jsp?objectid=22619">Route</a>
<br/>
<a href="content.jsp?objectid=5931">Openingstijden</a>
<br/>

Surely these links look ordinary enough to be seen and followed by
Nutch? Could someone please tell me what could be causing these links
not to be followed?
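In case it helps narrow things down: one thing I have not ruled out is a
URL filter rule. The Python sketch below resolves the two links against a
made-up base URL (http://www.example.com/index.jsp, since the real site is
not shown here) and tests them against a pattern like -[?*!@=], which I
believe appears in the default conf/crawl-urlfilter.txt. Treat both the base
URL and that rule as assumptions, not a confirmed diagnosis.

```python
import re
from urllib.parse import urljoin

# Hypothetical base URL of the crawled page (the real one is not in this post).
BASE = "http://www.example.com/index.jsp"

# Assumed rule, modelled on a "-[?*!@=]" line in the URL filter config:
# reject any URL containing one of the characters ? * ! @ =
SKIP_QUERY_CHARS = re.compile(r"[?*!@=]")


def is_rejected(href: str, base: str = BASE) -> bool:
    """Resolve a relative href and report whether the assumed rule drops it."""
    absolute = urljoin(base, href)
    return SKIP_QUERY_CHARS.search(absolute) is not None


if __name__ == "__main__":
    for href in ["content.jsp?objectid=22619", "content.jsp?objectid=5931"]:
        verdict = "rejected" if is_rejected(href) else "kept"
        print(urljoin(BASE, href), verdict)
```

If a rule like this is active, both links would be dropped before fetching
simply because they contain '?' and '='.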

Thanks for any help,

Jeroen

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
