I've set up Nutch as a site search to try it out.
When I crawl my own site with Nutch, I end up with a strange set of links:
downloads/}).21()}),cr:(g(t){t.8.22().1f({2Y:(t.1i[0]-t.8.4u)+
downloads/+aa[6u].ib().ia()+
downloads/).30(/\\s+$/,
The list is huge, but most of the entries look much the same.
I suspect that the links are coming from MediaWiki, but so far I haven't
seen any such links in my error logs. The bogus links also make the crawl
take much longer than is really necessary.
I'm running the tutorial crawl.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general