I've set up Nutch as a site search to try it out.

When I crawl my own site with Nutch, I end up with a strange set of links:

downloads/}).21()}),cr:(g(t){t.8.22().1f({2Y:(t.1i[0]-t.8.4u)+
downloads/+aa[6u].ib().ia()+
downloads/).30(/\\s+$/,

The list is huge, but the entries all look much like the ones above.
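
To show what sets these entries apart from real paths, here is a rough
Python sketch that flags them. It is only an illustration, not anything
Nutch does itself: the character set is my own guess based on the samples
above (they look to me like fragments of script source rather than URLs),
the function name is made up, and "downloads/index.html" is a hypothetical
example of a legitimate path.

```python
import re

# Characters seen in the bogus links that I would not expect in a real
# path on my site. This set is an assumption; adjust it for your own URLs.
SUSPICIOUS = re.compile(r"[(){}\[\]+\\,;:]|\s")


def looks_like_script_fragment(link: str) -> bool:
    """Return True if the link contains characters typical of script source."""
    return bool(SUSPICIOUS.search(link))


links = [
    r"downloads/}).21()}),cr:(g(t){t.8.22().1f({2Y:(t.1i[0]-t.8.4u)+",
    r"downloads/+aa[6u].ib().ia()+",
    r"downloads/).30(/\\s+$/,",
    "downloads/index.html",  # hypothetical legitimate path, for contrast
]

# Keep only the entries that look like code rather than paths.
bogus = [link for link in links if looks_like_script_fragment(link)]

for link in bogus:
    print(link)
```

A check like this would only help with sorting through the crawl output
afterwards; it does not explain where the links come from.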

I suspect that the links are coming from MediaWiki, but up until now I
haven't seen any such links in my error logs.  The bogus links also make
the crawl take much longer than necessary.

I'm running the crawl from the tutorial.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
