Hello,

I am running Nutch with: bin/nutch crawl urls -dir crawl -depth 3 -topN 3

and in my urls/sites file I have two sites:

http://www.mysite.com
http://www.mysite2.com

I would like to crawl those two sites to infinite depth and index all
the pages within them. But I don't want the crawler to follow links to
remote sites, like Facebook, if those sites link to them.
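My best guess from skimming the docs (unverified, so please correct me)
is that conf/regex-urlfilter.txt controls which URLs get followed, and
that something like this would restrict the crawl to my two seed domains:

```
# conf/regex-urlfilter.txt -- my guess, not tested:
# accept only URLs on the two seed sites
+^http://(www\.)?mysite\.com/
+^http://(www\.)?mysite2\.com/
# reject everything else (replaces the default catch-all "+.")
-.
```

I assume I would also drop -topN and raise -depth to get every page,
but I am not sure that is the right way.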

How do I do this? I know this is a basic question, but I have looked
through all the documentation and could not figure it out.

Best Regards,
C.B.
