Hi,

I have a list of about 5000 URLs that I need to crawl and fetch using
Nutch. I want to do a very deep crawl on each one, and I want to include
subdomains, but I don't want external links. If I set
db.ignore.external.links, I don't get the subdomains, so I can't use that.
If I add the domains to regex-urlfilter, I can avoid the external links and
still get the subdomains, but putting so many URLs in the filter does not
seem right. Am I missing some configuration, or am I using Nutch wrongly?

I would appreciate any help. Thanks in advance.

Thanks and Regards,
Sonal
