Hi!
I think you should use urlfilter-regex with a pattern like
"http://\w+\.xyz\.com/stuff.*" instead of urlfilter-domain, and set
db.ignore.external.links to false. This will work, but it is quite
slow if you have many regexes.
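To sanity-check such a pattern before putting it into the filter file, a
small sketch like the one below may help. It is plain java.util.regex, not
Nutch code: the class name UrlPatternCheck and the method accepts are made
up for illustration, and it assumes the filter looks for the pattern with a
find()-style match, which I have not verified against every Nutch version.

```java
import java.util.regex.Pattern;

public class UrlPatternCheck {
    // Pattern from the message above, written with \w+ so that a whole
    // host label such as "www" or "abc" matches, not just one character.
    private static final Pattern SUBDOMAIN_PATTERN =
            Pattern.compile("http://\\w+\\.xyz\\.com/stuff.*");

    // Hypothetical helper: true if the pattern is found in the URL.
    static boolean accepts(String url) {
        return SUBDOMAIN_PATTERN.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.xyz.com/stuff",   // seed URL from the question
            "http://abc.xyz.com/stuff",   // subdomain that should also be crawled
            "http://www.other.com/stuff"  // unrelated host, should be rejected
        };
        for (String url : urls) {
            System.out.println(url + " -> " + accepts(url));
        }
    }
}
```

Note that \w does not match a hyphen, so host labels like "my-shop" would
need a wider character class.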
You may also try adding xyz.com to domain-suffixes.xml. This may cause
some side effects; I have never tested it, I only looked at the
DomainURLFilter source, so it's probably not a really good idea.
Sergey Volkov
On Mon 07 Nov 2011 12:35:30 AM MSK, Peyman Mohajerian wrote:
Hi Guys,
Let's say my input file is:
http://www.xyz.com/stuff
and I have thousands of these URLs in my input. How do I configure
Nutch to also crawl this subdomain for each input:
http://abc.xyz.com/stuff
I don't want to just replace 'www' with 'abc'; I want to crawl both.
Thanks
Peyman