Hi - You can use the domain URL filter (the urlfilter-domain plugin) to manually whitelist domains.
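
A rough sketch of what that could look like, assuming you switch db.ignore.external.links back to false and let the domain filter alone decide which hosts are allowed. The plugin.includes value below is only an assumption based on a typical default; merge urlfilter-domain into whatever value you already have.

```xml
<!-- nutch-site.xml (sketch, not a complete file) -->
<property>
  <name>db.ignore.external.links</name>
  <!-- assumption: disabled so that the redirect to ikea.com is not dropped -->
  <value>false</value>
</property>
<property>
  <name>plugin.includes</name>
  <!-- assumption: your existing value, with urlfilter-domain added -->
  <value>protocol-http|urlfilter-(regex|domain)|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The whitelist itself would then go into conf/domain-urlfilter.txt, one domain or host per line. The entries here are taken from your example:

```
# domain-urlfilter.txt (sketch): URLs outside these domains are rejected
ikea.at
ikea.com
```

Note that an entry like ikea.com admits the whole domain, not just /at/de/, so you may still want a rule in regex-urlfilter.txt if you need to restrict the crawl to that path.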
-----Original message-----
> From:Alexandre <[email protected]>
> Sent: Mon 24-Sep-2012 09:19
> To: [email protected]
> Subject: External domain redirection with db.ignore.external.links=true
>
> Hi,
>
> I have a question concerning redirection to an external domain.
> I crawl different websites, but I don't want to crawl external links. For
> that I used the option
> db.ignore.external.links=true
> It's working fine. My problem is that websites which redirect to an
> external domain are not crawled.
> For example:
> http://www.ikea.at is redirected to http://www.ikea.com/at/de/ and my
> crawler ignores this website because of the option
> db.ignore.external.links=true.
>
> One solution would be to put the URL http://www.ikea.com/at/de/ directly in
> the seed list, but this is not an option for me, because I cannot change
> this list.
>
> Is there any way in Nutch to allow crawling of websites that are
> redirected to external domains, while still ignoring external links?
>
> Thanks for your help,
>
> Alex.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/External-domain-redirection-with-db-ignore-external-links-true-tp4009783.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>