Nutch 1.6 find original url or redirected ones

Tuğcem Oral Tue, 11 Nov 2014 05:19:06 -0800

Hi all,

I wonder how I can find the original URL after it hits a redirect.
The original URLs are all in my seed list, but I cannot tell which URL
was redirected to which.  In the Fetcher phase I expected to read it from
Nutch.WRITABLE_REPR_URL_KEY, but it is overridden by the redirected URL.


Any suggestions on how to read them from the crawldb, segments, or linkdb?

PS: I only crawl the first-level pages (depth: 1) in the seed list.

Best,
Tugcem.

-- 
TO
