Hi all,

I wonder how I can find the original URL after it hits a redirect.
The original URLs are in the seed list, but I cannot tell which URL
was redirected to which.  In the Fetcher phase I expected to read it from
Nutch.WRITABLE_REPR_URL_KEY, but it is overridden by the redirected URL.

Any suggestions on how to read them from the crawldb, segments, or linkdb?

PS: I only crawl the first-level pages (depth: 1) from the seed list.

Best,
Tugcem.

-- 
TO
