I once had a similar issue and tracked it down with an HTTP sniffer.

So I suggest you check the HTTP headers of these transfers from start to
end, and please share them with us; I don't have the tools at hand right now.
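If no sniffer is at hand, a small client that refuses to follow redirects can print every hop's status and headers. The sketch below assumes Java 11 or later (java.net.http). The class name HeaderTrace and the 10-hop limit are illustrative choices, not anything Nutch provides. It keeps cookies with java.net.CookieManager so that a Set-Cookie from one hop is sent on the next, roughly as a browser would. Whether your crawler replays cookies the same way depends on its configuration, so the trace may differ from what the fetcher sees.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class HeaderTrace {
    static final int MAX_HOPS = 10; // illustrative limit so a redirect loop cannot run forever

    /** True for the 3xx status codes that send the client to a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static URI nextHop(URI current, String location) {
        return current.resolve(location);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER) // report every 3xx instead of following it
                .cookieHandler(new CookieManager())         // send cookies set earlier on later hops
                .build();
        Set<URI> visited = new HashSet<>();
        URI url = URI.create(args[0]);
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            if (!visited.add(url)) {
                System.out.println("!! back at an already visited URL: " + url);
            }
            HttpRequest request = HttpRequest.newBuilder(url).GET().build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            System.out.println("== " + url + " -> HTTP " + response.statusCode());
            response.headers().map().forEach((name, values) ->
                    values.forEach(v -> System.out.println("   " + name + ": " + v)));
            Optional<String> location = response.headers().firstValue("location");
            if (!isRedirect(response.statusCode()) || location.isEmpty()) {
                break; // final (non-redirect) response reached
            }
            url = nextHop(url, location.get());
        }
    }
}
```

Run it with the first URL of the chain as its only argument; the Set-Cookie and Location lines of each hop are the parts worth posting to the list.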

In my case, the sequence was:
1. Page A is loaded.
2. Page A redirects to Page B.
3. Page B sets a cookie and redirects back to Page A.
4. Page A is left unfetched, because its URL is already on the list.
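The four steps can be modeled with a toy crawler that keeps a set of URLs it has already seen. This is a hypothetical simplification, not Nutch's CrawlDb logic: the redirect map stands in for the server, the example.com URLs are made up, and cookies are not modeled at all. It only shows why a "seen by URL" check drops the second visit to Page A, even though that visit (with the cookie) is the one that would return real content.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RedirectLoopDemo {
    /**
     * Follows redirects from start, skipping any URL already in the seen set,
     * and returns a log of what happened at each step.
     */
    static List<String> crawl(String start, Map<String, String> redirects) {
        Set<String> seen = new HashSet<>();
        List<String> log = new ArrayList<>();
        String url = start;
        while (url != null) {
            if (!seen.add(url)) {          // add() returns false if the URL is already known
                log.add("skip " + url);    // step 4: never fetched again
                break;
            }
            log.add("fetch " + url);
            url = redirects.get(url);      // null means "no redirect": stop
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical site: A redirects to B, B (sets a cookie and) redirects back to A.
        Map<String, String> redirects = Map.of(
                "http://example.com/A", "http://example.com/B",
                "http://example.com/B", "http://example.com/A");
        crawl("http://example.com/A", redirects).forEach(System.out::println);
    }
}
```

Running main prints two fetch lines followed by `skip http://example.com/A`, which corresponds to step 4.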

Check whether you are seeing the same thing. If so, change your page-control
settings so that a URL that has already been seen is not automatically treated
as crawled (see properties such as db.signature.class and
db.fetch.schedule.class in nutch-default.xml).
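Before changing anything, it may help to confirm which classes are actually configured. Nutch configuration files use the Hadoop `<property><name>…</name><value>…</value></property>` layout, and values in nutch-site.xml override those in nutch-default.xml. The sketch below (the class name ShowCrawlSettings is hypothetical) reads the files in the order given, lets later files win, and prints the two properties mentioned above. It deliberately ignores Hadoop details such as `<final>` and `${…}` variable substitution, so treat its output as a quick check rather than the effective runtime configuration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ShowCrawlSettings {
    static final String[] KEYS = {"db.signature.class", "db.fetch.schedule.class"};

    /** Collects name/value pairs from Hadoop-style <property> elements. */
    static Map<String, String> readProperties(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse configuration file", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass nutch-default.xml first, then nutch-site.xml: later files override earlier ones.
        Map<String, String> merged = new LinkedHashMap<>();
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                merged.putAll(readProperties(in));
            }
        }
        for (String key : KEYS) {
            System.out.println(key + " = " + merged.getOrDefault(key, "(not set)"));
        }
    }
}
```

Pass the paths to your nutch-default.xml and nutch-site.xml, in that order, as arguments.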

Best,
Dincer


2011/8/18 <[email protected]>

> As far as I understand, redirected URLs are scored 0, which is why the fetcher
> does not pick them up at the earlier depths. They may get crawled starting at
> depth 4, depending on the size of the seed list.
>
> -----Original Message-----
> From: abhayd <[email protected]>
> To: nutch-user <[email protected]>
> Sent: Wed, Aug 17, 2011 4:41 pm
> Subject: Re: nutch redirect treatment
>
>
> Thanks for the response.
>
> But my issue is that the new URL is not being crawled after the redirect.
> It is not a scoring issue.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/nutch-redirect-treatment-tp3261546p3263311.html
> Sent from the Nutch - User mailing list archive at Nabble.com.