Re: 301 redirects, RFPDupeFilter and url parameters order

Travis Leleu Tue, 16 Dec 2014 08:34:08 -0800

Scrapy should still crawl the URL once, though.  Do you think Scrapy
isn't crawling b=2&a=1 at all?  Can you provide some evidence (output or
debug messages, preferably via pastebin, since formatting in email is hard
to read) to support this?


On Mon, Dec 15, 2014 at 2:57 PM, crawler <[email protected]> wrote:
>
> I have a site that performs a 301 redirect from example.com?a=1&b=2 to
> example.com?b=2&a=1.
> This causes a problem:
> - Scrapy found the URL example.com?b=2&a=1 and put it in the queue;
> - then Scrapy changed the URL to example.com?a=1&b=2 and sent the request;
> - the site redirected to example.com?b=2&a=1;
> - Scrapy got the URL example.com?b=2&a=1 and filtered it out as a
> duplicate.
>
> What should I do? I can't disable RFPDupeFilter, because there are real
> duplicate links.
> I can't change that site's behavior.
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>

