Can you provide the stats output that Scrapy dumps to the log by default when the crawl finishes? That should tell us whether you're seeing 301s, 302s, etc.
Pastebin is generally your friend here; a big block of text pasted into an email tends to get mangled.

On Tue, Feb 24, 2015 at 1:49 PM, <[email protected]> wrote:
> Hello Travis,
>
> It would be great if you could give me some pointers on how I can try to
> solve this problem.
>
> Every time Scrapy started to crawl a link, I had it write a log entry. So I
> confirmed that it crawled about 197 links from a list of about 30,000. It
> should not have stopped, I think.
>
> There are some invalid links in the list, and I can see in the output
> window that they give me a URL error (HTTP status code is not handled or
> not allowed), but that's OK because it continues on to the next link. When I
> try to access such a link manually, an error also appears: "Non existing
> product".
>
> I also see some links that redirect, but within the same domain and only
> once. When you access the link manually it does the same: it redirects you
> to another, similar link.
>
> In case the server has anti-scraper blocking, how can I counter it? Is
> there any way to slow down the requests that download the pages?
>
> Thank you for your help.
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
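
On the question of slowing down requests: the following is only a minimal sketch, assuming a standard Scrapy project with a `settings.py` file. The setting names are Scrapy's own, but every value here is an illustrative assumption, not something taken from this thread or tuned for the site in question.

```python
# settings.py -- illustrative values only (assumptions); tune for the target site.

# Wait roughly this many seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 2.0

# Vary the actual wait between 0.5x and 1.5x DOWNLOAD_DELAY so that
# requests do not arrive at a perfectly regular interval.
RANDOMIZE_DOWNLOAD_DELAY = True

# Send at most one request at a time to each domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let the AutoThrottle extension adjust the delay based on observed latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0   # initial delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0    # upper bound on the delay, in seconds
```

If the crawl still stops early with settings like these, the stats dump at the end of the run (the `downloader/response_status_count/...` entries and `finish_reason`) is probably the next thing to look at.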
