Hi Jun,

Twisted is the networking library that Scrapy uses to make its HTTP requests. It also provides the asynchronous capabilities, which are a big part of why Scrapy is so scalable.
The error you received means the remote server abruptly closed the connection before Twisted, running on your machine, had finished downloading the response. How often it happens hints at the cause. It could be anti-bot logic (likely if it fails consistently, then works again the next day or from another IP). It could also just be unreliable hosts: HTTP connections can drop for many reasons, given how many components sit between you and the server.

As for what to do about it, I would recommend you simply ignore it, unless getting every single image is worth a lot of effort. In that case, I would look into catching these errors in code and logging which requests failed. Once your crawler is done running (and the "rejected / error / no-response" requests queue is thus populated), you can pop through the queue and re-request the files. (I'd do this with a different spider that consumed from the queue directly, but that is just me.)

On Mon, Dec 15, 2014 at 10:15 PM, Jun Liu <[email protected]> wrote:
> ping? Anyone can help please?
>
>
> Thanks,
> Jun
>
> On Sat, Dec 13, 2014 at 6:01 PM, Jun Liu <[email protected]> wrote:
>>
>> Hi Scrapy experts,
>>
>> I have a spider trying to scrape product data from
>> http://www.katespade.com/. It has an image pipeline similar to the one
>> in scrapy tutorial:
>>
>> class MyImagesPipeline(ImagesPipeline):
>>
>> ...
>>
>> I pretty much copy/paste it from the tutorial. However, when I run my
>> spider, I occasionally got unknown error of downloading images. The error
>> is something like below:
>>
>> 2014-12-13 17:08:01-0800 [spider_ks] WARNING: File (unknown-error): Error
>> downloading image from <GET
>> http://a248.e.akamai.net/f/248/9086/10h/origin-d4.scene7.com/is/image/KateSpade/NJMU4368_473?wid=750&fmt=jpg>
>> referred in <None>: [<twisted.python.failure.Failure <class
>> 'twisted.internet.error.ConnectionLost'>>, <twisted.python.failure.Failure
>> <class 'twisted.web.http._DataLoss'>>]
>>
>> My question is: What is this error? How to solve it?
>>
>>
>> Thanks,
>>
>> Jun
>>
>>

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
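P.S. If you do go the catch-and-log route, here is a rough sketch of the bookkeeping, not a drop-in solution. It assumes the `results` argument has the shape Scrapy passes to `ImagesPipeline.item_completed()`: a list of `(success, value)` tuples in the same order as the requested URLs. The names `collect_failed_urls`, `record_failures`, `drain`, and the in-memory `deque` are hypothetical helpers of mine, not part of Scrapy, and `fetch` stands in for whatever you use to re-request a file.

```python
from collections import deque


def collect_failed_urls(image_urls, results):
    """Return the URLs whose download failed.

    Assumes `results` is a list of (success, value) tuples in the same
    order as `image_urls`, as in ImagesPipeline.item_completed().
    """
    return [url for url, (ok, _value) in zip(image_urls, results) if not ok]


def record_failures(image_urls, results, queue):
    """Append every failed URL to `queue` and return the failed URLs."""
    failed = collect_failed_urls(image_urls, results)
    queue.extend(failed)
    return failed


def drain(queue, fetch):
    """Pop each queued URL and re-request it with `fetch`.

    `fetch` is a hypothetical callable that raises on failure.
    Returns the URLs that failed again, so they can be logged.
    """
    still_failing = []
    while queue:
        url = queue.popleft()
        try:
            fetch(url)
        except Exception:
            still_failing.append(url)
    return still_failing


# Hypothetical in-memory queue; a real crawl would likely persist it
# (file, database, message queue) so a second spider can consume it.
retry_queue = deque()
```

Inside your pipeline you would call something like `record_failures(item['image_urls'], results, retry_queue)` from `item_completed()`, assuming your items keep their URLs in an `image_urls` field, and run `drain()` (or a second spider) once the crawl has finished.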
