Hi Jun,

Twisted is the networking library that Scrapy uses to make its HTTP
requests.  It also provides the asynchronous capabilities, which are a big
part of why Scrapy scales so well.

You received that error because the remote server abruptly closed the
connection to your computer before Twisted had received the full response.
Depending on how often it happens, it could be anti-bot logic (for example,
if it fails consistently and then works the next day or from another IP).
It could also just be an unreliable host: HTTP connections can drop for
many reasons, given everything that sits between you and the server.

As for what to do about it, I would recommend simply ignoring it, unless
getting every single image is worth a lot of effort.  In that case, I would
catch these errors in code and log which requests failed.  Once your
crawler has finished running (and the queue of "rejected / error /
no-response" requests is thus populated), you can work through the queue
and re-request the files.  (I'd do this with a separate spider that
consumes from the queue directly, but that is just me.)
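
To make the idea concrete, here is a rough sketch of the bookkeeping, not a
tested pipeline.  It assumes the (success, info_or_failure) tuples that
Scrapy hands to ImagesPipeline.item_completed(), in the same order as the
requested URLs.  The helper names and the "failed_images.jl" file are made
up for illustration.

```python
import json


def failed_image_urls(image_urls, results):
    """Return the URLs whose download failed.

    `results` is assumed to have the shape passed to item_completed():
    a list of (success, info_or_failure) tuples, one per requested URL
    and in the same order as `image_urls`.
    """
    return [url for url, (ok, _) in zip(image_urls, results) if not ok]


def append_to_retry_queue(urls, path="failed_images.jl"):
    """Append one JSON object per failed URL to a JSON-lines file."""
    with open(path, "a", encoding="utf-8") as fh:
        for url in urls:
            fh.write(json.dumps({"url": url}) + "\n")


def read_retry_queue(path="failed_images.jl"):
    """Yield the queued URLs, e.g. as start URLs for a second spider."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)["url"]
```

Inside your MyImagesPipeline, you would call something like
append_to_retry_queue(failed_image_urls(item["image_urls"], results)) from
item_completed(self, results, item, info) before returning the item.  That
assumes your item keeps its URLs in an "image_urls" field and that
get_media_requests() issues one request per URL, in order.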


On Mon, Dec 15, 2014 at 10:15 PM, Jun Liu <[email protected]> wrote:

> ping? Can anyone help, please?
>
>
> Thanks,
> Jun
>
> On Sat, Dec 13, 2014 at 6:01 PM, Jun Liu <[email protected]> wrote:
>>
>> Hi Scrapy experts,
>>
>> I have a spider trying to scrape product data from
>> http://www.katespade.com/. It has an image pipeline similar to the one
>> in scrapy tutorial:
>>
>> class MyImagesPipeline(ImagesPipeline):
>>
>> ...
>>
>> I pretty much copied and pasted it from the tutorial. However, when I run
>> my spider, I occasionally get an unknown error while downloading images.
>> The error looks something like this:
>>
>> 2014-12-13 17:08:01-0800 [spider_ks] WARNING: File (unknown-error): Error
>> downloading image from <GET
>> http://a248.e.akamai.net/f/248/9086/10h/origin-d4.scene7.com/is/image/KateSpade/NJMU4368_473?wid=750&fmt=jpg>
>> referred in <None>: [<twisted.python.failure.Failure <class
>> 'twisted.internet.error.ConnectionLost'>>, <twisted.python.failure.Failure
>> <class 'twisted.web.http._DataLoss'>>]
>>
>> My question is: what is this error, and how do I solve it?
>>
>>
>> Thanks,
>>
>> Jun
>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>
