Re: Response object for a 404 http error

Paul Tremberth Wed, 16 Apr 2014 14:32:24 -0700

Hi Hakim,

I'm not sure how you get this "instance" with attributes related to errors. 
Are you catching these through an errback?


The HttpError middleware (enabled by default) filters out non-200 responses, 
but you can receive them in your callback by defining a 
handle_httpstatus_list attribute on your spider.

Example:

from scrapy.spider import Spider

class ErrorSpider(Spider):
    name = "testerror"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/",
        "http://www.dmoz.org/rererere/",
    ]
    # Let 404 responses reach parse() instead of being filtered out.
    handle_httpstatus_list = [404]

    def parse(self, response):
        self.log("type: %s; status %d" % (type(response), response.status))
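
If you are in fact going through an errback, my guess (an assumption, since I 
have not seen your code) is that the "instance" is a Twisted Failure wrapping 
Scrapy's HttpError exception, in which case the response sits on 
failure.value.response. A minimal sketch of a helper that copes with both 
cases is below; status_of and the Fake* classes are hypothetical names used 
only so the snippet runs without Scrapy.

```python
def status_of(result):
    """Return the HTTP status of a response, or of the response carried
    by a failure, or None if neither is available."""
    # Callback case: a response-like object exposes .status directly.
    status = getattr(result, "status", None)
    if status is not None:
        return status
    # Errback case (assumption): result is a Failure whose .value is an
    # HttpError-like exception carrying the original .response.
    error = getattr(result, "value", None)
    response = getattr(error, "response", None)
    return getattr(response, "status", None)


# Hypothetical stand-ins, only to exercise the helper without Scrapy.
class FakeResponse(object):
    def __init__(self, status):
        self.status = status


class FakeHttpError(Exception):
    def __init__(self, response):
        super(FakeHttpError, self).__init__("ignoring non-200 response")
        self.response = response


class FakeFailure(object):
    def __init__(self, value):
        self.value = value
```

In a spider you would call status_of(response) in the callback and 
status_of(failure) in the errback. With handle_httpstatus_list set as above, 
the errback path should not be needed for 404s at all.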



On Tuesday, April 15, 2014 4:51:23 PM UTC+2, Hakim Benoudjit wrote:
>
> hi guys,
>
> I have a little issue with the response object inside a request callback 
> when the page returns a 404:
>     - If the page exists (HTTP code *200*), response is of type 
> *HtmlResponse*.
>     - If the page returns 404, response is of type *instance*, which 
> contains some attributes related to error messages, and in this latter 
> case *status* isn't an attribute of the *response* object.
>
> So I can know that the response *status* is *404* only if I check the 
> *response* object's class (*HtmlResponse* or *instance*).
>
> How do we know that a page returned *404* if *response.status* isn't 
> available as an attribute of the *response* object?
>
