TextResponse

Malik Rumi Mon, 22 May 2017 19:55:46 -0700

In the docs, it says:

> TextResponse objects adds encoding capabilities to the base Response class,
> which is meant to be used only for binary data, such as images, sounds or
> any media file.

I understood this to mean that the base Response class is meant to be used
only for binary data. However, I also read:

> TextResponse Objects are used for binary data such as images, sounds etc
> which has the ability to encode the base Response class.

https://www.tutorialspoint.com/scrapy/scrapy_requests_and_responses.htm

which is of course exactly the opposite of how I interpreted it. Would
someone here please clarify? Thanks.

As additional background, I am scraping text, not photos or media files. So
it makes sense to me that something called TextResponse would be
intended for use with text, but I didn't write it, so I don't know. That's
why I am asking for clarification.

Ordinarily, when I download a page, what I get is a bytes object, which I
then have to convert to unicode. If I can set it up to come to me as unicode
in the first place, that would save me a step and be great. But that leads me
to my second question: How exactly are we supposed to implement TextResponse?

I am in 100% agreement with the OP
here:
https://groups.google.com/forum/#!msg/scrapy-users/-ulA_0Is1Kc/oZzM2kuTmd4J;context-place=forum/scrapy-users

and I don't think he (or I) got a sufficient answer.

> HTML pages are the most common response types spiders deal with, and their
> class is HtmlResponse, which inherits from TextResponse, so you can use all
> its features.

Well, if that's so, then TextResponse would be the default and we'd get
back unicode strings, right? But that's not what happens. We get byte
strings.

And despite the answer found there, it is not at all clear how we can use
these response subclasses if we are told the middleware does it all
automatically, as if we aren't supposed to worry about it. If that were so,
why tell us about the subclass, or even have it, at all?

Here's an error I got:

TypeError: TextResponse url must be str, got list:

The list the error is referring to is my start_urls variable, which I had
been using without issue until I tried to use TextResponse. So if we can't
use a list, are we supposed to feed it only one url at a time? Manually?

Your patient, thorough, and detailed explanation of these issues is greatly
appreciated.
