In the docs, it says:

> TextResponse objects adds encoding capabilities to the base Response class, 
> which is meant to be used only for binary data, such as images, sounds or 
> any media file.


I understood this to mean that the base Response class is meant to be used 
only for binary data. However, I also read:

> TextResponse Objects are used for binary data such as images, sounds etc 
> which has the ability to encode the base Response class. 

https://www.tutorialspoint.com/scrapy/scrapy_requests_and_responses.htm

which is of course exactly the opposite of how I interpreted it. Would 
someone here please clarify? Thanks.

As additional background, I am scraping text, not photos or media files. So 
it makes sense to me that something called TextResponse would be 
intended for use with text, but I didn't write it, so I don't know. That's 
why I am asking for clarification.

Ordinarily, when I download a page, the body comes back as a bytes object, 
which I then have to convert to unicode. If I can set it up to come to me as 
unicode in the first place, that would save me a step and be great. But that 
leads me to my second question: How exactly are we supposed to implement 
TextResponse? 

I am in 100% agreement with the OP here: 
https://groups.google.com/forum/#!msg/scrapy-users/-ulA_0Is1Kc/oZzM2kuTmd4J;context-place=forum/scrapy-users

and I don't think he (or I) got a sufficient answer.

> HTML pages are the most common response types spiders deal with, and their 
> class is HtmlResponse, which inherits from TextResponse, so you can use all 
> its features.


Well, if that's so, then TextResponse would be the default and we'd get 
back unicode strings, right? But that's not what happens. We get byte 
strings.

And despite the answer found there, it is not at all clear how we can use 
these response subclasses if we are told the middleware does it all 
automatically, as if we aren't supposed to worry about it. If that were so, 
why tell us about the subclass, or even have it, at all?

Here's an error I got: 

TypeError: TextResponse url must be str, got list:

The list the error refers to is my start_urls variable, which I had been 
using without issue until I tried to use TextResponse. So if we can't use a 
list, are we supposed to feed it only one url at a time? Manually? 

Your patient, thorough, and detailed explanation of these issues is greatly 
appreciated. 
