In the docs, it says:

> TextResponse objects adds encoding capabilities to the base Response class, which is meant to be used only for binary data, such as images, sounds or any media file.
I understood this to mean that the base Response class is meant to be used only for binary data. However, I also read:

> TextResponse Objects are used for binary data such as images, sounds etc which has the ability to encode the base Response class.

https://www.tutorialspoint.com/scrapy/scrapy_requests_and_responses.htm

which is of course exactly the opposite of how I interpreted it. Would someone here please clarify? Thanks.

As additional background, I am scraping text, not photos or media files. So it makes sense to me that something called TextResponse would be intended for use with text, but I didn't write it, so I don't know. That's why I am asking for clarification. Ordinarily, when I download, it is a bytes object which I then have to convert to unicode. If I can set it up to come to me as unicode in the first place, that would save me a step and be great.

But that leads me to my second question: How exactly are we supposed to implement TextResponse? I am in 100% agreement with the OP here:

https://groups.google.com/forum/#!msg/scrapy-users/-ulA_0Is1Kc/oZzM2kuTmd4J;context-place=forum/scrapy-users

and I don't think he (or I) got a sufficient answer.

> HTML pages are the most common response types spiders deal with, and their class is HtmlResponse, which inherits from TextResponse, so you can use all its features.

Well, if that's so, then TextResponse would be the default and we'd get back unicode strings, right? But that's not what happens. We get byte strings. And despite the answer found there, it is not at all clear how we can use these response subclasses if we are told the middleware does it all automatically, as if we aren't supposed to worry about it. If that were so, why tell us about, or even have, the subclass at all?

Here's an error I got:

TypeError: TextResponse url must be str, got list:

The list the error is referring to is my start_urls variable that I've been using without issue until I tried to use TextResponse.
So if we can't use a list, are we supposed to only feed it one url at a time? Manually?

Your patient, thorough, and detailed explanation of these issues is greatly appreciated.