Hello Scrapy users,

We released Scrapy 1.4.0 last Thursday, and we hope you will like it.

It brings a bunch of bug fixes but also a handful of new features.

response.follow: the new kid in town

Check out the new response.follow 
<https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Response.follow>
shortcut method to properly build Request objects in your callbacks.

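It is now the recommended way to do this. It’s shorter to write, and more 
correct, since it resolves relative URLs against the response URL and reuses 
the response’s encoding for you.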

So, instead of:


    for href in response.css('li.page a::attr(href)').extract():
        url = response.urljoin(href)
        yield scrapy.Request(url, self.parse, encoding=response.encoding)


you can now write this:


    for a in response.css('li.page a'):
        yield response.follow(a, self.parse)


FTP in Python 3

Scrapy finally supports FTP in Python 3, and it now supports anonymous FTP 
sessions too.

Just make sure you are using at least Twisted 17.1.

Link extractors

Link extractors also got some love: they now strip leading and trailing 
whitespace from link attributes.

Their behavior is now much closer to what your regular desktop browser does 
when following hyperlinks.

Oh, and we disabled the default canonicalization of URLs for extracted 
links.

It was causing users more trouble than it was worth.

Referrer policy

Handling of the “Referer” HTTP header is now driven by a customizable 
Referrer Policy, as defined by the W3C 
<https://www.w3.org/TR/referrer-policy/>.

Check out the details and security implications in the dedicated docs section 
<https://docs.scrapy.org/en/latest/topics/spider-middleware.html#std:setting-REFERRER_POLICY>.

Pretty-printing your items

Scrapy 1.4 also has a new option for pretty-printing items when you export 
to JSON or XML.

By default, each item is still written on its own line. But you can also get 
a more human-readable output with a positive FEED_EXPORT_INDENT 
<https://docs.scrapy.org/en/latest/topics/feed-exports.html#std:setting-FEED_EXPORT_INDENT>.

To get pretty-printed JSON with a two-space indentation, run:

$ scrapy crawl yourspider -o items.json -s FEED_EXPORT_INDENT=2


We recommend that all users upgrade to Scrapy 1.4.0.

Pip users:

$ pip install --upgrade scrapy


Conda users:

$ conda install -c conda-forge scrapy=1.4.0


Check out the release notes 
<https://docs.scrapy.org/en/latest/news.html#scrapy-1-4-0-2017-05-18> for 
the full changelog.

Happy scraping!

/Paul, for the Scrapy team

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.