Hello all,

It has been 4 months of development: 30 authors, more than 80 issues closed, 225 commits, 177 files changed, 6740 insertions and 4134 deletions. New and old faces have been seen over the past months reporting and fixing issues, discussing and helping get new features into shape. Pretty amazing work; thanks to everyone who contributed in one way or another to make Scrapy 0.24 possible!
I'd like to take this opportunity to ask for help with the scrapy.org website. Its design is old (it hasn't changed much since 2008!) and we would like to give it a proper makeover, with a fresher, modern look, maybe including a snippet of simple, self-contained code that shows the power of Scrapy. Anyone out there who would like to become famous for designing the new scrapy.org website? :)

Check out the Release Notes <http://doc.scrapy.org/en/latest/news.html#id1>, from which I would like to highlight the now simpler top-level imports and selector shortcuts:

    import scrapy

    class MySpider(scrapy.Spider):
        # ...
        def parse(self, response):
            for href in response.xpath('//a/@href').extract():
                yield scrapy.Request(href)
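For anyone curious how these pieces fit together in a complete spider, here is a slightly longer, untested sketch combining the new top-level objects (scrapy.Item, scrapy.Field, scrapy.Spider, scrapy.Request) with the response.css()/response.xpath() shortcuts; the PostItem/BlogSpider names, the blog.example.com URL and the selectors are made up for illustration:

    import urlparse

    import scrapy

    class PostItem(scrapy.Item):
        # illustrative item with two fields
        title = scrapy.Field()
        url = scrapy.Field()

    class BlogSpider(scrapy.Spider):
        name = 'blog'
        start_urls = ['http://blog.example.com/']  # made-up site

        def parse(self, response):
            # response.css() / response.xpath() are shortcuts for
            # response.selector.css() / response.selector.xpath()
            for post in response.css('div.post'):
                yield PostItem(
                    title=post.xpath('.//h2/a/text()').extract()[0],
                    url=post.xpath('.//h2/a/@href').extract()[0],
                )
            # follow "next page" links (selector is hypothetical)
            for href in response.xpath('//a[@rel="next"]/@href').extract():
                yield scrapy.Request(urlparse.urljoin(response.url, href))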
Last but not least, the credits:

A.J. Welch (1):
      Generalize the file pipeline log messages so they are not specific to downloading images.

Alex Cepoi (2):
      improvements to scrapy check/contracts
      fix contracts tests

Alexander Chekunkov (5):
      test for RFPDupeFilter.request_fingerprint overriding
      added note about RFPDupeFilter.request_fingerprint overriding to the settings documentation
      added short RFPDupeFilter.request_fingerprint interface description
      DOWNLOADER setting
      DOWNLOADER setting

Alexey Bezhan (6):
      Clarify MapCompose documentation
      Fix some typos, whitespace and small errors in docs
      Add a note about reporting security issues
      Bind telnet console and webservice to 127.0.0.1 by default
      Fix PEP8 warnings in project template files
      Fix PEP8 warnings in spider templates

Ana Sabina Uban (1):
      Fixed SgmlLinkExtractor constructor to properly handle both string and list parameters (attrs, tags, deny_extensions)

Benoit Blanchon (3):
      BaseSgmlLinkExtractor: Fixed unknown_endtag() so that it only set current_link=None when the end tag match the opening tag
      BaseSgmlLinkExtractor: Added unit test of a link with an inner tag
      BaseSgmlLinkExtractor: Fixed the missing space when the link has an inner tag

Breno Colom (1):
      Update scrapy command line doc with additional scrapy parse options

Cameron Lane (2):
      [#744] Ensure domain is not None before building regex
      [#744] Test for allowed domains including NoneTypes

Capi Etheriel (4):
      fixes dynamic itemclass example usage of type()
      Running lucasdemarchi/codespell to fix typos in docs
      Running lucasdemarchi/codespell to fix typos in SEPs
      Running lucasdemarchi/codespell to fix typos in code

Carlos Rivera (1):
      grammatical issue

Cash Costello (1):
      Added missing word in practices.rst

Claudio Salazar (4):
      Fixed XXE flaw in sitemap reader
      Fixed XML selector against XXE attacks
      Added test against XXE attacks for Sitemap
      Added resolve_entities to kwargs in SafeXMLParser

Daniel Graña (45):
      Merge 0.22.0 release notes
      bump version to 0.23
      fix 0.22.0 release date
      Update Ubuntu installation instructions
      fix apt-get line
      replace warning about updating package lists by a note on package upgrade
      show ubuntu setup instructions as literal code
      replace unencodeable codepoints with html entities. fixes #562 and #285
      Fix wrong checks on subclassing of deprecated classes. closes #581
      test inspect.stack failure
      localhost666 can resolve under certain circumstances
      Add 0.22.1 release notes
      fix a reference to unexistent engine.slots. closes #593
      Add 0.22.2 release notes
      try to restore pypy tests
      Run testsuite with py.test
      cleanup toplevel namespace
      Add basic top-level shortcuts
      remove .re() shortcut
      update docs
      update spider templates
      Remove "sel" shortcut from scrapy shell
      document shortcuts in TextResponse class
      Ammend example nesting selectors
      Restore and deprecate "sel" shortcut
      limit Twisted support to pre-14.0.0 while #718 is fixed
      fix tests after changes introduced by scrapy/w3lib#21
      force installation of w3lib and queuelib for trunk env
      Avoid IPython warning. thanks @bryant1410. closes #623
      sort spiders in "scrapy list" cmd. closes #736
      Add a LevelDB cache backend
      add leveldb cache backend docs
      indent parsed-literal as part of ordered list
      Upload sdist and wheel packages to pypi using travis-ci deploys
      Add bumpversion config
      Revert "limit Twisted support to pre-14.0.0 while #718 is fixed"
      hold a reference to backwards compatible _contextFactory
      Restore compatibility with Settings.overrides while still deprecating it
      recognize jl extension as jsonlines exporter and update docs
      promote LxmlLinkExtractor as default in docs
      address latest comments
      No need to keep extracted links as instance attribute. fixes #763
      Add 0.24.0 release notes
      Bump version: 0.23.0 → 0.24.0
      set 0.24.0 release date

Denys Butenko (5):
      Resolved issue #546. Output format parsing from filename extension.
      Added back `-t` option. If `--output-format` not defined parse from extension `--output`
      Fix default value. Add import os for crawl.
      Added more verbose error message for unrecognized output format.
      PEP8.

Edwin O Marshall (32):
      Converted sep-001 to rst format
      converted sep 002 to rst
      - decided that removing files would cause conflicts on merge
      - readded file to prevent future merge conflicts
      converted sep 3 for #629
      sep 4 for #629
      sep 11 for #629
      - sep 15 for #629
      sep 6 for #629
      - sep 10 for #629
      - didn't like the way blockquotes rendered
      - trying to separate quote context
      - changing indentation so contexts are recognized
      - given that it'sa block quote, quotation marks seem redundant
      - removing trac file again to see if merges play well together
      - removed trac file
      - removed trac file
      - removed trac file
      - removed track file
      removed trac file
      removed trac file
      - removed trac file
      converted sep 7 for #629
      sep 12 for #629
      - converted sep 18
      converted sep 16
      converted sep 13
      converted sep 5
      - convertd sep 8
      converted sep 9
      converted sep017
      sep 14 for #629

Irhine (2):
      add encoding utf-8 to the first line
      support i18n by using utf-8 coding template files

Julia Medina (34):
      New doc: clickdata in Formrequest.from_response
      New tests: clickdata's nr in Formrequest.from_response
      FormRequest doc improvements
      More appropriate assert in FormRequest test
      Tests for loading download handlers
      Fix minor typo in DownloaderHandlers comment
      Doc for disabling download handler
      Minor fixes in LoadTestCase in test_downloader_handlers
      Trial functionality for running tests with pytest
      Add py33 environment to allowed failures in travis-ci
      Support doctest and __init__.py test discover in pytest
      Ignore files with import errors on pytest test discover
      Change function name so it does not mess up with pytest autodiscover
      Fix httpcache doctest that assumed dictionary order
      Ensure spiders module reload between spider manager tests
      New tox env: docs
      Ignore known broken links in docs linkcheck
      Fix broken links in documentation
      sep#19 proposed changes
      New SettingsAttribute class
      Settings priorities dictionary
      New set and setdict method using SettingsAttribute in Settings
      Deprecate CrawlerSettings, as its functionality is replicable by Settings class
      Settings and SettingsAtribute tests
      Fix and extend the documentation of the new Settings api
      Settings topic updated
      Fix settings repr on the logs of the shell and tutorial docs topics
      setmodule helper method on Settings class
      Update get_crawler method in utils/test.py with new Settings interface
      get_project_settings now returns a Settings instance
      Change command's default_settings population in cmdline.py
      Change how settings are overriden in ScrapyCommands
      Fix settings usage in runspider and crawl commands
      Fix settings usage across tests

Mikhail Korobov (18):
      fix typos in news.rst and remove (not released yet) header
      Handle cases when inspect.stack() fails
      testing PIL dependency is removed because there is a new mitmproxy version
      TST Improved twisted installation in tox.ini for Python 3.3
      reduce code duplication in test_spidermiddleware_httperror
      scrapy.utils.test.docrawl function
      Fix for #612 + integration-style tests for HttpErrorMiddleware
      TST fix file descriptor leak and a bad variable name in get_testlog
      make scrapy.version_info a tuple of integers
      remove unused import
      use "import scrapy" in templates
      DOC use top-level shortcuts in docs
      suggest scrapy.Selector in deprecation warnings
      TST fix tests that became broken after adding top-level imports and switching to py.test.
      fix scrapy.version_info when SCRAPY_VERSION_FROM_GIT is set
      response.selector, response.xpath(), response.css() and response.re()
      DOC selectors.rst cleanup
      add utf8 encoding header to spider templates

Nikita Nikishin (1):
      Fixed #441.

Nikolaos-Digenis Karagiannis (5):
      downloaderMW doc typo (spiderMW doc copy remnant)
      SpiderMW doc typo: SWP request, response
      ItemLoader doc: missing args in replace_value()
      document spider.closed() shortcut
      Document signal "request_scheduled"

Pablo Hoffman (11):
      make 'basic' the default template spider in genspider, and added info with next steps to startproject. closes #488
      add SEP-021 (Add-ons) - work in progress
      remove references to deprecated scrapy-developers list
      rename attribute to match conventions used for XXX_DEBUG settings (in autothrottle and cookies mw)
      remove no longer used setting: MAIL_DEBUG
      remove unused setting: DOWNLOADER_DEBUG
      signals doc: make argument order more consistent with code (although it doesn't matter in practice)
      add Julia to SEP-019 authors
      crate release notes for 0.24 and #699 to it
      minor change to request_scheduled signal doc
      doc: use |version| substitution in ubuntu packages

Paul Brown (1):
      fixed typo

Paul Tremberth (18):
      Disable smart strings in lxml XPath evaluations
      Make lxml smart strings functionality customizable
      Add testcase to check is default Selector doesnt return smart strings
      Use assertTrue/False
      RegexLinkExtractor: encode URL unicode value when creating Links
      Offsite: add 2 stats counters
      Always enable offsite stats + refactor test to initialize crawler
      Fix tests for Travis-CI build
      CrawSpider: support process_links as generator
      Fix HtmlParserLinkExtractor and tests after #485 merge
      Docs: 4-space indent for final spider example
      DupeFilter: add setting for verbose logging + stats counter for filtered requests
      Remove _log_level attribute as per comments
      Support case-insensitive domains in url_is_from_any_domain()
      Add tests for start requests, filtered and non-filtered
      Check pending start_requests before calling _spider_idle() in engine (fixes #706)
      Add LxmlLinkExtractor class similar to SgmlLinkExtractor (#528)
      Add doc on LxmlLinkExtractor class

Rafal Jagoda (1):
      add response arg to item_dropped signal handlers #710

Rendaw (1):
      Elaborated request priority value.

Rolando Espinoza (8):
      Ignore None's values when using the ItemLoader.
      Unused re import and PEP8 minor edits.
      Expose current crawler in the scrapy shell.
      PEP8 minor edits.
      Updated shell docs with the crawler reference and fixed the actual shell output.
      Updated the tutorial crawl output with latest output.
      DOC Fixed HTTPCACHE_STORAGE typo in the default value which is now Filesystem instead Dbm.
      DOC Use pipelines module name instead of pipieline following default project files.

Rolando Espinoza La fuente (1):
      Alow to disable a downloader handler just like any other component.

Ruben Vereecken (2):
      Added content-type check as per issue #193
      Redefined test for #193

deed02392 (1):
      Update httperror.py

ncp1113 (1):
      for loops have to have a : at the end of the line

nyov (2):
      better call to parent class
      update a link reference

stray-leone (1):
      modify the version of scrapy ubuntu package

tpeng (3):
      add message when raise IngoreReques; fix item_scraped document
      set the exit code to non-zero when contracts fails
      print spider name even it has no contract tests when -v is specified

tracicot (1):
      Correct typos