[issue21475] Support the Sitemap extension in robotparser

2015-10-15 Thread Peter Wirtz
Peter Wirtz added the comment: Here is a patch that provides support for the Sitemap extension. -- keywords: +patch Added file: http://bugs.python.org/file40791/robotparser_site_maps_v1.patch ___ Python tracker <rep...@bugs.python.org>
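For readers unfamiliar with the Sitemap extension, the sketch below shows how Sitemap lines in a robots.txt can be read with the standard library. It assumes Python 3.8 or later, where `RobotFileParser.site_maps()` is available; the attached patch may expose a different name or behavior. The robots.txt content and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the URLs are placeholders, not from the issue.
ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/news-sitemap.xml

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() (Python 3.8+) returns a list of Sitemap URLs in file order,
# or None when the file declares no Sitemap lines.
sitemaps = parser.site_maps()
print(sitemaps)
```

Feeding the text through `parse()` rather than `set_url()` and `read()` keeps the example offline and repeatable.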

[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-14 Thread Peter Wirtz
Peter Wirtz added the comment: On further inspection of the tests, it appears that, the way they are written, a test case can only be tested for one useragent at a time. I will attempt to rework the tests so they work correctly. Any advice would be much appreciated.

[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-14 Thread Peter Wirtz
Peter Wirtz added the comment: Ok, for the meantime, I reworked the test so that it appears to test correctly and the tests pass. There does seem to be some magic involved, so I hope I did not overlook anything. Here is the new patch. -- Added file: http://bugs.python.org/file40784

[issue21475] Support the Sitemap extension in robotparser

2015-10-14 Thread Peter Wirtz
Peter Wirtz added the comment: I would like to tackle this issue. Should I wait for issue25400 to be resolved first? -- nosy: +pwirtz

[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-13 Thread Peter Wirtz
New submission from Peter Wirtz: After changeset http://hg.python.org/lookup/dbed7cacfb7e, calling the crawl_delay method for a robots.txt file that has a crawl-delay for * useragents always returns None. Ex: Python 3.6.0a0 (default:1aae9b6a6929+, Oct 9 2015, 22:08:05) [GCC 4.2.1
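A minimal sketch of the behavior described in this report, assuming a current Python 3 `urllib.robotparser`. The robots.txt content and the agent name `ExampleBot` are hypothetical and are not taken from the original session, which is truncated above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a crawl delay only on the default (*) entry.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" has no entry of its own, so the lookup falls through to the
# default (*) entry. The report describes this call returning None after the
# referenced changeset; an interpreter that includes the fix should return 5.
delay = parser.crawl_delay("ExampleBot")

# The Disallow rule of the default entry applies to the same agent.
allowed = parser.can_fetch("ExampleBot", "/private/page.html")
print(delay, allowed)
```

Checking `can_fetch` alongside `crawl_delay` shows whether the default entry is consulted for access rules while its delay is ignored.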

[issue25400] robotparser doesn't return crawl delay for default entry

2015-10-13 Thread Peter Wirtz
Peter Wirtz added the comment: This fix breaks the unit tests, though. I am not sure how to go about checking those, as this would be my first contribution to Python and to an open source project in general.