[issue31661] Issues with request rate in robotparser

2017-10-02 Thread Nikolay Bogoychev

Nikolay Bogoychev <nhe...@gmail.com> added the comment:

Hey Serhiy,

The use of namedtuple was specifically requested during review; I didn't 
implement it this way initially: https://bugs.python.org/review/16099/#ps6205

I wasn't aware of the performance implications. Could you please explain the 
difference between the type and an instance in terms of performance (or point 
me to a resource; a quick search didn't yield anything)? How should I have 
coded it properly?
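
One likely reading of the type-versus-instance question, sketched here under assumptions: calling namedtuple() builds an entirely new class, which costs far more than instantiating a class that already exists, so the class is normally created once at module level and reused. The names RequestRate, parse_with_shared_type, parse_with_new_type and compare are hypothetical and are not taken from the patch.

```python
from collections import namedtuple
import timeit

# Created once, at import time; every parsed value reuses this class.
RequestRate = namedtuple("RequestRate", "requests seconds")


def parse_with_shared_type(value):
    """Parse 'requests/seconds' into an instance of the shared class."""
    requests, seconds = value.split("/")
    return RequestRate(int(requests), int(seconds))


def parse_with_new_type(value):
    """Same parsing, but a brand-new class is built on every call."""
    requests, seconds = value.split("/")
    req_rate = namedtuple("req_rate", "requests seconds")
    return req_rate(int(requests), int(seconds))


def compare(number=200):
    """Return (shared, per_call) timings in seconds for `number` calls."""
    shared = timeit.timeit(lambda: parse_with_shared_type("3/15"), number=number)
    per_call = timeit.timeit(lambda: parse_with_new_type("3/15"), number=number)
    return shared, per_call


if __name__ == "__main__":
    shared, per_call = compare()
    print(f"shared class:   {shared:.6f} s")
    print(f"class per call: {per_call:.6f} s")
```

Besides the speed difference, values produced by the per-call variant do not share a type, so checks such as isinstance() against a single class cannot work on them.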

Cheers,

Nick

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue31661>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2015-10-07 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Hey,

Friendly reminder that there has been no activity on this issue for more than 
a year.

Cheers,

Nick

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16099>
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2014-08-26 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Hey,

Just a friendly reminder that the patch is pending review and there has 
been no activity for 3 months (:

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2014-07-15 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Hey,

Just a friendly reminder that there has been no activity for a month and a half 
and v3 is pending review (:

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2014-05-27 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Updated patch with all comments addressed; sorry for the six-month delay. Please 
review.

--
Added file: http://bugs.python.org/file35377/robotparser_v3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2014-01-21 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Hey,

Just a friendly reminder that there hasn't been any activity for a 
month and that v2 is pending review (:

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2013-12-09 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Thank you for the review!
I have addressed your comments and released v2 of the patch.
Highlights:
 No longer crashes when robots.txt contains a malformed parameter such as 
 crawl-delay.
 Returns None when a parameter is missing or its syntax is invalid.
 Simplified several functions.
 Extended tests.
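
The behaviour in the highlights can be checked against the urllib.robotparser module as it ships in Python 3.6 and later, where crawl_delay() and request_rate() are available. This is a small sketch, not the patch's own test suite; the robots.txt content and the agent names slowbot, badbot and otherbot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one well-formed entry and one malformed entry.
ROBOTS_TXT = """\
User-agent: slowbot
Crawl-delay: 3
Request-rate: 9/30
Disallow: /tmp

User-agent: badbot
Crawl-delay: pears
Request-rate: 9/banana
Disallow: /tmp
"""

rp = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
rp.parse(ROBOTS_TXT.splitlines())

good_delay = rp.crawl_delay("slowbot")
good_rate = rp.request_rate("slowbot")
bad_delay = rp.crawl_delay("badbot")      # malformed value is ignored
bad_rate = rp.request_rate("badbot")      # malformed value is ignored
missing_delay = rp.crawl_delay("otherbot")  # no entry applies

print(good_delay, good_rate, bad_delay, bad_rate, missing_delay)
```

Callers therefore need to test for None before using either value in arithmetic.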

http://bugs.python.org/review/16099/diff/6206/Doc/library/urllib.robotparser.rst
File Doc/library/urllib.robotparser.rst (right):

http://bugs.python.org/review/16099/diff/6206/Doc/library/urllib.robotparser
Doc/library/urllib.robotparser.rst:56: .. method:: crawl_delay(useragent)
On 2013/12/09 03:30:54, berkerpeksag wrote:
> Is crawl_delay used for search engines? Google recommends you to set crawl
> speed via Google Webmaster Tools instead.
>
> See https://support.google.com/webmasters/answer/48620?hl=en.

Crawl-delay and request-rate parameters are aimed at the custom crawlers that 
many people and companies write for specific tasks. Google Webmaster Tools 
applies only to Google's crawler, and web admins typically set different rates 
for Google/Yahoo/Bing and for all other user agents.

http://bugs.python.org/review/16099/diff/6206/Lib/urllib/robotparser.py
File Lib/urllib/robotparser.py (right):

http://bugs.python.org/review/16099/diff/6206/Lib/urllib/robotparser.py#newco...
Lib/urllib/robotparser.py:168: for entry in self.entries:
On 2013/12/09 03:30:54, berkerpeksag wrote:
> Is there a better way to calculate this? (perhaps O(1)?)

I followed the model of the existing code. An O(1) implementation 
(probably based on dictionaries) would require a complete rewrite of this 
library, as all previously implemented functions use the following logic:

    for entry in self.entries:
        if entry.applies_to(useragent):

I don't think this matters much here, as these two functions need to be 
called only once per domain and a robots.txt file seldom contains 
more than 3 entries. This is why I have simply followed the design laid out by 
the original developer.
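
To make the trade-off concrete, here is a simplified sketch of the linear-scan pattern. The Entry class and the crawl_delay function below are hypothetical stand-ins, not the library's code; in particular, the substring and wildcard matching in applies_to is an assumption chosen to show why a plain dictionary lookup keyed on the user agent would not be a drop-in replacement.

```python
class Entry:
    """Hypothetical stand-in for one robots.txt record."""

    def __init__(self, useragents, delay=None):
        self.useragents = useragents
        self.delay = delay

    def applies_to(self, useragent):
        # Compare only the product name, case-insensitively.
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == "*":
                return True  # wildcard matches every crawler
            if agent.lower() in useragent:
                return True  # substring match, so exact-key lookup fails
        return False


def crawl_delay(entries, useragent):
    """Return the delay of the first matching entry, or None."""
    for entry in entries:  # O(n) in the number of entries
        if entry.applies_to(useragent):
            return entry.delay
    return None


entries = [Entry(["figtree"], delay=3), Entry(["*"], delay=10)]
print(crawl_delay(entries, "FigTree/1.0 Robot"))
print(crawl_delay(entries, "somebot"))
```

With only a handful of entries per file and one call per domain, the scan costs next to nothing, which is the argument made above.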

Thanks

Nick

--
Added file: http://bugs.python.org/file33071/robotparser_v2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2013-12-09 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Oh... Sorry for the spam. Could you please verify my documentation link syntax? 
I'm not entirely sure I got it right.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2013-12-08 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Hey,
it has been more than a year since the last activity. 
Is there anything else I should do in order for someone on the Python dev team 
to review my changes and perhaps give some feedback?

Nick

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2012-10-07 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Okay, here's a proper patch with a documentation entry and test cases.
Please review and comment.

--
Added file: http://bugs.python.org/file27476/robotparser.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2012-10-07 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Reformatted patch

--
Added file: http://bugs.python.org/file27477/robotparser_reformatted.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2012-10-01 Thread Nikolay Bogoychev

New submission from Nikolay Bogoychev:

Robotparser doesn't support two quite important optional parameters from the 
robots.txt file. I have implemented them in the following way.
(Robotparser should be initialized in the usual way:
rp = robotparser.RobotFileParser()
rp.set_url(..)
rp.read()
)

crawl_delay(useragent) - Returns the time in seconds that you need to wait 
between requests when crawling.
If none is specified, or it doesn't apply to this user agent, returns -1.
request_rate(useragent) - Returns a list in the form [requests, seconds].
If none is specified, or it doesn't apply to this user agent, returns -1.
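
As an illustration of how a crawler might consume these two values, the sketch below turns them into a single pause between requests. The helper seconds_between_requests and the constant MISSING are hypothetical; the -1 sentinel follows the description in this message, while a later revision of the patch returns None instead.

```python
MISSING = -1  # sentinel described in this message for "not specified"


def seconds_between_requests(delay, rate):
    """Return the longest pause implied by crawl delay and request rate.

    `delay` is a number of seconds or MISSING.
    `rate` is a [requests, seconds] pair or MISSING.
    """
    candidates = []
    if delay != MISSING:
        candidates.append(float(delay))
    if rate != MISSING:
        requests, seconds = rate
        if requests > 0:
            # e.g. 9 requests per 30 seconds means one request every 30/9 s
            candidates.append(seconds / requests)
    # With neither value present, no pause is required by robots.txt.
    return max(candidates, default=0.0)


print(seconds_between_requests(3, [9, 30]))
print(seconds_between_requests(MISSING, MISSING))
```

Taking the maximum is a conservative design choice: it honours whichever of the two limits is stricter.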

--
components: Library (Lib)
files: robotparser.patch
keywords: patch
messages: 171711
nosy: XapaJIaMnu
priority: normal
severity: normal
status: open
title: robotparser doesn't support request rate and crawl delay parameters
type: enhancement
versions: Python 2.7
Added file: http://bugs.python.org/file27373/robotparser.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___



[issue16099] robotparser doesn't support request rate and crawl delay parameters

2012-10-01 Thread Nikolay Bogoychev

Nikolay Bogoychev added the comment:

Okay, sorry, I didn't know that (:
Here's the same patch (same functionality) for Python 3.

Feedback is welcome, as always (:

--
Added file: http://bugs.python.org/file27374/robotparser.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16099
___