[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: I'm not sure that users of my program will like it if I define such variables on their systems. -- ___ Python tracker <http://bugs.python.org/issue29533> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: I compared the behaviour of urllib with these browsers: Firefox ("use system proxy" selected), Google Chrome and Yandex, and also with Skype (requests to login.live.com). None of them makes DNS requests for proxy bypass handling in the way Marc expects. The result is attached: compare_urllib_progs.png -- Added file: http://bugs.python.org/file46813/compare_urllib_progs.png
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: Steve, do you mean that urllib should not consult the IE configuration at all? I could take this on if I understand the task. gethostbyaddr() itself is fine: it just performs a reverse lookup, which some DNS servers process too slowly. The "nslookup" command is also slow under the same conditions. I think the problem lies with those DNS servers.
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: OK, but maybe there are some Windows users with a different opinion, who prefer to put up with this bug for the benefit of better performance. Could you leave them a way to opt out of this behavior of urllib?
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: Maybe I didn't describe the problem clearly enough, because English is not my native language, so I'll try to explain once again.

In Windows there is an option "Do not use proxy server for addresses beginning with". I call this option the proxy bypass list. This option was designed for Internet Explorer (IE) and is used by IE. It can be used by other applications too, and I think it's obvious that other applications must handle it the same way IE does. Maybe I'm wrong here; please dissuade me.

The problem is that IE only compares the hostname received with the items in this list, whereas urllib also performs a reverse lookup and a forward lookup of the hostname and compares the results of those lookups with the items in this list.

To reproduce this:
1. Run a proxy on your Windows machine, or use any other proxy that outputs the requests coming from clients.
2. On Windows, in "Browser settings" (IE settings), turn on the "use proxy" option, set the IP address of your proxy, and set the list "Do not use proxy server for addresses beginning with" to '23.253.135.79' (without commas). (23.253.135.79 is the result of 'nslookup python.org' at the time of writing this comment.)
3. Make a request in IE to http://python.org/. Then analyze the output of your proxy. You will see that the request to python.org goes through the proxy.
4. Make a request to http://python.org/ via urllib (run checklib-py3.py). Analyze the output of your proxy. You will see that the request to python.org bypasses the proxy.

Be careful: there might be redirections when you make a request to http://python.org/. If you see 'http://www.python.org/' in the proxy output and don't see 'http://python.org/', it means that the request to 'python.org' bypassed the proxy.

This is the behavioral part of the problem, and it is accompanied by a performance decrease, because on some DNS servers the reverse lookup for some hostnames is slow (sometimes up to 10 seconds).

Maybe the solution in my PR is not smart enough. But how can I move this issue forward? -- Added file: http://bugs.python.org/file46766/checklib-py3.py
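To make the difference concrete, here is a minimal Python sketch of the matching step. It reuses the glob-to-regex conversion that urllib's proxy_bypass_registry applies to each ProxyOverride entry; the function name matches_bypass_list is hypothetical, and treating IE as "hostname only" is an assumption based on the observations in this thread, not on Microsoft documentation.

```python
import re


def matches_bypass_list(candidates, override):
    """Return True if any candidate name matches an entry of `override`.

    `override` is a semicolon-separated string in the format of the
    registry's ProxyOverride value (the "Do not use proxy server for
    addresses beginning with" list). candidates[0] is the raw hostname;
    any further items are lookup results added by the caller.
    """
    for entry in override.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty entries, e.g. from a trailing ';'
        if entry == "<local>":
            # '<local>' bypasses the proxy for plain names without a dot.
            if "." not in candidates[0]:
                return True
            continue
        # Same glob-to-regex conversion as urllib's proxy_bypass_registry:
        # '.' is escaped, '*' matches any run, '?' matches one character.
        pattern = entry.replace(".", r"\.").replace("*", r".*").replace("?", r".")
        # re.match anchors only at the start, so entries act as prefixes.
        if any(re.match(pattern, name, re.IGNORECASE) for name in candidates):
            return True
    return False
```

With the list set to 23.253.135.79 as in step 2, `matches_bypass_list(["python.org"], "23.253.135.79")` returns False (hostname only, as IE was observed to behave), while `matches_bypass_list(["python.org", "23.253.135.79"], "23.253.135.79")` returns True (hostname plus the forward-lookup result, which is how urllib builds its candidate list).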
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: Could someone look into my PR, please...
[issue29533] urllib2 works slowly with proxy on windows
Changes by Julia Dolgova: -- pull_requests: +212
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: I added a variable smart_proxy_bypass to the request module. If it's False, checking the IP and FQDN of the host is skipped in the proxy_bypass_registry function. I set the default value to False, because I think it's better for urllib's behaviour to match IE rather than previous versions of urllib. This affects only NT systems. -- Added file: http://bugs.python.org/file46661/request.patch
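A minimal sketch of how such a switch could gate the lookups. The names smart_proxy_bypass and bypass_candidates are hypothetical stand-ins, not the actual contents of request.patch; the resolver functions are passed in as parameters so the behaviour can be tried without touching the network.

```python
import socket

# Hypothetical module-level switch modelled on the smart_proxy_bypass
# variable described above; it is not part of the standard library.
smart_proxy_bypass = False


def bypass_candidates(raw_host, smart=None,
                      resolve=socket.gethostbyname, fqdn=socket.getfqdn):
    """Return the names to compare against the proxy bypass list.

    When the switch is off, only the hostname itself is returned and no DNS
    query is made. When it is on, the forward-lookup address and the fully
    qualified name are appended, as urllib's proxy_bypass_registry does.
    """
    if smart is None:
        smart = smart_proxy_bypass  # fall back to the module-level default
    candidates = [raw_host]
    if not smart:
        return candidates
    try:
        addr = resolve(raw_host)  # forward lookup
        if addr != raw_host:
            candidates.append(addr)
    except OSError:  # socket.gaierror is a subclass of OSError
        pass
    try:
        name = fqdn(raw_host)  # socket.getfqdn() calls gethostbyaddr()
        if name != raw_host:
            candidates.append(name)
    except OSError:
        pass
    return candidates
```

With the default (switch off), `bypass_candidates("python.org")` returns `["python.org"]` without any DNS traffic; only when the switch is on does the list grow with lookup results, which is where both the slowdown and the IE mismatch come from.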
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: http://bugs.python.org/issue23384 - same problem
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: Why not take it into account? Imagine that someone wants requests to "ovinnik.canonical.com" to bypass the proxy and requests to "ubuntu.com" not to. I don't know why anyone would want that; it's just an assumption. He adds the hostname "ovinnik.canonical.com" to the "Do not use proxy server for addresses beginning with" list and checks requests in IE. He sees that requests to "ovinnik.canonical.com" bypass the proxy and requests to "ubuntu.com" go via the proxy, which is fine. But then he suddenly discovers that requests to "ubuntu.com" made with urllib bypass the proxy, which is unexpected. I think this behavior of urllib should at least be optional.
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: I compared the behavior of IE and urllib. I put different addresses into the "Do not use proxy server for addresses beginning with" setting, made different requests through IE and urllib, and watched whether the proxy was bypassed. IE doesn't even make a forward DNS lookup of the given hostname to check whether it should bypass the proxy, whereas urllib does. For example:

proxy_bypass_list        request                    IE bypasses proxy   urllib bypasses proxy
23.253.135.79            https://python.org/        no                  yes
151.101.76.223           https://docs.python.org/   no                  yes
ovinnik.canonical.com    https://ubuntu.com/        no                  yes

compare_ie_urllib.txt - full report -- Added file: http://bugs.python.org/file46649/compare_ie_urllib.txt
[issue23384] urllib.proxy_bypass_registry slow down under Windows if website has no reverse DNS and Fiddler is runing
Changes by Julia Dolgova: -- nosy: +juliadolgova ___ Python tracker <http://bugs.python.org/issue23384> ___
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: The issue applies to 3.6 as well. I agree that replacing gethostbyaddr with gethostbyname_ex is not a solution. But is there a way to check whether a hostname is in the "Do not use proxy server for addresses beginning with" list that doesn't lead to a reverse lookup? I suppose that IE doesn't do a reverse lookup for each request. -- versions: +Python 3.6
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: I sincerely appreciate your time. Thank you very much for your answer. I'll try to test this on Python 3.5.
[issue29533] urllib2 works slowly with proxy on windows
Julia Dolgova added the comment: Surely no one is concerned that programs written in Python could work better when accessing "python.org"?
[issue29533] urllib2 works slowly with proxy on windows
Changes by Julia Dolgova: -- keywords: +patch Added file: http://bugs.python.org/file46632/socket.patch
[issue29533] urllib2 works slowly with proxy on windows
Changes by Julia Dolgova: Added file: http://bugs.python.org/file46630/test.py
[issue29533] urllib2 works slowly with proxy on windows
Changes by Julia Dolgova: Added file: http://bugs.python.org/file46631/log.txt
[issue29533] urllib2 works slowly with proxy on windows
Changes by Julia Dolgova: Added file: http://bugs.python.org/file46629/socket.py
[issue29533] urllib2 works slowly with proxy on windows
New submission from Julia Dolgova:

I've found that urllib sometimes works slowly on Windows with a proxy.

To reproduce the issue on Windows:
1. Turn on the "use proxy" option in "Browser settings" in the Control Panel. No real proxy is needed: the problem appears before the proxy is contacted. Just ignore the exception.
2. Make sure that the list of addresses for proxy bypass is not empty.
3. Run checklib.py with socket.py (attached here) in the same directory.

The output could be one of:
A (not a problem):
  Before call to _socket.gethostbyaddr("docs.python.org")
  After call to _socket.gethostbyaddr("docs.python.org")
B (a small problem):
  Before call to _socket.gethostbyaddr("docs.python.org")
  Exception in call to _socket.gethostbyaddr("docs.python.org")
C (a worse problem):
  Before call to _socket.gethostbyaddr("docs.python.org")
  (Delay)
  Exception in call to _socket.gethostbyaddr("docs.python.org")

Whether you get A, B or C depends on which DNS server you use and which URL you pass into urllib2.urlopen(), and it can differ over time because DNS is not a stable thing. However, whatever result you get, this test shows that a hostname is passed into gethostbyaddr, whereas an IP address is expected there, as described in MSDN. It should be changed to gethostbyname_ex here.

test.py compares the performance of gethostbyaddr and gethostbyname_ex. It sets different DNS servers on the system and calls these functions with different hostnames. A run on my computer shows that gethostbyname_ex is about 3 times faster and doesn't raise exceptions.

Attached files:
- checklib.py - just calls urllib2.urlopen("https://docs.python.org")
- socket.py - not a patched lib; it has debug lines near line 141. Use it with checklib.py.
- test.py - compares the performance of gethostbyaddr with gethostbyname_ex
- log.txt - result of test.py on my computer (Windows 8, 64-bit)
- socket.patch - patch for socket.py

-- components: Library (Lib), Windows files: checklib.py messages: 287597 nosy: juliadolgova, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: urllib2 works slowly with proxy on windows type: performance versions: Python 2.7 Added file: http://bugs.python.org/file46628/checklib.py
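The timing comparison done by test.py can be approximated with a small harness. This is a hedged sketch, not the attached test.py: it does not switch DNS servers, and the helper name time_lookup is hypothetical.

```python
import time


def time_lookup(func, host, repeat=3):
    """Call func(host) `repeat` times; return (best_seconds, last_error).

    Exceptions are recorded rather than raised, because a slow failure
    (for example a reverse lookup that eventually times out) is exactly
    what is being measured. last_error is None if every call succeeded.
    """
    best = float("inf")
    error = None
    for _ in range(repeat):
        start = time.perf_counter()
        try:
            func(host)
        except OSError as exc:  # socket.herror / socket.gaierror
            error = exc
        best = min(best, time.perf_counter() - start)
    return best, error
```

For example, comparing `time_lookup(socket.gethostbyaddr, "docs.python.org")` with `time_lookup(socket.gethostbyname_ex, "docs.python.org")` on the affected machine gives a rough comparison of the two calls; the numbers depend heavily on the configured DNS server.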
[issue26236] urllib2 initiate irregular call to gethostbyaddr
New submission from Julia Dolgova:

I'm using Python 2.7 on Windows 7 (64-bit), with a proxy. urllib2.urlopen usually takes 0.2 to 1 sec, but sometimes it sends a strange UDP packet to port 137 (netbios-ns) of the remote server, waits 4 to 6 sec, and then sends the HTTP request. If I disable NetBIOS over TCP/IP in my system settings, no UDP packet is sent to port 137, but urlopen still takes 4 to 6 sec.

I've found out that the delay happens in _socket.gethostbyaddr(HostName), called by socket.getfqdn(HostName), called by urllib.proxy_bypass_registry(HostName), called by urllib.proxy_bypass(HostName), called by urllib2.ProxyHandler.proxy_open, with HostName='pykasso.rc-online.ru'.

"nslookup pykasso.rc-online.ru" works quickly on my computer. I suppose the problem is that the hostname is passed into gethostbyaddr instead of an IP. If I add an IP check of the string before the socket.getfqdn() call in urllib.proxy_bypass_registry():

    try:
        socket.inet_aton(rawHost)  # I added this line
        fqdn = socket.getfqdn(rawHost)
        if fqdn != rawHost:
            host.append(fqdn)
    except socket.error:
        pass

then no delay happens.

My proposal is to add an IP check in urllib.proxy_bypass_registry(), or to give programmers a way to refuse the proxy bypass attempt.

-- components: Library (Lib), Windows messages: 259190 nosy: juliadolgova, paul.moore, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: urllib2 initiate irregular call to gethostbyaddr type: performance versions: Python 2.7 ___ Python tracker <http://bugs.python.org/issue26236> ___
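A sketch of the proposed IP check in Python 3 terms. It swaps socket.inet_aton for ipaddress.ip_address, which also accepts IPv6 literals but, unlike inet_aton, rejects shorthand forms such as "127.1"; the helper names is_ip_literal and candidates_with_ip_check are hypothetical.

```python
import ipaddress
import socket


def is_ip_literal(host):
    """True if `host` is an IPv4 or IPv6 address rather than a hostname."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def candidates_with_ip_check(raw_host, getfqdn=socket.getfqdn):
    """Build the names compared against the proxy bypass list.

    Mirrors the change proposed above: getfqdn() (which calls
    gethostbyaddr() internally) is only used when raw_host is already an
    IP address, so plain hostnames never trigger a reverse lookup here.
    """
    candidates = [raw_host]
    if is_ip_literal(raw_host):
        fqdn = getfqdn(raw_host)
        if fqdn != raw_host:
            candidates.append(fqdn)
    return candidates
```

With this check, `candidates_with_ip_check("pykasso.rc-online.ru")` returns the hostname alone without calling getfqdn(), which is the case that caused the 4 to 6 second delay described above.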