[issue29533] urllib2 works slowly with proxy on windows

2017-04-20 Thread Marc Schlaich

Marc Schlaich added the comment:

Well, you can read the proxy settings from the registry and write them to 
os.environ (no_proxy needs to be transformed, as it has a different format).

This will only take effect for the current process.
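
A rough sketch of the transformation described above. It assumes that the IE settings live under the `Internet Settings` registry key in the `ProxyServer` and `ProxyOverride` values, that `ProxyServer` holds a single `host:port` entry, and that dropping `<local>` is acceptable because `no_proxy` has no equivalent for it. All function names are hypothetical, and other wildcard forms (for example `10.*`) are not translated.

```python
import os

# Assumption: the registry location of the per-user IE proxy settings.
INTERNET_SETTINGS = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"


def read_ie_proxy_settings():
    """Windows only: return (ProxyServer, ProxyOverride) from the registry."""
    import winreg  # imported lazily so this module also loads on other platforms

    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, INTERNET_SETTINGS) as key:
        server = winreg.QueryValueEx(key, "ProxyServer")[0]
        try:
            override = winreg.QueryValueEx(key, "ProxyOverride")[0]
        except OSError:
            override = ""  # no bypass list configured
    return str(server), str(override)


def proxy_override_to_no_proxy(override):
    """Turn a ';'-separated ProxyOverride string into a ','-separated no_proxy string."""
    entries = []
    for item in override.split(";"):
        item = item.strip()
        if not item or item.lower() == "<local>":
            continue  # "<local>" cannot be expressed in no_proxy; dropped here
        if item.startswith("*."):
            item = item[1:]  # "*.example.com" -> ".example.com" (suffix match)
        entries.append(item)
    return ",".join(entries)


def apply_proxy_env(server, override, environ=None):
    """Write the settings into an environment mapping (os.environ by default)."""
    environ = os.environ if environ is None else environ
    if "://" not in server:
        server = "http://" + server
    environ["http_proxy"] = server
    environ["https_proxy"] = server
    environ["no_proxy"] = proxy_override_to_no_proxy(override)
```

On Windows this would be used as `apply_proxy_env(*read_ie_proxy_settings())` early in the program, before the first urllib request.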

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29533] urllib2 works slowly with proxy on windows

2017-04-20 Thread Julia Dolgova

Julia Dolgova added the comment:

I'm not sure that users of my program would like it if I defined such variables 
on their systems.

--




[issue29533] urllib2 works slowly with proxy on windows

2017-04-20 Thread Marc Schlaich

Marc Schlaich added the comment:

BTW, you can work around this issue by defining the `http_proxy` and `no_proxy` 
environment variables.

In this case urllib doesn't make any DNS requests.
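
A minimal sketch of this workaround, for Python 3. The proxy address and the bypass entries are made-up placeholders, and `urllib.request.proxy_bypass` is an undocumented helper, so treat its use here as an assumption about current CPython behaviour rather than a stable API.

```python
import os
import urllib.request

# Hypothetical values; setting them only affects the current process.
os.environ["http_proxy"] = "http://proxy.example:3128"
os.environ["no_proxy"] = "docs.python.org,.internal.example"


def bypasses_proxy(host):
    """Ask urllib whether it would skip the proxy for this host.

    With proxy variables present in the environment, the decision is made by
    comparing the host name against the no_proxy entries as plain strings.
    """
    return bool(urllib.request.proxy_bypass(host))
```

Because the environment variables take precedence over the registry settings, the registry-based check with its lookups is not reached in this configuration.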

--




[issue29533] urllib2 works slowly with proxy on windows

2017-04-19 Thread Julia Dolgova

Julia Dolgova added the comment:

I compared the behaviour of urllib with these browsers: Firefox (with "use system 
proxy" selected), Google Chrome, and Yandex, and also with Skype (requests to 
login.live.com). None of them makes DNS requests for proxy bypass 
handling, as Marc expected.
The result is attached: compare_urllib_progs.png

--
Added file: http://bugs.python.org/file46813/compare_urllib_progs.png




[issue29533] urllib2 works slowly with proxy on windows

2017-04-18 Thread Marc Schlaich

Marc Schlaich added the comment:

Julia, could you please add other major browsers/HTTP clients (Firefox, Chrome, 
curl, ...) to your comparison (compare_ie_urllib.txt). I would expect that 
Python/urllib is the only implementation doing DNS requests for proxy bypass 
handling.

Please note that curl uses the `no_proxy` environment variable, so the syntax 
is slightly different.

For anyone who doesn't fully grasp the details of this issue, there might be a 
better explanation at https://github.com/kennethreitz/requests/issues/2988.

--




[issue29533] urllib2 works slowly with proxy on windows

2017-04-06 Thread Marc Schlaich

Marc Schlaich added the comment:

This could even be a security issue.

People might rely on a proxy as a privacy feature. In this case the proxy 
should do forward/reverse DNS requests and not the client. Doing DNS lookups to 
check for proxy bypass doesn't seem right. I don't think that major browsers 
are doing this, at least Firefox is not 
(https://bugzilla.mozilla.org/show_bug.cgi?id=136789).

--
nosy: +schlamar




[issue29533] urllib2 works slowly with proxy on windows

2017-04-06 Thread Julia Dolgova

Julia Dolgova added the comment:

Steve, do you mean that urllib should not access the IE configuration at all? 
I could undertake it if I understand the task.

gethostbyaddr() is OK. It just performs a reverse lookup, which some DNS servers 
process too slowly. The "nslookup" command is also slow under the same conditions. 
I think the problem lies in those DNS servers.

--




[issue29533] urllib2 works slowly with proxy on windows

2017-03-30 Thread Steve Dower

Steve Dower added the comment:

I think the point is that we don't want to be grabbing settings like this from 
other configuration locations. Ideally, there'd be a way to provide a list of 
"don't bypass the proxy for these names", which a caller could then read from 
the IE configuration if they want.

The other part of the problem is it seems that nobody on this thread (apart 
from perhaps you) understands exactly what's going on here :) You may want to 
post to python-dev and see if anyone who understands the intricacies of how 
gethostbyaddr() should/does work is willing to chime in.

--




[issue29533] urllib2 works slowly with proxy on windows

2017-03-30 Thread Julia Dolgova

Julia Dolgova added the comment:

OK, but maybe there are some Windows users who have a different opinion and 
would prefer to put up with this bug for the sake of better performance. Could you 
give them a way to opt out of this behavior of urllib?

--




[issue29533] urllib2 works slowly with proxy on windows

2017-03-30 Thread Paul Moore

Paul Moore added the comment:

The behaviour you're describing for IE sounds like a bug to me. If you specify 
a host that should bypass the proxy, then that's what should happen - it 
shouldn't matter if you specify the host by IP address or by name.

I'm -1 on Python trying to match IE bugs.

--




[issue29533] urllib2 works slowly with proxy on windows

2017-03-29 Thread Julia Dolgova

Julia Dolgova added the comment:

Maybe I did not describe the problem clearly enough, because English is not my 
native language, so I will try to explain once again.

In Windows there is an option "Do not use proxy server for address beginning 
with". I will call this option the proxy bypass list. This option was designed for 
Internet Explorer (IE) and is used by IE. It can be used by other 
applications, and I think it's obvious that other applications must handle it 
the same way IE does. Maybe I'm wrong here; if so, please correct me.

The problem is that IE only compares the requested hostname with the items in this 
list, whereas urllib also performs a reverse lookup and a forward lookup of the 
hostname and compares the results of those lookups with the items in this list. To 
reproduce this, you need to:
1. Run a proxy on your Windows machine, or use any other proxy that logs the requests 
coming from clients.
2. On Windows, in "Browser settings" (IE settings), turn on the option "use 
proxy", set the IP of your proxy, and set the list "Do not use proxy server for 
address beginning with" to '23.253.135.79' (without the quotes). (23.253.135.79 
was the result of 'nslookup python.org' at the time of writing this comment.)
3. Make a request in IE to http://python.org/. Then analyze the output of your 
proxy. You will see that the request to python.org goes through the proxy.
4. Make a request to http://python.org/ via urllib (run checklib-py3.py). 
Analyze the output of your proxy. You will see that the request to python.org 
bypasses the proxy.

Be careful: there might be redirections when you make a request to 
http://python.org/. If you see 'http://www.python.org/' in proxy output and 
don't see 'http://python.org/' it means that request to 'python.org' bypasses 
proxy.

This is the behavioral part of the problem. It is accompanied by a performance 
penalty, because the reverse lookup on some DNS servers for some hostnames 
is slow (sometimes up to 10 seconds). Maybe the solution in my PR is not 
smart enough. But how can I move this issue forward?
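
The difference between the two behaviours described in this comment can be sketched as follows. This is a simplified illustration, not the actual urllib or IE code: the function names are hypothetical, the lookups are passed in as parameters so the logic can be exercised without real DNS, and the pattern handling (prefix match with `*` and `?` wildcards) is an assumption about how the "address beginning with" list is interpreted.

```python
import re


def _matches(value, pattern):
    """Prefix match of value against a bypass entry with '*' and '?' wildcards."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.match(regex, value, re.IGNORECASE) is not None


def bypass_by_name_only(host, bypass_list):
    """IE-like check: compare only the host name exactly as requested."""
    return any(_matches(host, pattern) for pattern in bypass_list)


def bypass_with_lookups(host, bypass_list, forward_lookup, fqdn_lookup):
    """urllib-like check: also compare the resolved IP and the FQDN."""
    candidates = [host]
    for lookup in (forward_lookup, fqdn_lookup):
        try:
            result = lookup(host)
        except OSError:
            continue  # a failed lookup adds no candidate
        if result not in candidates:
            candidates.append(result)
    return any(bypass_by_name_only(c, bypass_list) for c in candidates)
```

With real DNS, `forward_lookup` would be `socket.gethostbyname` and `fqdn_lookup` would be `socket.getfqdn`; the second one is where the slow reverse lookup happens.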

--
Added file: http://bugs.python.org/file46766/checklib-py3.py




[issue29533] urllib2 works slowly with proxy on windows

2017-03-25 Thread Julia Dolgova

Julia Dolgova added the comment:

Could someone look into my PR, please...

--




[issue29533] urllib2 works slowly with proxy on windows

2017-02-22 Thread Julia Dolgova

Changes by Julia Dolgova :


--
pull_requests: +212




[issue29533] urllib2 works slowly with proxy on windows

2017-02-22 Thread Julia Dolgova

Julia Dolgova added the comment:

I added a variable, smart_proxy_bypass, to the request module. If it is False, 
the check of the host's IP and FQDN is skipped in the proxy_bypass_registry function. 

I set the default value to False, because I think it's better for the 
behaviour of urllib to match IE rather than previous versions of urllib. 
This will affect only NT systems.

--
Added file: http://bugs.python.org/file46661/request.patch




[issue29533] urllib2 works slowly with proxy on windows

2017-02-19 Thread Julia Dolgova

Julia Dolgova added the comment:

http://bugs.python.org/issue23384 - same problem

--




[issue29533] urllib2 works slowly with proxy on windows

2017-02-19 Thread Julia Dolgova

Julia Dolgova added the comment:

Why not take it into account? 

Imagine that someone wants requests to "ovinnik.canonical.com" to 
bypass the proxy and requests to "ubuntu.com" not to. I don't know why; it's 
just an assumption. 
He adds the hostname "ovinnik.canonical.com" to the proxy bypass list and checks 
requests in IE. He sees that requests to "ovinnik.canonical.com" bypass the proxy 
and requests to "ubuntu.com" go via the proxy. And that's OK.
But suddenly he discovers that urllib requests to "ubuntu.com" bypass the proxy, 
which is unexpected. 

I think this behavior of urllib should be at least optional.

--




[issue29533] urllib2 works slowly with proxy on windows

2017-02-19 Thread Steve Dower

Steve Dower added the comment:

My guess is that IE is implemented using lower level APIs and it can choose 
whether to bypass based on its own list. There's no reason for any other 
software to take its settings into account.

That said, it would be great if urllib can avoid adding long delays, at least 
more than once. I'm personally not sure how best to do that though.
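
One possible way to avoid paying for the same slow lookup more than once is to memoize it. The sketch below is only an illustration of that idea, not a proposed patch: `cached_lookup` and `cached_getfqdn` are hypothetical names, and cached answers can go stale if DNS changes while the process is running.

```python
import functools
import socket


def cached_lookup(lookup, maxsize=256):
    """Wrap a one-argument lookup so repeated calls for a host hit a cache.

    A host is resolved again only after it has been evicted from the cache.
    """
    return functools.lru_cache(maxsize=maxsize)(lookup)


# socket.getfqdn returns its input when the reverse lookup fails, so a slow
# failure is cached too and is not repeated for the same host.
cached_getfqdn = cached_lookup(socket.getfqdn)
```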

--




[issue29533] urllib2 works slowly with proxy on windows

2017-02-18 Thread Julia Dolgova

Julia Dolgova added the comment:

I compared the behavior of IE and urllib.
I put different addresses into the proxy bypass list (the "Do not use proxy server 
for address beginning with" setting), made different requests through IE and 
urllib, and watched whether the proxy was bypassed.

IE doesn't even make a forward DNS lookup for the given hostname to check 
whether it should bypass the proxy, whereas urllib does.

For example:
proxy_bypass_list       request                    IE bypasses proxy   urllib bypasses proxy
23.253.135.79           https://python.org/        no                  yes
151.101.76.223          https://docs.python.org/   no                  yes
ovinnik.canonical.com   https://ubuntu.com/        no                  yes

compare_ie_urllib.txt - full report

--
Added file: http://bugs.python.org/file46649/compare_ie_urllib.txt




[issue29533] urllib2 works slowly with proxy on windows

2017-02-18 Thread Julia Dolgova

Julia Dolgova added the comment:

The issue applies to 3.6 as well. 

I agree that replacing gethostbyaddr with gethostbyname_ex is not a 
solution. But is there a way to check whether a hostname is in the proxy bypass 
list that doesn't lead to a reverse lookup? I suppose that IE 
doesn't make a reverse lookup for each request.

--
versions: +Python 3.6




[issue29533] urllib2 works slowly with proxy on windows

2017-02-16 Thread Julia Dolgova

Julia Dolgova added the comment:

I sincerely appreciate your time. Thank you very much for your answer. I'll try 
to test this on python 3.5

--




[issue29533] urllib2 works slowly with proxy on windows

2017-02-16 Thread Eryk Sun

Eryk Sun added the comment:

gethostbyname_ex won't do a reverse lookup on an IP to get the fully-qualified 
domain name, which seems pointless for a function named getfqdn. I think 
calling gethostbyaddr is intentional here and goes back to the Python 1.x days.

Also, FYI, socket_gethostbyaddr in socketmodule.c doesn't pass a name to the C 
gethostbyaddr. It first calls setipaddr, which calls getaddrinfo to get the IP 
address. 

For "docs.python.org", the reverse lookup on the IP address has no data. Well, 
in Windows the error code is WSANO_DATA; in Linux I get HOST_NOT_FOUND.
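
To make the role of gethostbyaddr concrete, here is a simplified sketch of the kind of logic a getfqdn-style function follows. It is not the CPython source: `fqdn_via_gethostbyaddr` is a hypothetical name, and the lookup function is a parameter so the fallback path can be tried without real DNS.

```python
import socket


def fqdn_via_gethostbyaddr(name, gethostbyaddr=socket.gethostbyaddr):
    """Return the first dotted name from a reverse lookup, or name on failure."""
    try:
        hostname, aliases, _addresses = gethostbyaddr(name)
    except OSError:
        # No reverse data (e.g. WSANO_DATA or HOST_NOT_FOUND): keep the input.
        return name
    for candidate in [hostname] + list(aliases):
        if "." in candidate:
            return candidate
    return hostname
```

When the reverse lookup has no data, the caller gets the original name back, but only after waiting for the DNS server to answer or time out.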

--
nosy: +eryksun




[issue29533] urllib2 works slowly with proxy on windows

2017-02-16 Thread Steve Dower

Steve Dower added the comment:

There's a few reasons why you haven't heard a reply. First among them is that 
we're all volunteers with limited free time, and second is that we just 
migrated to github and all that free time is being consumed right now.

Python 2.7 is only receiving security fixes at this point. We might apply a fix 
for this in 3.5 and later, but you haven't indicated whether it applies to 
those and (I assume) nobody has tested it yet.

Your initial report is very good and much appreciated, we've just been busy and 
this doesn't jump out as urgent.

--
nosy: +orsenthil




[issue29533] urllib2 works slowly with proxy on windows

2017-02-15 Thread Julia Dolgova

Julia Dolgova added the comment:

Is no one concerned that programs written in Python could work better 
when accessing "python.org"?

--




[issue29533] urllib2 works slowly with proxy on windows

2017-02-11 Thread Julia Dolgova

Changes by Julia Dolgova :


--
keywords: +patch
Added file: http://bugs.python.org/file46632/socket.patch




[issue29533] urllib2 works slowly with proxy on windows

2017-02-11 Thread Julia Dolgova

Changes by Julia Dolgova :


Added file: http://bugs.python.org/file46630/test.py




[issue29533] urllib2 works slowly with proxy on windows

2017-02-11 Thread Julia Dolgova

Changes by Julia Dolgova :


Added file: http://bugs.python.org/file46631/log.txt




[issue29533] urllib2 works slowly with proxy on windows

2017-02-11 Thread Julia Dolgova

Changes by Julia Dolgova :


Added file: http://bugs.python.org/file46629/socket.py




[issue29533] urllib2 works slowly with proxy on windows

2017-02-11 Thread Julia Dolgova

New submission from Julia Dolgova:

I've found that urllib sometimes works slowly on Windows when a proxy is configured.

To reproduce the issue on Windows:
1. Turn on the option "use proxy" in "browser settings" in "control panel".
No real proxy is needed. The problem shows up before the proxy is contacted. 
Just ignore the exception.
2. Make sure that the list of addresses for proxy bypass is not empty.
3. Execute checklib.py with socket.py (attached here) in the same directory.

The resulting output could be:
A (not a problem):
Before call to _socket.gethostbyaddr("docs.python.org")
After call to _socket.gethostbyaddr("docs.python.org")

B (little problem):
Before call to _socket.gethostbyaddr("docs.python.org")
Exception in call to _socket.gethostbyaddr("docs.python.org")

C (worse problem):
Before call to _socket.gethostbyaddr("docs.python.org")
(Delay)
Exception in call to _socket.gethostbyaddr("docs.python.org")

Which result you get (A, B or C) depends on the DNS server you use and the URL 
you pass into urllib2.urlopen(), and it can differ over time because DNS is not a 
stable thing. 
However, no matter which result you get, this test shows that a hostname, rather 
than an IP address, is passed into gethostbyaddr, contrary to what is expected and 
described in MSDN. It should be changed to gethostbyname_ex here.

test.py compares the performance of gethostbyaddr and gethostbyname_ex. 
It sets different DNS servers on the system and calls these functions with 
different hostnames. A run on my computer shows that gethostbyname_ex 
is 3 times faster and doesn't raise exceptions.
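
A comparison of this kind can be sketched as below. This is not the attached test.py: it is a minimal Python 3 illustration with hypothetical helper names, it does not switch DNS servers, and it records a failed lookup as a result instead of stopping.

```python
import socket
import time


def time_lookup(func, host):
    """Return (seconds, result_or_exception) for a single lookup call."""
    start = time.perf_counter()
    try:
        outcome = func(host)
    except OSError as exc:
        outcome = exc  # keep the failure so it can be reported
    return time.perf_counter() - start, outcome


def compare(host, funcs=(socket.gethostbyaddr, socket.gethostbyname_ex)):
    """Time each lookup function once for the given host."""
    return [(func.__name__,) + time_lookup(func, host) for func in funcs]
```

Calling `compare("docs.python.org")` returns one `(function name, seconds, outcome)` tuple per function; the numbers depend heavily on the DNS server in use.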

-
Attached files:
checklib.py - just makes a call to urllib2.urlopen("https://docs.python.org")
socket.py - not a patched lib. Has debug lines near line 141. Use it with 
checklib.py.
test.py - compares the performance of gethostbyaddr with gethostbyname_ex
log.txt - result of test.py on my computer (Windows 8, 64 bit)
socket.patch - socket.py patch

--
components: Library (Lib), Windows
files: checklib.py
messages: 287597
nosy: juliadolgova, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: urllib2 works slowly with proxy on windows
type: performance
versions: Python 2.7
Added file: http://bugs.python.org/file46628/checklib.py
