[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis

Martin v. Löwis added the comment:

I have updated my patch per the review.

--
Added file: http://bugs.python.org/file36267/skip_idna.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22127
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Serhiy: your patch still changes the type of exception, for

s.sendto(b'hello', (u'thisisaverylongstringthisisaverylongstringthisisaverylongstringthisisaverylongstring', 4242))

Currently you get a UnicodeError; with the patch you get a socket.gaierror.
This is because the name encodes fine as ASCII, but still violates the IDNA
requirement on label length. My patch does the same. I don't see where your
patch and mine differ in behavior.
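
The label-length point can be checked without opening a socket. The following is a minimal sketch, assuming Python 3 and its built-in idna codec; the variable names are illustrative and not taken from either patch:

```python
# Host name from the example above: four repeats of a 21-character string,
# giving a single 84-character label (IDNA allows at most 63 per label).
host = "thisisaverylongstring" * 4

# Plain ASCII encoding succeeds, so a fast path that skips IDNA would hand
# this name on to the resolver, which is where socket.gaierror can come from.
ascii_bytes = host.encode("ascii")

# The idna codec rejects the over-long label before any lookup happens.
try:
    host.encode("idna")
    idna_error = None
except UnicodeError as exc:
    idna_error = exc

print(len(host), len(ascii_bytes), type(idna_error).__name__)
```

The same name is therefore valid ASCII but invalid IDNA, which is why bypassing the codec moves the failure from encoding time to resolution time.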

But I agree that your patch is certainly much simpler, while mine might be 
slightly faster (for not creating copies of the host name).

I'm fine with either being applied. Antoine?

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Serhiy: your patch still changes the type of exception, for

Oh, really.

> I'm fine with either being applied. Antoine?

Maybe apply your Argument Clinic-friendly patch to 3.5 and the simple patch to
earlier versions?

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Martin's approach looks better to me; also, it could be exported for other
modules (for example, the ssl module also requests IDNA encoding in one place).

I don't know if this should be fixed in 3.4. It's a performance improvement, 
not really a bug fix.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Roundup Robot

Roundup Robot added the comment:

New changeset bc991d4f9ce7 by Martin v. Löwis in branch 'default':
Issue #22127: Bypass IDNA for pure-ASCII host names (in particular for numeric 
IPs).
http://hg.python.org/cpython/rev/bc991d4f9ce7

New changeset 0b477934e0a1 by Martin v. Löwis in branch 'default':
Issue #22127: Bypass IDNA for pure-ASCII host names (in particular for numeric 
IPs).
http://hg.python.org/cpython/rev/0b477934e0a1

--
nosy: +python-dev




[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 49085b746029 by Martin v. Löwis in branch 'default':
Issue #22127: fix typo.
http://hg.python.org/cpython/rev/49085b746029

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis

Changes by Martin v. Löwis mar...@v.loewis.de:


--
resolution:  -> fixed
status: open -> closed




[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis

Martin v. Löwis added the comment:

I agree that this doesn't need to be backported to 3.4, in particular as there
is a minor semantic change (for invalid labels, it might perform a DNS lookup
instead of rejecting them right away).

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread STINNER Victor

STINNER Victor added the comment:

'abc' is a bytes string in Python 2 and a Unicode string in Python 3.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali

Charles-François Natali added the comment:

> Note that even the bytes version is still quite slow. UDP is used for
> light-weight protocols where you may send thousands or more messages per
> second. I'd be curious what the sendto() performance is in raw C.

Ah, I wouldn't rely on the absolute values, my computer is *slow*.

On a more recent machine, I get this:
10 loops, best of 3: 8.82 usec per loop

Whereas a C loop gives 4 usec per loop.

> 'abc' is a bytes string in Python 2 and a Unicode string in Python 3.

Sure, but why do getaddrinfo() and gethostbyname() return strings then?

This means that someone using:

addr = getaddrinfo(...)
sendto(DATA, addr)

will pay the IDNA encoding cost on every call to sendto().
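
The per-call cost of the codec can be estimated in isolation. This is a rough sketch, assuming Python 3; it times only the two encodings of the address string, not sendto() itself, and the absolute numbers depend on the machine and interpreter version:

```python
import timeit

host = "127.0.0.1"

# Compare the codec used for Unicode host names with a plain ASCII encode.
n = 10_000
idna_time = timeit.timeit(lambda: host.encode("idna"), number=n)
ascii_time = timeit.timeit(lambda: host.encode("ascii"), number=n)

print(f"idna:  {idna_time / n * 1e6:.2f} usec per call")
print(f"ascii: {ascii_time / n * 1e6:.2f} usec per call")
```

Both encodings produce the same bytes for a numeric address, so any difference in the two timings is pure overhead for this input.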

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Perhaps it is time to add support for ipaddress objects to the socket
functions. Then we could avoid address parsing in a tight loop not only for
Unicode strings, but for bytes strings too.

s = socket.socket(...)
addr = ipaddress.ip_address(socket.getaddrinfo(...))
for ...:
    s.sendto(DATA, (addr, port))
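
The snippet above describes a proposed API, not something the socket functions accept in this thread. As a small sketch of why an ipaddress object could save work, assuming Python 3.3+ and the standard ipaddress module, the parsed object already carries the packed bytes that inet_pton() would otherwise compute from text on each call:

```python
import ipaddress
import socket

# Parse the textual address once, outside any send loop.
addr = ipaddress.ip_address("127.0.0.1")

# The parsed object holds the packed network-order bytes, the same value
# that inet_pton() derives from the text form.
packed_from_text = socket.inet_pton(socket.AF_INET, "127.0.0.1")

print(addr.packed == packed_from_text)
```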

--
nosy: +serhiy.storchaka




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> Perhaps it is time to add support of ipaddress objects in socket functions.

What I was thinking too :-)
However, beware the parsing cost of ipaddress objects themselves.

One common pattern when doing UDP networking is the following:

  def datagram_received(self, remote_addr, data):
      # process data
      ...
      self.send_to(remote_addr, response_data)

If you want to pass an ipaddress object to send_to, you have to make it so that 
datagram_received() gives you an ipaddress object too.

Perhaps we need a more low-level solution, e.g. a parsing cache integrated in 
the C socket module.
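
The idea of a parsing cache can be illustrated in pure Python. This is only a sketch of the concept, not the C-level cache suggested above; parse_ipv4 is a hypothetical helper and the cache size is an arbitrary assumption:

```python
import functools
import socket


@functools.lru_cache(maxsize=1024)
def parse_ipv4(text):
    """Return the packed form of a dotted-quad address, cached by its text.

    parse_ipv4 is a hypothetical helper, not part of the socket module.
    """
    return socket.inet_pton(socket.AF_INET, text)


# Repeated sends to the same peer would hit the cache after the first parse.
for _ in range(3):
    packed = parse_ipv4("127.0.0.1")

print(parse_ipv4.cache_info())
```

Whether a lookup like this beats calling inet_pton() directly would need measuring, since the parse itself is cheap.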

--
nosy: +gvanrossum, ncoghlan




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali

Charles-François Natali added the comment:

Parsing a bytes object i.e. b'127.0.0.1' is done by inet_pton(), so
it's probably cheap (compared to a syscall).

If we had getaddrinfo() and gethostbyname() return bytes instead of
strings, it would be a huge gain.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Charles-François: you get the idna overhead in 2.7, too, by specifying 
u'127.0.0.1' as the address.

The idna overhead could be bypassed fairly easily in C by:
1. checking that the string is an ASCII string (this is possible in constant 
time, in 3.x)
2. directly passing the ASCII string to setipaddr (leaving any error detection 
to this routine)
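
The two steps above can be sketched in Python. The actual change discussed here is in the C socket module, so this is only an illustration of the logic; encode_host is a hypothetical name, and str.isascii() assumes Python 3.7 or later:

```python
def encode_host(host):
    """Encode a host name for the resolver, skipping IDNA when possible.

    encode_host is a hypothetical helper written for illustration.
    """
    if isinstance(host, bytes):
        return host
    if host.isascii():
        # Step 1 passed: the string is pure ASCII.
        # Step 2: pass it through unchanged and leave error detection
        # to whatever consumes the bytes.
        return host.encode("ascii")
    # Non-ASCII names still go through the IDNA codec.
    return host.encode("idna")


print(encode_host("127.0.0.1"))
print(encode_host("b\u00fccher.example"))
```

Numeric addresses and ordinary ASCII names take the cheap branch, while internationalized names keep their existing behavior.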

Before adding caching, I'd check whether a cache lookup is actually faster than 
calling inet_pton.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Martin v . Löwis

Martin v. Löwis added the comment:

The attached patch makes the difference between Unicode and bytes strings for 
host names negligible, plus it slightly speeds up the bytes case as well.

--
keywords: +patch
Added file: http://bugs.python.org/file36253/skip_idna.diff




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali

Charles-François Natali added the comment:

> Charles-François: you get the idna overhead in 2.7, too, by specifying
> u'127.0.0.1' as the address.

I don't see it in a profile output, and the timing doesn't change
whether I pass '127.0.0.1' or b'127.0.0.1' in 2.7.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Martin v . Löwis

Martin v. Löwis added the comment:

Please understand that Victor and I were asking you to pass a *unicode* object, 
with a *u* prefix. For me, the time more-than-doubles, on OSX, with the system 
python.

mvl:~ loewis$ /usr/bin/python -m timeit \
    -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" \
    "s.sendto(b'hello', ('127.0.0.1', 4242))"
10 loops, best of 3: 8.15 usec per loop
mvl:~ loewis$ /usr/bin/python -m timeit \
    -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" \
    "s.sendto(b'hello', (u'127.0.0.1', 4242))"
1 loops, best of 3: 19.5 usec per loop
mvl:~ loewis$ /usr/bin/python -V
Python 2.7.5

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali

Charles-François Natali added the comment:

> Please understand that Victor and I were asking you to pass a *unicode*
> object, with a *u* prefix. For me, the time more-than-doubles, on OSX, with
> the system python.

Sorry, I misread 'b'.
It's a day without...

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> 2. directly passing the ASCII string to setipaddr (leaving any error
> detection to this routine)

This will change the type of exception. If this is acceptable, and modulo
Antoine's and my nitpicks on Rietveld, the patch LGTM.

But it is too complicated. Here is an alternative. It has many flaws (less
extensible, incompatible with Argument Clinic, can produce inaccurate error
messages, etc.), but it is much simpler. And it preserves the type of exception.

--
Added file: http://bugs.python.org/file36254/skip_idna_alt.patch




[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Charles-François Natali

Changes by Charles-François Natali cf.nat...@gmail.com:


--
title: performance regression in socket.getsockaddr() -> performance regression
in socket getsockaddrarg()




[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

IDNA encoding is quite slow (see 6e1071ed4c66). I'm surprised we accept general
hostnames in sendto(), though (rather than plain IP addresses). 25 µs per call
is a lot for such a function.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread STINNER Victor

STINNER Victor added the comment:

For Python, the encoder is only used when you pass a Unicode string.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Charles-François Natali

Charles-François Natali added the comment:

> For Python, the encoder is only used when you pass a Unicode string.

Hm...
I'm passing ('127.0.0.1', 4242) as the destination, and you can see in the
above profile that the idna encode function is called.
This doesn't occur with 2.7.

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Charles-François Natali

Charles-François Natali added the comment:

OK, I think I see what you mean:

$ ./python -m timeit \
    -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" \
    "s.sendto(b'hello', ('127.0.0.1', 4242))"
1 loops, best of 3: 44.7 usec per loop
$ ./python -m timeit \
    -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" \
    "s.sendto(b'hello', (b'127.0.0.1', 4242))"
1 loops, best of 3: 23.7 usec per loop

That's really surprising, especially since gethostbyname() and
getaddrinfo() seem to return strings:
$ ./python -m timeit \
    -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); addr=socket.gethostbyname('127.0.0.1')" \
    "s.sendto(b'hello', (addr, 4242))"

$ ./python -c "import socket; print(type(socket.gethostbyname('127.0.0.1')))"
<class 'str'>

--




[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Note that even the bytes version is still quite slow. UDP is used for 
light-weight protocols where you may send thousands or more messages per 
second. I'd be curious what the sendto() performance is in raw C.

--
