[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis
Martin v. Löwis added the comment: I have updated my patch per the review. -- Added file: http://bugs.python.org/file36267/skip_idna.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue22127

[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis
Martin v. Löwis added the comment: Serhiy: your patch still changes the type of exception, for s.sendto(b'hello',(u'thisisaverylongstringthisisaverylongstringthisisaverylongstringthisisaverylongstring', 4242)) You get a UnicodeError now, but a socket.gaierror then. This is because the name
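
The UnicodeError in this example comes from the IDNA codec itself, which rejects any label longer than 63 characters before the resolver is consulted. A minimal sketch of that behaviour, assuming only the standard `idna` codec; the helper name `idna_error_type` is made up for illustration and is not part of any patch on this issue:

```python
def idna_error_type(host):
    """Return the name of the exception raised when IDNA-encoding host, or None."""
    try:
        host.encode("idna")
    except UnicodeError as exc:
        # UnicodeError (or a subclass) signals an invalid label
        return type(exc).__name__
    return None


# A single 84-character label, longer than the 63-character limit
long_label = u"thisisaverylongstring" * 4

if __name__ == "__main__":
    print(idna_error_type(long_label))
    print(idna_error_type(u"127.0.0.1"))
```

If the codec is skipped for ASCII input, the same name reaches the resolver instead, which is where a socket.gaierror would come from.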

[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: "Serhiy: your patch still changes the type of exception, for" Oh, really. I'm fine with either being applied. Antoine? Maybe apply your Argument Clinic-friendly patch to 3.5 and the simple patch to earlier versions? --

[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: Martin's approach looks better to me; also, it could be exported for other modules (for example, the ssl module also requests idna encoding at one place). I don't know if this should be fixed in 3.4. It's a performance improvement, not really a bug fix.

[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Roundup Robot
Roundup Robot added the comment: New changeset bc991d4f9ce7 by Martin v. Löwis in branch 'default': Issue #22127: Bypass IDNA for pure-ASCII host names (in particular for numeric IPs). http://hg.python.org/cpython/rev/bc991d4f9ce7 New changeset 0b477934e0a1 by Martin v. Löwis in branch

[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Roundup Robot
Roundup Robot added the comment: New changeset 49085b746029 by Martin v. Löwis in branch 'default': Issue #22127: fix typo. http://hg.python.org/cpython/rev/49085b746029

[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis
Changes by Martin v. Löwis mar...@v.loewis.de: -- resolution: -> fixed status: open -> closed

[issue22127] performance regression in socket getsockaddrarg()

2014-08-05 Thread Martin v . Löwis
Martin v. Löwis added the comment: I agree that this doesn't need to be backported to 3.4, in particular as there is a minor semantic change (for invalid labels, it might perform a DNS lookup instead of rejecting them right away). --

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread STINNER Victor
STINNER Victor added the comment: "Abc" is a bytes string in Python 2 and a Unicode string in Python 3. --

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali
Charles-François Natali added the comment: "Note that even the bytes version is still quite slow. UDP is used for light-weight protocols where you may send thousands or more messages per second. I'd be curious what the sendto() performance is in raw C." Ah, I wouldn't rely on the absolute

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Perhaps it is time to add support of ipaddress objects in socket functions. Then we could avoid address parsing in a tight loop not only for Unicode strings, but for bytes strings too. s = socket.socket(...) addr =

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Antoine Pitrou
Antoine Pitrou added the comment: "Perhaps it is time to add support of ipaddress objects in socket functions." What I was thinking too :-) However, beware the parsing cost of ipaddress objects themselves. One common pattern when doing UDP networking is the following: def

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali
Charles-François Natali added the comment: Parsing a bytes object, i.e. b'127.0.0.1', is done by inet_pton(), so it's probably cheap (compared to a syscall). If we had getaddrinfo() and gethostbyname() return bytes instead of strings, it would be a huge gain. --
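
For reference, the inet_pton() step is a pure text-to-binary conversion with no name resolution involved. A small sketch using the Python-level `socket.inet_pton` wrapper (which takes a str in Python 3) next to the `ipaddress` module discussed in this thread; it only illustrates the conversion and is not the C code path used inside sendto():

```python
import ipaddress
import socket

# Dotted-quad text converted to 4 packed bytes in network order
packed = socket.inet_pton(socket.AF_INET, "127.0.0.1")

# An ipaddress object holds the same packed form once it has been parsed
addr = ipaddress.ip_address("127.0.0.1")

if __name__ == "__main__":
    print(packed)
    print(addr.packed == packed)
```

Parsing once and reusing the result is the point of both suggestions: the per-call cost then no longer includes address parsing.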

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Martin v . Löwis
Martin v. Löwis added the comment: Charles-François: you get the idna overhead in 2.7, too, by specifying u'127.0.0.1' as the address. The idna overhead could be bypassed fairly easily in C by: 1. checking that the string is an ASCII string (this is possible in constant time, in 3.x) 2.
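
The bypass described here can be sketched in Python, with the caveat that the actual change lives in C in the socket module. The function name `encode_host` is hypothetical, and `str.isascii()` (Python 3.7+) stands in for the constant-time ASCII check available on the C side:

```python
def encode_host(host):
    """Illustrative only: skip the IDNA codec for pure-ASCII host names."""
    if isinstance(host, (bytes, bytearray)):
        # Bytes are passed through unchanged
        return bytes(host)
    if host.isascii():
        # Pure ASCII (numeric IPs in particular): no IDNA processing needed
        return host.encode("ascii")
    # Non-ASCII names still go through the IDNA codec
    return host.encode("idna")


if __name__ == "__main__":
    print(encode_host("127.0.0.1"))
    print(encode_host("bücher.example"))
```

One consequence of this shape: an over-long ASCII label is no longer rejected by the codec and is handed on as-is, so any error has to come from whatever consumes the encoded name.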

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Martin v . Löwis
Martin v. Löwis added the comment: The attached patch makes the difference between Unicode and bytes strings for host names negligible, plus it slightly speeds up the bytes case as well. -- keywords: +patch Added file: http://bugs.python.org/file36253/skip_idna.diff

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali
Charles-François Natali added the comment: "Charles-François: you get the idna overhead in 2.7, too, by specifying u'127.0.0.1' as the address." I don't see it in a profile output, and the timing doesn't change whether I pass '127.0.0.1' or b'127.0.0.1' in 2.7. --

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Martin v . Löwis
Martin v. Löwis added the comment: Please understand that Victor and I were asking you to pass a *unicode* object, with a *u* prefix. For me, the time more than doubles on OS X with the system python. mvl:~ loewis$ /usr/bin/python -m timeit -s import socket; s =

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Charles-François Natali
Charles-François Natali added the comment: "Please understand that Victor and I were asking you to pass a *unicode* object, with a *u* prefix. For me, the time more-than-doubles, on OSX, with the system python." Sorry, I misread 'b'. It's a day without... --

[issue22127] performance regression in socket getsockaddrarg()

2014-08-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: "2. directly passing the ASCII string to setipaddr (leaving any error detection to this routine)" This will change the type of exception. If this is acceptable and modulo Antoine's and my nitpicks on Rietveld, the patch LGTM. But it is too complicated.

[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Charles-François Natali
Changes by Charles-François Natali cf.nat...@gmail.com: -- title: performance regression in socket.getsockaddr() -> performance regression in socket getsockaddrarg()

[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Antoine Pitrou
Antoine Pitrou added the comment: IDNA encoding is quite slow (see 6e1071ed4c66). I'm surprised we accept general hostnames in sendto(), though (rather than plain IP addresses). 25 µs per call is a lot for such a function. --
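
A rough way to look at the codec cost in isolation, without any socket call, is to time the two encodings side by side. This is a sketch only: the helper `time_encode` is made up for illustration, and the figures depend heavily on the interpreter version and machine, so they are not comparable with the numbers quoted in this thread:

```python
import timeit

HOST = "127.0.0.1"


def time_encode(codec, number=100000):
    """Return the average time in microseconds for one HOST.encode(codec) call."""
    total = timeit.timeit(lambda: HOST.encode(codec), number=number)
    return total / number * 1e6


if __name__ == "__main__":
    for codec in ("ascii", "idna"):
        print("%s: %.3f usec per call" % (codec, time_encode(codec)))
```

For a numeric IP both codecs produce the same bytes, which is why the extra work is pure overhead in this case.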

[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread STINNER Victor
STINNER Victor added the comment: For Python, the encoder is only used when you pass a Unicode string. --

[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Charles-François Natali
Charles-François Natali added the comment: "For Python, the encoder is only used when you pass a Unicode string." Hm... I'm passing ('127.0.0.1', 4242) as destination, and you can see in the above profile that the idna encode function is called. This doesn't occur with 2.7. --

[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Charles-François Natali
Charles-François Natali added the comment: OK, I think I see what you mean: $ ./python -m timeit -s "import socket; s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)" "s.sendto(b'hello', ('127.0.0.1', 4242))" 1 loops, best of 3: 44.7 usec per loop $ ./python -m timeit -s import socket; s =

[issue22127] performance regression in socket getsockaddrarg()

2014-08-03 Thread Antoine Pitrou
Antoine Pitrou added the comment: Note that even the bytes version is still quite slow. UDP is used for light-weight protocols where you may send thousands or more messages per second. I'd be curious what the sendto() performance is in raw C. --