Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Victor Stinner
I added a _PyUnicodeWriter internal API to optimize str%args and
str.format(args). It uses a buffer which is overallocated, so it's
basically like CPython str += str optimization. I still don't know how
efficient it is on Windows, since realloc() is slow on Windows (at
least on old Windows versions).

We should add an official and public API to concatenate strings. I
know that PyPy already has its own API. Example:

writer = UnicodeWriter()
for item in data:
    writer += item   # I guess that it's faster than writer.append(item)
return str(writer)   # or writer.getvalue()?

I don't care about the exact implementation of UnicodeWriter; it just
has to be as fast as or faster than ''.join(data).
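
To make the proposed usage concrete, here is a minimal pure-Python sketch. The UnicodeWriter class below is hypothetical (no such public class exists in the stdlib); it simply wraps io.StringIO, and it says nothing about how a C-level implementation such as _PyUnicodeWriter would perform.

```python
import io


class UnicodeWriter:
    """Hypothetical stand-in for the proposed API, backed by io.StringIO."""

    def __init__(self):
        self._buf = io.StringIO()

    def __iadd__(self, item):
        # "writer += item" appends in place and keeps the same writer object.
        self._buf.write(item)
        return self

    def append(self, item):
        self._buf.write(item)

    def getvalue(self):
        return self._buf.getvalue()

    def __str__(self):
        return self._buf.getvalue()


def concat(data):
    """Concatenate an iterable of strings using the writer."""
    writer = UnicodeWriter()
    for item in data:
        writer += item
    return str(writer)
```

Both spellings from the example, str(writer) and writer.getvalue(), return the same string in this sketch.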

I don't remember if _PyUnicodeWriter is faster than StringIO or
slower. I created an issue for that:
http://bugs.python.org/issue15612

Victor

2013/2/12 Maciej Fijalkowski fij...@gmail.com:
 Hi

 We recently encountered a performance issue in stdlib for pypy. It
 turned out that someone committed a performance fix that uses += for
 strings instead of .join() that was there before.

 Now this hurts pypy (we can mitigate it to some degree though) and
 possibly Jython and IronPython too.

 How do people feel about generally not having += on long strings in
 stdlib (since the refcount = 1 thing is a hack)?

 What about other performance improvements in stdlib that are
 problematic for pypy or others?

 Personally I would like cleaner code in stdlib vs speeding up CPython.
 Typically that also helps pypy so I'm not unbiased.

 Cheers,
 fijal
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 10:02 AM, Victor Stinner
victor.stin...@gmail.com wrote:
 I added a _PyUnicodeWriter internal API to optimize str%args and
 str.format(args). It uses a buffer which is overallocated, so it's
 basically like CPython str += str optimization. I still don't know how
 efficient it is on Windows, since realloc() is slow on Windows (at
 least on old Windows versions).

 We should add an official and public API to concatenate strings. I
 know that PyPy has already its own API. Example:

 writer = UnicodeWriter()
 for item in data:
 writer += item   # i guess that it's faster than writer.append(item)
 return str(writer) # or writer.getvalue() ?

 I don't care of the exact implementation of UnicodeWriter, it just
 have to be as fast or faster than ''.join(data).

 I don't remember if _PyUnicodeWriter is faster than StringIO or
 slower. I created an issue for that:
 http://bugs.python.org/issue15612

 Victor

It's in __pypy__.builders (StringBuilder and UnicodeBuilder). The API
does not really matter, as long as there is a way to preallocate a
certain size (which I don't think there is in StringIO, for example).
bytearray comes close but has a relatively inconvenient API, and any
pure-Python bytearray wrapper will not be fast on CPython.
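
As an illustration of the bytearray point, here is a minimal sketch that uses a bytearray as a text accumulator. The function name and the choice of UTF-8 are assumptions made for the example; the encode and decode round trip is exactly the kind of inconvenience mentioned above.

```python
def join_text_via_bytearray(parts, encoding="utf-8"):
    """Accumulate text in a bytearray, then decode once at the end."""
    buf = bytearray()
    for part in parts:
        # bytearray only accepts bytes-like data, so text must be encoded first.
        buf += part.encode(encoding)
    return buf.decode(encoding)
```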


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Tue, Feb 12, 2013 at 10:03 PM, Maciej Fijalkowski fij...@gmail.com wrote:
 Hi

 We recently encountered a performance issue in stdlib for pypy. It
 turned out that someone committed a performance fix that uses += for
 strings instead of .join() that was there before.

Can someone show the actual diff for this?

I'm giving a talk about outdated patterns in Python at DjangoCon EU,
prompted by this question and by the obsessive avoidance of string
concatenation. But all the tests I've done show that ''.join() is still
faster or as fast, except when you are joining very few strings,
for example two strings, in which case concatenation is faster or
as fast. That holds under both PyPy and CPython. So I'd like to know in
which case ''.join() is faster on PyPy and += faster on CPython.

Code with times

x = 10
s1 = 'X'* x
s2 = 'X'* x

for i in xrange(500):
    s1 += s2

Python 3.3: 0.049 seconds
PyPy 1.9: 24.217 seconds

PyPy indeed is much much slower than CPython here.
But let's look at the join case:

x = 10
s1 = 'X'* x
s2 = 'X'* x

for i in xrange(500):
    s1 = ''.join((s1, s2))

Python 3.3: 18.969 seconds
PyPy 1.9: 62.539 seconds

Here PyPy needs more than twice the time, and CPython needs 387 times
as long. Both are slower.

The best case is of course to make a long list of strings and join them:

x = 10
s1 = 'X'* x
s2 = 'X'* x

l = [s1]
for i in xrange(500):
    l.append(s2)

s1 = ''.join(l)

Python 3.3: 0.052 seconds
PyPy 1.9: 0.117 seconds

That's not always feasible though.
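
For anyone who wants to rerun the comparison, the three variants above can be wrapped into a small harness. This is a sketch for Python 3 (range instead of xrange); the function names, the default sizes, and the time_all helper are assumptions made for the example, and the numbers it prints will not match the figures quoted above.

```python
import timeit


def concat_inplace(n, piece):
    """Variant 1: repeated += on a growing string."""
    s = piece
    for _ in range(n):
        s += piece
    return s


def join_pairwise(n, piece):
    """Variant 2: ''.join() of two strings on every iteration."""
    s = piece
    for _ in range(n):
        s = "".join((s, piece))
    return s


def join_once(n, piece):
    """Variant 3: collect the pieces in a list and join once at the end."""
    parts = [piece]
    for _ in range(n):
        parts.append(piece)
    return "".join(parts)


def time_all(n=2000, piece="X" * 10, repeat=3):
    """Return the best wall-clock time in seconds for each variant."""
    results = {}
    for func in (concat_inplace, join_pairwise, join_once):
        timings = timeit.repeat(lambda: func(n, piece), repeat=repeat, number=1)
        results[func.__name__] = min(timings)
    return results
```

Calling time_all() under different interpreters gives a rough per-variant comparison; all three functions return the same string.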


//Lennart


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Larry Hastings

On 02/12/2013 05:25 PM, Christian Tismer wrote:

Ropes have been implemented by Carl-Friedrich Bolz in 2007 as I remember.
No idea what the impact was, if any at all.
Would ropes be an answer (and a simple way to cope with string mutation
patterns) as an alternative implementation, and therefore still justify
the usage of that pattern?


I've always hated the "".join(array) idiom for fast string 
concatenation--it's ugly and it flies in the face of TOOWTDI.  I think 
everyone should use x = a + b + c + d for string concatenation, and we 
should just make that fast.


In 2006 I proposed lazy string concatenation, a sort of rope that hid 
the details inside the string object.  If a and b are strings, a+b 
returned a string object that internally lazily contained references to 
a and b, and only computed its value if you asked for it.  Here's the 
Unicode version:


   http://bugs.python.org/issue1629305

Why didn't it get accepted?  I lumped in lazy slicing, a bad move as it 
was more controversial.  That and the possibility that macros like 
PyUnicode_AS_UNICODE could now possibly fail, which would have meant 
checking 400+ call sites to ensure they handle the possibility of 
failure.  This latter work has already happened with the new efficient 
Unicode representation patch.


I keep thinking it's time to revive the lazy string concatenation patch.
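
The idea of lazy concatenation can be sketched in pure Python. The LazyConcat class below is hypothetical and is not the patch from the issue above, which worked inside the C string object; it only shows the principle that a + b stores references and the flat value is computed on demand.

```python
class LazyConcat:
    """Hypothetical rope-like object: stores parts, flattens only when asked."""

    def __init__(self, *parts):
        self._parts = list(parts)
        self._value = None

    def __add__(self, other):
        # No copying here: the new node just references both operands.
        return LazyConcat(self, other)

    def __str__(self):
        if self._value is None:
            out = []
            stack = [self]
            # Iterative left-to-right traversal, so long chains do not
            # hit the recursion limit.
            while stack:
                node = stack.pop()
                if isinstance(node, LazyConcat):
                    if node._value is not None:
                        out.append(node._value)
                    else:
                        stack.extend(reversed(node._parts))
                else:
                    out.append(node)
            self._value = "".join(out)
            self._parts = None  # the flat value replaces the references
        return self._value
```

In this sketch the cost of building the final string is paid once, in a single join, the first time str() is called.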


//arry/


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Chris Withers

On 12/02/2013 21:03, Maciej Fijalkowski wrote:

We recently encountered a performance issue in stdlib for pypy. It
turned out that someone committed a performance fix that uses += for
strings instead of .join() that was there before.


That's... interesting.

I fixed a performance bug in httplib some years ago by doing the exact 
opposite; += -> ''.join(). In that case, it changed downloading a file 
from 20 minutes to 3 seconds. That was likely on Python 2.5.



How do people feel about generally not having += on long strings in
stdlib (since the refcount = 1 thing is a hack)?


+1 from me.

Chris

--
Simplistix - Content Management, Batch Processing  Python Consulting
- http://www.simplistix.co.uk


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Antoine Pitrou
Le Wed, 13 Feb 2013 09:02:07 +0100,
Victor Stinner victor.stin...@gmail.com a écrit :
 I added a _PyUnicodeWriter internal API to optimize str%args and
 str.format(args). It uses a buffer which is overallocated, so it's
 basically like CPython str += str optimization. I still don't know how
 efficient it is on Windows, since realloc() is slow on Windows (at
 least on old Windows versions).
 
 We should add an official and public API to concatenate strings.

There's io.StringIO already.
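
For reference, a minimal sketch of using io.StringIO as a string accumulator. The helper name is made up for the example; write() and getvalue() are the standard io.StringIO methods.

```python
import io


def concat_with_stringio(items):
    """Write each piece to an in-memory text buffer and read it back once."""
    buf = io.StringIO()
    for item in items:
        buf.write(item)
    return buf.getvalue()
```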

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka

On 12.02.13 23:03, Maciej Fijalkowski wrote:

How do people feel about generally not having += on long strings in
stdlib (since the refcount = 1 thing is a hack)?


Sometimes the use of += for strings or bytes is appropriate. For 
example, I deliberately used += for bytes instead of b''.join() (note that 
there is not even such a hack for bytes) in the zipfile module, where in most 
cases one of the components is empty and the concatenation of nonempty 
components only happens once. b''.join() was noticeably slower here.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano

On 13/02/13 19:52, Larry Hastings wrote:


I've always hated the "".join(array) idiom for fast string concatenation
--it's ugly and it flies in the face of TOOWTDI. I think everyone should
use x = a + b + c + d for string concatenation, and we should just make
 that fast.



"".join(array) is much nicer looking than:

# ridiculous and impractical for more than a few items
array[0] + array[1] + array[2] + ... + array[N]

or:

# not an expression
result = ""
for s in array:
    result += s

or even:

# currently prohibited, and not obvious
sum(array, "")

although I will admit to a certain fondness towards

# even less obvious than sum
map(operator.add, array)


and join has been the obvious way to do repeated concatenation of many substrings since at 
least Python 1.5 when it was spelled string.join(array [, sep=" "]).




--
Steven


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka

On 13.02.13 09:52, Nick Coghlan wrote:

On Wed, Feb 13, 2013 at 5:42 PM, Alexandre Vassalotti
alexan...@peadrop.com wrote:

I don't think so. Ropes are really useful when you work with gigabytes of
data, but unfortunately they don't make good general-purpose strings.
Monolithic arrays are much more efficient and simple for the typical
use-cases we have in Python.


If I recall correctly, io.StringIO and io.BytesIO have been updated to
use ropes internally in 3.3.


io.BytesIO has not been yet, but it will be in 3.4 (issue #15381).

On the other hand, there is a plan to rewrite StringIO with a more 
efficient contiguous-buffer implementation (issue #15612).




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Xavier Morel
On 2013-02-13, at 12:37 , Steven D'Aprano wrote:
 
# even less obvious than sum
map(operator.add, array)

That one does not work, it'll try to call the binary `add` with each
item of the array when the map iterator is reified, erroring out.

functools.reduce(operator.add, array, '')

would work though, it's another way to spell `sum` without the
string prohibition.
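
The difference between these spellings can be checked directly. This is a small sketch with a made-up sample list; it records which spellings work and which raise TypeError on Python 3.

```python
import functools
import operator

array = ["spam", "ham", "eggs"]

# The two spellings that work.
joined = "".join(array)
reduced = functools.reduce(operator.add, array, "")

# sum() refuses a string start value and raises TypeError.
try:
    sum(array, "")
except TypeError as exc:
    sum_error = str(exc)
else:
    sum_error = None

# map() calls operator.add with a single argument per item, which fails
# as soon as the iterator is consumed.
try:
    list(map(operator.add, array))
except TypeError:
    map_failed = True
else:
    map_failed = False
```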


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano

On 13/02/13 20:09, Chris Withers wrote:

On 12/02/2013 21:03, Maciej Fijalkowski wrote:

We recently encountered a performance issue in stdlib for pypy. It
turned out that someone committed a performance fix that uses += for
strings instead of .join() that was there before.


That's... interesting.

I fixed a performance bug in httplib some years ago by doing the exact opposite; 
+= -> ''.join(). In that case, it changed downloading a file from 20 minutes to 
3 seconds. That was likely on Python 2.5.



I remember it well.

http://mail.python.org/pipermail/python-dev/2009-August/091125.html

I frequently link to this thread as an example of just how bad repeated string 
concatenation can be, how painful it can be to debug, and how even when the 
optimization is fast on one system, it may fail and be slow on another system.



--
Steven


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano

On 13/02/13 22:46, Xavier Morel wrote:

On 2013-02-13, at 12:37 , Steven D'Aprano wrote:


# even less obvious than sum
map(operator.add, array)


That one does not work, it'll try to call the binary `add` with each
item of the array when the map iterator is reified, erroring out.

 functools.reduce(operator.add, array, '')

would work though, it's another way to spell `sum` without the
string prohibition.


Oops, you are right of course, I was thinking reduce but it came out map.
Thanks for the correction.


--
Steven


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka

On 13.02.13 10:52, Larry Hastings wrote:

I've always hated the "".join(array) idiom for fast string
concatenation--it's ugly and it flies in the face of TOOWTDI.  I think
everyone should use x = a + b + c + d for string concatenation, and we
should just make that fast.


I prefer x = '%s%s%s%s' % (a, b, c, d) when the number of strings is more 
than 3 and some of them are literal strings.





Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Daniel Holth
On Wed, Feb 13, 2013 at 7:10 AM, Serhiy Storchaka storch...@gmail.com wrote:

 On 13.02.13 10:52, Larry Hastings wrote:

 I've always hated the "".join(array) idiom for fast string
 concatenation--it's ugly and it flies in the face of TOOWTDI.  I think
 everyone should use x = a + b + c + d for string concatenation, and we
 should just make that fast.


 I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more than
 3 and some of them are literal strings.


Fixed: x = ('%s' *  len(abcd)) % abcd


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com wrote:
 I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more than 3
 and some of them are literal strings.

This has the benefit of being slow both on CPython and PyPy. Although
using .format() is even slower. :-)

//Lennart


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Christian Tismer

On 13.02.13 14:17, Daniel Holth wrote:
On Wed, Feb 13, 2013 at 7:10 AM, Serhiy Storchaka storch...@gmail.com wrote:


On 13.02.13 10:52, Larry Hastings wrote:

I've always hated the "".join(array) idiom for fast string
concatenation--it's ugly and it flies in the face of TOOWTDI.
 I think
everyone should use x = a + b + c + d for string
concatenation, and we
should just make that fast.


I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is
more than 3 and some of them are literal strings.


Fixed: x = ('%s' *  len(abcd)) % abcd



Which becomes in the new formatting style

x = ('{}' *  len(abcd)).format(*abcd)

hmm, hmm, not soo nice
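
For comparison, here are the two variable-length formatting spellings side by side with join. This is a small sketch; abcd is assumed to be a tuple of strings, which the % form requires (a list on the right-hand side would not be unpacked).

```python
abcd = ("a", "b", "c", "d")

# Old-style formatting: build '%s%s%s%s' and apply it to the tuple.
percent_style = ("%s" * len(abcd)) % abcd

# New-style formatting: build '{}{}{}{}' and unpack the tuple.
format_style = ("{}" * len(abcd)).format(*abcd)

# The plain join, for reference.
join_style = "".join(abcd)
```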

--
Christian Tismer :^)   mailto:tis...@stackless.com
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Chris Withers

On 13/02/2013 11:53, Steven D'Aprano wrote:

I fixed a performance bug in httplib some years ago by doing the exact
opposite; += -> ''.join(). In that case, it changed downloading a file
from 20 minutes to 3 seconds. That was likely on Python 2.5.



I remember it well.

http://mail.python.org/pipermail/python-dev/2009-August/091125.html

I frequently link to this thread as an example of just how bad repeated
string concatenation can be, how painful it can be to debug, and how
even when the optimization is fast on one system, it may fail and be
slow on another system.


Amusing is that 
http://mail.python.org/pipermail/python-dev/2009-August/thread.html#91125 doesn't 
even list the email where I found the problem...


Chris

--
Simplistix - Content Management, Batch Processing  Python Consulting
- http://www.simplistix.co.uk


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Amaury Forgeot d'Arc
2013/2/13 Lennart Regebro rege...@gmail.com

 On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
 wrote:
  I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
 than 3
  and some of them are literal strings.

 This has the benefit of being slow both on CPython and PyPy. Although
 using .format() is even slower. :-)


Did you really try it?
PyPy is really fast with str.__mod__, when the format string is a constant.
Yes, it's jitted.

-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Christian Tismer

On 13.02.13 15:27, Amaury Forgeot d'Arc wrote:


2013/2/13 Lennart Regebro rege...@gmail.com

On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka
storch...@gmail.com wrote:
 I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is
more than 3
 and some of them are literal strings.

This has the benefit of being slow both on CPython and PyPy. Although
using .format() is even slower. :-)


Did you really try it?
PyPy is really fast with str.__mod__, when the format string is a 
constant.

Yes, it's jitted.


How about the .format() style: Is that jitted as well?
In order to get people to prefer .format over __mod__,
it would be nice if PyPy made this actually _faster_ :-)

--
Christian Tismer :^)   mailto:tis...@stackless.com
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka
On 13.02.13 15:23, Lennart Regebro wrote:
 On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com wrote:
 I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more than 3
 and some of them are literal strings.
 
 This has the benefit of being slow both on CPython and PyPy. Although
 using .format() is even slower. :-)

Only slightly.

$ ./python -m timeit -s "spam = 'spam'; ham = 'ham'" "spam + ' = ' + ham + '\n'"
1000000 loops, best of 3: 0.501 usec per loop
$ ./python -m timeit -s "spam = 'spam'; ham = 'ham'" "''.join([spam, ' = ', ham, '\n'])"
1000000 loops, best of 3: 0.504 usec per loop
$ ./python -m timeit -s "spam = 'spam'; ham = 'ham'" "'%s = %s\n' % (spam, ham)"
1000000 loops, best of 3: 0.524 usec per loop

But the last variant looks better to me.
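
The same three statements can also be timed from a script with the timeit module. This is a sketch: the dictionary keys, the helper name, and the default loop count are choices made for the example, and the results depend on the interpreter and machine.

```python
import timeit

SETUP = "spam = 'spam'; ham = 'ham'"

# The three statements from the command lines above, as source strings.
STATEMENTS = {
    "plus": "spam + ' = ' + ham + '\\n'",
    "join": "''.join([spam, ' = ', ham, '\\n'])",
    "percent": "'%s = %s\\n' % (spam, ham)",
}


def best_usec_per_loop(stmt, number=100000, repeat=3):
    """Best-of-repeat time per loop, in microseconds."""
    best = min(timeit.repeat(stmt, setup=SETUP, repeat=repeat, number=number))
    return best / number * 1e6
```

For example, {name: best_usec_per_loop(stmt) for name, stmt in STATEMENTS.items()} gives one figure per variant.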




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Amaury Forgeot d'Arc
2013/2/13 Christian Tismer tis...@stackless.com

 On 13.02.13 15:27, Amaury Forgeot d'Arc wrote:


 2013/2/13 Lennart Regebro rege...@gmail.com

 On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
 wrote:
  I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
 than 3
  and some of them are literal strings.

  This has the benefit of being slow both on CPython and PyPy. Although
 using .format() is even slower. :-)


 Did you really try it?
 PyPy is really fast with str.__mod__, when the format string is a constant.
 Yes, it's jitted.


 How about the .format() style: Is that jitted as well?
 In order to get people to prefer .format over __mod__,
 it would be nice if PyPy made this actually _faster_ :-)


.format() is jitted as well.
But it's still slower than str.__mod__ (about 25%)
I suppose it can be further optimized.

-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc
amaur...@gmail.com wrote:

 2013/2/13 Lennart Regebro rege...@gmail.com

 On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
 wrote:
  I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
  than 3
  and some of them are literal strings.

 This has the benefit of being slow both on CPython and PyPy. Although
 using .format() is even slower. :-)


 Did you really try it?

Yes.

 PyPy is really fast with str.__mod__, when the format string is a constant.
 Yes, it's jitted.

Simple concatenation: s1 = s1 + s2
PyPy-1.9 time for 100 concats of 1 length strings = 7.133
CPython time for 100 concats of 1 length strings = 0.005

Making a list of strings and joining after the loop: s1 = ''.join(l)
PyPy-1.9 time for 100 concats of 1 length strings = 0.005
CPython time for 100 concats of 1 length strings = 0.003

Old formatting: s1 = '%s%s' % (s1, s2)
PyPy-1.9 time for 100 concats of 1 length strings = 20.924
CPython time for 100 concats of 1 length strings = 3.787

New formatting: s1 = '{0}{1}'.format(s1, s2)
PyPy-1.9 time for 100 concats of 1 length strings = 13.249
CPython time for 100 concats of 1 length strings = 3.751


I have, by the way, yet to find a use case where the fastest method in
CPython is not also the fastest in PyPy.

//Lennart


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc
amaur...@gmail.com wrote:
 Yes, it's jitted.

Admittedly, I have no idea in which cases the JIT kicks in, and what I
should do to make that happen to make sure I have the best possible
real-life test cases.

//Lennart


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka

On 13.02.13 15:17, Daniel Holth wrote:

On Wed, Feb 13, 2013 at 7:10 AM, Serhiy Storchaka storch...@gmail.com wrote:
I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is
more than 3 and some of them are literal strings.

Fixed: x = ('%s' *  len(abcd)) % abcd


No, you don't need this for a constant number of strings. Because 
some of the strings will almost certainly be literals, you can write this in 
a nicer way. Compare:


'config[' + key + '] = ' + value + '\n'
''.join(['config[', key, '] = ', value, '\n'])
'config[%s] = %s\n' % (key, value)




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Amaury Forgeot d'Arc
2013/2/13 Lennart Regebro rege...@gmail.com

 On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc
 amaur...@gmail.com wrote:
  Yes, it's jitted.

 Admittedly, I have no idea in which cases the JIT kicks in, and what I
 should do to make that happen to make sure I have the best possible
 real-life test cases.


PyPy JIT kicks in only after 1000 iterations.
I usually use timeit.
It's funny to see how the 1000 loops line is 5 times faster than the 100
loops:

$ ./pypy-c -m timeit -v -s "a,b,c,d='1234'" "'{}{}{}{}'.format(a,b,c,d)"
10 loops -> 2.19e-05 secs
100 loops -> 0.000122 secs
1000 loops -> 0.00601 secs
10000 loops -> 0.000363 secs
100000 loops -> 0.00528 secs
1000000 loops -> 0.0533 secs
10000000 loops -> 0.528 secs
raw times: 0.521 0.52 0.51
10000000 loops, best of 3: 0.051 usec per loop
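
To look at warm-up effects from a script, the per-loop time can be computed at increasing loop counts, much like the -v output above. This is a sketch: the helper name and the max_exp parameter are made up for the example, and on CPython, which has no JIT, the per-loop figures should stay roughly flat.

```python
import timeit


def per_loop_times(stmt, setup="pass", max_exp=5):
    """Time stmt for 10, 100, ... 10**max_exp loops.

    Returns a list of (loops, total_seconds, seconds_per_loop) tuples.
    """
    timer = timeit.Timer(stmt, setup=setup)
    rows = []
    for exp in range(1, max_exp + 1):
        number = 10 ** exp
        total = timer.timeit(number=number)
        rows.append((number, total, total / number))
    return rows
```

For example, per_loop_times("'{}{}{}{}'.format(a,b,c,d)", setup="a,b,c,d='1234'") reproduces the shape of the measurement above.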


-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread MRAB

On 2013-02-13 13:23, Lennart Regebro wrote:

On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com wrote:

I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more than 3
and some of them are literal strings.


This has the benefit of being slow both on CPython and PyPy. Although
using .format() is even slower. :-)


How about adding a class method for catenation:

str.cat(a, b, c, d)
str.cat([a, b, c, d]) # Equivalent to "".join([a, b, c, d])

Each argument could be a string or a list of strings.
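
A pure-Python sketch of what such a method might do, written as a free function. The name cat and its behaviour are hypothetical, since str.cat does not exist; this only illustrates the "string or list of strings" rule described above.

```python
def cat(*args):
    """Concatenate arguments, each being a string or an iterable of strings."""
    parts = []
    for arg in args:
        if isinstance(arg, str):
            parts.append(arg)
        else:
            # Anything that is not a string is treated as a sequence of strings.
            parts.extend(arg)
    return "".join(parts)
```

With this sketch, cat(a, b, c, d) and cat([a, b, c, d]) return the same string.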



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 7:33 PM, MRAB pyt...@mrabarnett.plus.com wrote:
 On 2013-02-13 13:23, Lennart Regebro wrote:

 On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
 wrote:

 I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
 than 3
 and some of them are literal strings.


 This has the benefit of being slow both on CPython and PyPy. Although
 using .format() is even slower. :-)

 How about adding a class method for catenation:

 str.cat(a, b, c, d)
 str.cat([a, b, c, d]) # Equivalent to .join([a, b, c, d])

 Each argument could be a string or a list of strings.



I actually wonder.

There seems to be a consensus to avoid += (to some extent). Can
someone commit the change to urllib then? I'm talking about reverting
http://bugs.python.org/issue1285086 specifically


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Brett Cannon
On Wed, Feb 13, 2013 at 1:06 PM, Maciej Fijalkowski fij...@gmail.com wrote:

 On Wed, Feb 13, 2013 at 7:33 PM, MRAB pyt...@mrabarnett.plus.com wrote:
  On 2013-02-13 13:23, Lennart Regebro wrote:
 
  On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
  wrote:
 
  I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
  than 3
  and some of them are literal strings.
 
 
  This has the benefit of being slow both on CPython and PyPy. Although
  using .format() is even slower. :-)
 
  How about adding a class method for catenation:
 
  str.cat(a, b, c, d)
  str.cat([a, b, c, d]) # Equivalent to .join([a, b, c, d])
 
  Each argument could be a string or a list of strings.
 
 

 I actually wonder.

 There seems to be the consensus to avoid += (to some extent). Can
 someone commit the change to urrllib then? I'm talking about reverting
 http://bugs.python.org/issue1285086 specifically


Please re-open the bug with a comment as to why and I'm sure someone will
get to it.


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 8:24 PM, Brett Cannon br...@python.org wrote:



 On Wed, Feb 13, 2013 at 1:06 PM, Maciej Fijalkowski fij...@gmail.com
 wrote:

 On Wed, Feb 13, 2013 at 7:33 PM, MRAB pyt...@mrabarnett.plus.com wrote:
  On 2013-02-13 13:23, Lennart Regebro wrote:
 
  On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
  wrote:
 
  I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
  than 3
  and some of them are literal strings.
 
 
  This has the benefit of being slow both on CPython and PyPy. Although
  using .format() is even slower. :-)
 
  How about adding a class method for catenation:
 
  str.cat(a, b, c, d)
  str.cat([a, b, c, d]) # Equivalent to .join([a, b, c, d])
 
  Each argument could be a string or a list of strings.
 
 

 I actually wonder.

 There seems to be the consensus to avoid += (to some extent). Can
 someone commit the change to urllib then? I'm talking about reverting
 http://bugs.python.org/issue1285086 specifically


 Please re-open the bug with a comment as to why and I'm sure someone will
 get to it.

I can't re-open the bug, my account is kind of lame (and seriously,
why do you guys *do* have multiple layers of bug tracker accounts?)

Cheers,
fijal


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Brett Cannon
On Wed, Feb 13, 2013 at 1:27 PM, Maciej Fijalkowski fij...@gmail.com wrote:

 On Wed, Feb 13, 2013 at 8:24 PM, Brett Cannon br...@python.org wrote:
 
 
 
  On Wed, Feb 13, 2013 at 1:06 PM, Maciej Fijalkowski fij...@gmail.com
  wrote:
 
  On Wed, Feb 13, 2013 at 7:33 PM, MRAB pyt...@mrabarnett.plus.com wrote:
   On 2013-02-13 13:23, Lennart Regebro wrote:
  
   On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
   wrote:
  
   I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
   than 3
   and some of them are literal strings.
  
  
   This has the benefit of being slow both on CPython and PyPy. Although
   using .format() is even slower. :-)
  
   How about adding a class method for catenation:
  
   str.cat(a, b, c, d)
   str.cat([a, b, c, d]) # Equivalent to .join([a, b, c, d])
  
   Each argument could be a string or a list of strings.
  
  
 
  I actually wonder.
 
  There seems to be the consensus to avoid += (to some extent). Can
  someone commit the change to urllib then? I'm talking about reverting
  http://bugs.python.org/issue1285086 specifically
 
 
  Please re-open the bug with a comment as to why and I'm sure someone will
  get to it.

 I can't re-open the bug, my account is kind of lame


Then leave a comment and I will re-open it.


 (and seriously,
 why do you guys *do* have multiple layers of bug tracker accounts?)


You obviously have not had users argue with your decision by constantly
flipping a bug back open. =)


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Christian Tismer

On 13.02.13 19:06, Maciej Fijalkowski wrote:

On Wed, Feb 13, 2013 at 7:33 PM, MRAB pyt...@mrabarnett.plus.com wrote:

On 2013-02-13 13:23, Lennart Regebro wrote:

On Wed, Feb 13, 2013 at 1:10 PM, Serhiy Storchaka storch...@gmail.com
wrote:

I prefer x = '%s%s%s%s' % (a, b, c, d) when string's number is more
than 3
and some of them are literal strings.


This has the benefit of being slow both on CPython and PyPy. Although
using .format() is even slower. :-)


How about adding a class method for catenation:

 str.cat(a, b, c, d)
 str.cat([a, b, c, d]) # Equivalent to .join([a, b, c, d])

Each argument could be a string or a list of strings.



I actually wonder.

There seems to be the consensus to avoid += (to some extent). Can
someone commit the change to urllib then? I'm talking about reverting
http://bugs.python.org/issue1285086 specifically


So _is_ += faster in certain library funcs than ''.join()?
If that's the case, the behavior of string concat could be something
that might be added to some implementation info, if speed really matters.

The library function then could take this info and use the appropriate code
path to always be fast, during module initialisation.
This is also quite explicit, since it tells the reader not to use in-place
add when it is not optimized.

If += is anyway a bit slower than other ways, forget it.
I would then maybe add a comment somewhere that says
"avoiding '+=' because it is not reliable" or something.

cheers - chris

--
Christian Tismer :^)   mailto:tis...@stackless.com
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Serhiy Storchaka

On 13.02.13 20:40, Christian Tismer wrote:

If += is anyway a bit slower than other ways, forget it.
I would then maybe add a comment somewhere that says
"avoiding '+=' because it is not reliable" or something.


+= is the fastest way (in any implementation) if you concatenate only two
strings.





Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 7:06 PM, Maciej Fijalkowski fij...@gmail.com wrote:
 I actually wonder.

 There seems to be the consensus to avoid += (to some extent). Can
 someone commit the change to urllib then? I'm talking about reverting
 http://bugs.python.org/issue1285086 specifically

That's unquoting of URLs, strings that aren't particularly long,
normally. And it's not in any tight loops. I'm astonished that any
change makes any noticeable speed difference here at all.

//Lennart


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Lennart Regebro
On Wed, Feb 13, 2013 at 4:02 PM, Amaury Forgeot d'Arc
amaur...@gmail.com wrote:
 2013/2/13 Lennart Regebro rege...@gmail.com

 On Wed, Feb 13, 2013 at 3:27 PM, Amaury Forgeot d'Arc
 amaur...@gmail.com wrote:
  Yes, it's jitted.

 Admittedly, I have no idea in which cases the JIT kicks in, and what I
 should do to make that happen to make sure I have the best possible
 real-life test cases.


 PyPy JIT kicks in only after 1000 iterations.

Actually, my test code mixed up iterations and string length when
printing the results, so the tests I showed were not 100 iterations
with strings of length 10,000, but 10,000 iterations with strings of
length 100.

No matter what the iteration count or string length is, .format() is the
slowest or second slowest of all the string concatenation methods I've
tried, and '%s%s' % is just marginally faster. This holds on both PyPy
and CPython, irrespective of string length.

I'll stick my neck out and say that using formatting for concatenation
is probably an anti-pattern.
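
For anyone who wants to repeat this kind of measurement, here is a minimal
sketch using timeit. It is not the test code referred to above; the sample
strings and the iteration count are arbitrary assumptions, and the numbers
will vary by interpreter, version and machine.

```python
import timeit

# Arbitrary sample inputs (assumption, not from the original tests).
a, b, c, d = "spam", "eggs", "ham", "toast"

candidates = {
    "a + b + c + d": lambda: a + b + c + d,
    "''.join(...)": lambda: "".join((a, b, c, d)),
    "'%s%s%s%s' % ...": lambda: "%s%s%s%s" % (a, b, c, d),
    "'{}{}{}{}'.format(...)": lambda: "{}{}{}{}".format(a, b, c, d),
}


def run(number=100_000):
    """Return {label: seconds} for `number` calls of each candidate."""
    return {
        label: timeit.timeit(func, number=number)
        for label, func in candidates.items()
    }


if __name__ == "__main__":
    # Print fastest first; absolute values are machine dependent.
    for label, seconds in sorted(run().items(), key=lambda kv: kv[1]):
        print(f"{seconds:.4f} s  {label}")
```

Run it under each interpreter you care about; a single run on one machine
says little about the others.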

//Lennart


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Victor Stinner
Hi,

I wrote a quick hack to expose _PyUnicodeWriter as _string.UnicodeWriter:
http://www.haypocalc.com/tmp/string_unicode_writer.patch

And I wrote a (micro-)benchmark:
http://www.haypocalc.com/tmp/bench_join.py
(The benchmark uses only ASCII strings; it would be interesting to
test latin1, BMP and non-BMP characters too.)

UnicodeWriter (using the writer += str API) is the fastest method in
most cases, except for data = ['a'*10**4] * 10**2 (in this case, it's
8x slower!). I guess that the overhead comes from the overallocation,
which then requires shrinking the buffer (shrinking may copy the whole
string). The overallocation factor may be adapted depending on the
size.
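
To illustrate why the overallocation factor matters, here is a toy model in
pure Python. It is not the _PyUnicodeWriter code: it assumes the worst case
where every reallocation copies the whole buffer, it ignores the final
shrink, and copied_chars and its overallocate parameter are hypothetical
names.

```python
def copied_chars(piece_lengths, overallocate=1.0):
    """Count characters moved by reallocations while appending pieces.

    ASSUMPTION: each reallocation copies everything written so far
    (worst case for realloc()); the final shrink is not modelled.
    """
    capacity = 0
    used = 0
    copied = 0
    for length in piece_lengths:
        needed = used + length
        if needed > capacity:
            copied += used                      # move existing content
            capacity = int(needed * overallocate)
        used = needed
    return copied


# Appending 100 single characters:
exact_fit = copied_chars([1] * 100, overallocate=1.0)   # grow on every append
doubling = copied_chars([1] * 100, overallocate=2.0)    # grow rarely
```

With an exact fit the model moves a quadratic number of characters; with
overallocation the total stays roughly linear in the final length.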

If computing the final length is cheap (e.g. if it's always the same),
it's always faster to use UnicodeWriter with a preallocated buffer.
The "UnicodeWriter +=; preallocate" test uses a precomputed length
(ok, it's cheating!).

I also implemented UnicodeWriter.append method to measure the overhead
of a method lookup: it's expensive :-)

--

Platform: Linux-3.6.10-2.fc16.x86_64-x86_64-with-fedora-16-Verne
Python unicode implementation: PEP 393
Date: 2013-02-14 01:00:06
CFLAGS: -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
SCM: hg revision=659ef9d360ae+ tag=tip branch=default date=2013-02-13
15:25 +
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python version: 3.4.0a0 (default:659ef9d360ae+, Feb 14 2013, 00:35:19)
[GCC 4.6.3 20120306 (Red Hat 4.6.3-2)]
Bits: int=32, long=64, long long=64, pointer=64

[ data = ['a'] * 10**2 ]

4.21 us: UnicodeWriter +=; preallocate
4.86 us (+15%): UnicodeWriter append; lookup attr once
4.99 us (+18%): UnicodeWriter +=

6.35 us (+51%): str += str
6.45 us (+53%): io.StringIO; lookup attr once
7.02 us (+67%): .join(list)
7.46 us (+77%): UnicodeWriter append
8.77 us (+108%): io.StringIO

[ data = ['abc'] * 10**4 ]

356 us: UnicodeWriter append; lookup attr once
375 us (+5%): UnicodeWriter +=; preallocate
376 us (+6%): UnicodeWriter +=

495 us (+39%): io.StringIO; lookup attr once
614 us (+73%): .join(list)
629 us (+77%): UnicodeWriter append
716 us (+101%): str += str
737 us (+107%): io.StringIO

[ data = ['a'*10**4] * 10**1 ]

3.67 us: str += str
3.76 us: UnicodeWriter +=; preallocate

3.95 us (+8%): UnicodeWriter +=
4.01 us (+9%): UnicodeWriter append; lookup attr once
4.06 us (+11%): .join(list)
4.24 us (+15%): UnicodeWriter append
4.59 us (+25%): io.StringIO; lookup attr once
4.77 us (+30%): io.StringIO

[ data = ['a'*10**4] * 10**2 ]

41.2 us: UnicodeWriter +=; preallocate
43.8 us (+6%): str += str
45.4 us (+10%): .join(list)
45.9 us (+11%): io.StringIO; lookup attr once
48.3 us (+17%): io.StringIO

370 us (+797%): UnicodeWriter +=
370 us (+798%): UnicodeWriter append; lookup attr once
377 us (+816%): UnicodeWriter append

[ data = ['a'*10**4] * 10**4 ]

38.9 ms: UnicodeWriter +=; preallocate
39 ms: .join(list)
39.1 ms: io.StringIO; lookup attr once
39.4 ms: UnicodeWriter append; lookup attr once
39.5 ms: io.StringIO
39.6 ms: UnicodeWriter +=
40.1 ms: str += str
40.1 ms: UnicodeWriter append

Victor

2013/2/13 Antoine Pitrou solip...@pitrou.net:
 Le Wed, 13 Feb 2013 09:02:07 +0100,
 Victor Stinner victor.stin...@gmail.com a écrit :
 I added a _PyUnicodeWriter internal API to optimize str%args and
 str.format(args). It uses a buffer which is overallocated, so it's
 basically like CPython str += str optimization. I still don't know how
 efficient it is on Windows, since realloc() is slow on Windows (at
 least on old Windows versions).

 We should add an official and public API to concatenate strings.

 There's io.StringIO already.

 Regards

 Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Steven D'Aprano

On 14/02/13 01:18, Chris Withers wrote:

On 13/02/2013 11:53, Steven D'Aprano wrote:

I fixed a performance bug in httplib some years ago by doing the exact
opposite; += -> ''.join(). In that case, it changed downloading a file
from 20 minutes to 3 seconds. That was likely on Python 2.5.



I remember it well.

http://mail.python.org/pipermail/python-dev/2009-August/091125.html

I frequently link to this thread as an example of just how bad repeated
string concatenation can be, how painful it can be to debug, and how
even when the optimization is fast on one system, it may fail and be
slow on another system.


Amusing is that 
http://mail.python.org/pipermail/python-dev/2009-August/thread.html#91125 
doesn't even list the email where I found the problem...


That's because it wasn't solved until the following month.

http://mail.python.org/pipermail/python-dev/2009-September/thread.html#91581



--
Steven


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-13 Thread Antoine Pitrou
On Thu, 14 Feb 2013 01:21:40 +0100
Victor Stinner victor.stin...@gmail.com wrote:
 
 UnicodeWriter (using the writer += str API) is the fastest method in
 most cases, except for data = ['a'*10**4] * 10**2 (in this case, it's
 8x slower!). I guess that the overhead comes from the overallocation,
 which then requires shrinking the buffer (shrinking may copy the whole
 string). The overallocation factor may be adapted depending on the
 size.

How about testing on Windows?

 If computing the final length is cheap (eg. if it's always the same),
 it's always faster to use UnicodeWriter with a preallocated buffer.

That's not a particularly surprising discovery, is it? ;-)

Regards

Antoine.


[Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
Hi

We recently encountered a performance issue in stdlib for pypy. It
turned out that someone committed a performance fix that uses += for
strings instead of .join() that was there before.

Now this hurts pypy (we can mitigate it to some degree though) and
possibly Jython and IronPython too.

How do people feel about generally not having += on long strings in
stdlib (since the refcount = 1 thing is a hack)?

What about other performance improvements in stdlib that are
problematic for pypy or others?

Personally I would like cleaner code in stdlib vs speeding up CPython.
Typically that also helps pypy so I'm not unbiased.

Cheers,
fijal


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou

Hi !

On Tue, 12 Feb 2013 23:03:04 +0200
Maciej Fijalkowski fij...@gmail.com wrote:
 
 We recently encountered a performance issue in stdlib for pypy. It
 turned out that someone committed a performance fix that uses += for
 strings instead of .join() that was there before.
 
 Now this hurts pypy (we can mitigate it to some degree though) and
 possibly Jython and IronPython too.
 
 How do people feel about generally not having += on long strings in
 stdlib (since the refcount = 1 thing is a hack)?

I agree that += should not be used as an optimization (on strings) in
the stdlib code. The optimization is there so that uncareful code does
not degenerate, but deliberately relying on it is a bit devilish.
(optimisare diabolicum :-))

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Brett Cannon
On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou solip...@pitrou.net wrote:


 Hi !

 On Tue, 12 Feb 2013 23:03:04 +0200
 Maciej Fijalkowski fij...@gmail.com wrote:
 
  We recently encountered a performance issue in stdlib for pypy. It
  turned out that someone committed a performance fix that uses += for
  strings instead of .join() that was there before.
 
  Now this hurts pypy (we can mitigate it to some degree though) and
  possibly Jython and IronPython too.
 
  How do people feel about generally not having += on long strings in
  stdlib (since the refcount = 1 thing is a hack)?

 I agree that += should not be used as an optimization (on strings) in
 the stdlib code. The optimization is there so that uncareful code does
 not degenerate, but deliberately relying on it is a bit devilish.
 (optimisare diabolicum :-))


Ditto from me. If you're going so far as to want to optimize Python code
then you probably are going to care enough to accelerate it in C, in which
case you can leave the Python code idiomatic.


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
On Tue, Feb 12, 2013 at 11:16 PM, Brett Cannon br...@python.org wrote:



 On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou solip...@pitrou.net wrote:


 Hi !

 On Tue, 12 Feb 2013 23:03:04 +0200
 Maciej Fijalkowski fij...@gmail.com wrote:
 
  We recently encountered a performance issue in stdlib for pypy. It
  turned out that someone committed a performance fix that uses += for
  strings instead of .join() that was there before.
 
  Now this hurts pypy (we can mitigate it to some degree though) and
  possibly Jython and IronPython too.
 
  How do people feel about generally not having += on long strings in
  stdlib (since the refcount = 1 thing is a hack)?

 I agree that += should not be used as an optimization (on strings) in
 the stdlib code. The optimization is there so that uncareful code does
 not degenerate, but deliberately relying on it is a bit devilish.
 (optimisare diabolicum :-))


 Ditto from me. If you're going so far as to want to optimize Python code
 then you probably are going to care enough to accelerate it in C, in which
 case you can leave the Python code idiomatic.



I should actually reference the original CPython issue
http://bugs.python.org/issue1285086


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread fwierzbi...@gmail.com
On Tue, Feb 12, 2013 at 1:03 PM, Maciej Fijalkowski fij...@gmail.com wrote:
 Hi

 We recently encountered a performance issue in stdlib for pypy. It
 turned out that someone committed a performance fix that uses += for
 strings instead of .join() that was there before.

 Now this hurts pypy (we can mitigate it to some degree though) and
 possibly Jython and IronPython too.
Just to confirm Jython does not have optimizations for += String and
will do much better with the idiomatic .join().

-Frank


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Tue, 12 Feb 2013 13:32:50 -0800
fwierzbi...@gmail.com fwierzbi...@gmail.com wrote:
 On Tue, Feb 12, 2013 at 1:03 PM, Maciej Fijalkowski fij...@gmail.com wrote:
  Hi
 
  We recently encountered a performance issue in stdlib for pypy. It
  turned out that someone committed a performance fix that uses += for
  strings instead of .join() that was there before.
 
  Now this hurts pypy (we can mitigate it to some degree though) and
  possibly Jython and IronPython too.
 Just to confirm Jython does not have optimizations for += String and
 will do much better with the idiomatic .join().

For the record, io.StringIO should be quite fast in 3.3.
(except for the method call overhead that Guido is complaining
about :-))

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Ned Batchelder

On 2/12/2013 4:16 PM, Brett Cannon wrote:




On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou solip...@pitrou.net wrote:



Hi !

On Tue, 12 Feb 2013 23:03:04 +0200
Maciej Fijalkowski fij...@gmail.com wrote:

 We recently encountered a performance issue in stdlib for pypy. It
 turned out that someone committed a performance fix that uses += for
 strings instead of .join() that was there before.

 Now this hurts pypy (we can mitigate it to some degree though) and
 possibly Jython and IronPython too.

 How do people feel about generally not having += on long strings in
 stdlib (since the refcount = 1 thing is a hack)?

I agree that += should not be used as an optimization (on strings) in
the stdlib code. The optimization is there so that uncareful code does
not degenerate, but deliberately relying on it is a bit devilish.
(optimisare diabolicum :-))


Ditto from me. If you're going so far as to want to optimize Python 
code then you probably are going to care enough to accelerate it in C, 
in which case you can leave the Python code idiomatic.


But the only reason .join() is a Python idiom in the first place is 
because it was the fast way to do what everyone initially coded as s 
+=    Just because we all learned a long time ago that joining was 
the fast way to build a string doesn't mean that .join() is the clean 
idiomatic way to do it.


--Ned.






Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread R. David Murray
On Tue, 12 Feb 2013 16:40:38 -0500, Ned Batchelder n...@nedbatchelder.com 
wrote:
 On 2/12/2013 4:16 PM, Brett Cannon wrote:
  On Tue, Feb 12, 2013 at 4:06 PM, Antoine Pitrou solip...@pitrou.net wrote:
  On Tue, 12 Feb 2013 23:03:04 +0200
  Maciej Fijalkowski fij...@gmail.com wrote:
  
   We recently encountered a performance issue in stdlib for pypy. It
   turned out that someone committed a performance fix that uses += for
   strings instead of .join() that was there before.
  
   Now this hurts pypy (we can mitigate it to some degree though) and
   possibly Jython and IronPython too.
  
   How do people feel about generally not having += on long strings in
   stdlib (since the refcount = 1 thing is a hack)?
 
  I agree that += should not be used as an optimization (on strings) in
  the stdlib code. The optimization is there so that uncareful code does
  not degenerate, but deliberately relying on it is a bit devilish.
  (optimisare diabolicum :-))
 
  Ditto from me. If you're going so far as to want to optimize Python 
  code then you probably are going to care enough to accelerate it in C, 
  in which case you can leave the Python code idiomatic.
 
 But the only reason .join() is a Python idiom in the first place is 
 because it was the fast way to do what everyone initially coded as s 
 +=    Just because we all learned a long time ago that joining was 
 the fast way to build a string doesn't mean that .join() is the clean 
 idiomatic way to do it.

If 'idiomatic' (a terrible term) means the standard way in this
language, which is how it is employed in the programming community,
then yes, .join() is the idiomatic way to write that *in Python*,
and thus is cleaner code *in Python*.

--David


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Tue, 12 Feb 2013 16:40:38 -0500
Ned Batchelder n...@nedbatchelder.com wrote:
 
 But the only reason .join() is a Python idiom in the first place is 
 because it was the fast way to do what everyone initially coded as s 
 +=    Just because we all learned a long time ago that joining was 
 the fast way to build a string doesn't mean that .join() is the clean 
 idiomatic way to do it.

It's idiomatic because strings are immutable (by design, not because of
an optimization detail) and therefore concatenation *has* to imply
building a new string from scratch.

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Xavier Morel
On 2013-02-12, at 22:40 , Ned Batchelder wrote:
 But the only reason .join() is a Python idiom in the first place is because 
 it was the fast way to do what everyone initially coded as s +=    
 Just because we all learned a long time ago that joining was the fast way to 
 build a string doesn't mean that .join() is the clean idiomatic way to do 
 it.

Well no, str.join is the idiomatic way to do it because it is:

 idiomatic |ˌidēəˈmatik|
 adjective
 1 using, containing, or denoting expressions that are natural to a native 
 speaker 

or would you argue that the natural way for weathered python developers
to concatenate strings is to *not* use str.join?

Of course usually idioms have original reasons for being, reasons which
are sometimes long gone (not unlike religious mandates or prohibitions).

For Python, ignoring the refcounting hack (which is not only cpython
specific but *current* cpython specific *and* doesn't apply to all
cases) that reason still exists: python's strings are formally immutable
bytestrings, and repeated concatenation of immutable bytestrings is
quadratic.

Thus str.join is idiomatic, and although it's possible (if difficult) to
change the idiom, straight string concatenation would make a terrible new
idiom as it will behave either unreliably (current CPython) or simply
terribly (every other Python implementation).

No?
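
For reference, the idiom under discussion looks roughly like this: collect
the pieces in a list and join once at the end. render_rows and its input
format are made-up names for illustration, not anything from the stdlib.

```python
def render_rows(rows):
    """Build one string from (name, value) pairs with a single join."""
    parts = []
    for name, value in rows:
        # Appending to a list is amortized O(1); nothing is copied yet.
        parts.append(f"{name}={value}\n")
    # One pass over the pieces, one allocation for the result.
    return "".join(parts)
```

The total work is linear in the output length on any implementation,
which is what makes it safe to rely on.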


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread MRAB

On 2013-02-12 21:44, Antoine Pitrou wrote:

On Tue, 12 Feb 2013 16:40:38 -0500
Ned Batchelder n...@nedbatchelder.com wrote:


But the only reason .join() is a Python idiom in the first place is
because it was the fast way to do what everyone initially coded as s
+=    Just because we all learned a long time ago that joining was
the fast way to build a string doesn't mean that .join() is the clean
idiomatic way to do it.


It's idiomatic because strings are immutable (by design, not because of
an optimization detail) and therefore concatenation *has* to imply
building a new string from scratch.


Tuples are much like immutable lists; sets were added, and then frozensets;
should we be adding mutable strings too (a bit like C#'s StringBuilder)?
(Just wondering...)



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Christian Heimes
Am 12.02.2013 22:32, schrieb Antoine Pitrou:
 For the record, io.StringIO should be quite fast in 3.3.
 (except for the method call overhead that Guido is complaining
 about :-))

AFAIK it's not the actual *call* of the method that is slow, but rather
attribute lookup and creation of bound method objects. If speed is of
the essence, code can cache the method object locally:

import io

strio = io.StringIO()
write = strio.write  # look up the bound method once, outside the loop
for element in elements:
    write(element)
result = strio.getvalue()


Christian


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 1:28 AM, Christian Heimes christ...@python.org wrote:
 Am 12.02.2013 22:32, schrieb Antoine Pitrou:
 For the record, io.StringIO should be quite fast in 3.3.
 (except for the method call overhead that Guido is complaining
 about :-))

 AFAIK it's not the actual *call* of the method that is slow, but rather
 attribute lookup and creation of bound method objects. If speed is of
 the essence, code can cache the method object locally:

 strio = io.StringIO()
 write = strio.write
 for element in elements:
     write(element)
 result = strio.getvalue()

And this is a great example of muddying code in stdlib for the sake of
speeding up CPython.

Cheers,
fijal


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Maciej Fijalkowski
On Wed, Feb 13, 2013 at 1:20 AM, MRAB pyt...@mrabarnett.plus.com wrote:
 On 2013-02-12 21:44, Antoine Pitrou wrote:

 On Tue, 12 Feb 2013 16:40:38 -0500
 Ned Batchelder n...@nedbatchelder.com wrote:


 But the only reason .join() is a Python idiom in the first place is
 because it was the fast way to do what everyone initially coded as s
 +=    Just because we all learned a long time ago that joining was
 the fast way to build a string doesn't mean that .join() is the clean
 idiomatic way to do it.


 It's idiomatic because strings are immutable (by design, not because of
 an optimization detail) and therefore concatenation *has* to imply
 building a new string from scratch.

 Tuples are much like immutable lists; sets were added, and then frozensets;
 should we be adding mutable strings too (a bit like C#'s StringBuilder)?
 (Just wondering...)

Isn't bytearray what you're looking for?
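
A small sketch of what that could look like. Note the assumption that the
data is handled as encoded bytes rather than str: bytearray is a mutable
sequence of bytes, so text has to be encoded going in and decoded coming
out. build_text is a hypothetical helper, not an existing API.

```python
def build_text(pieces, encoding="utf-8"):
    """Accumulate str pieces in a mutable bytearray, then decode once."""
    buf = bytearray()
    for piece in pieces:
        # += on a bytearray extends the buffer in place.
        buf += piece.encode(encoding)
    return buf.decode(encoding)
```

The encode/decode round trip is the price of using a bytes container for
text, so this is closer to a byte buffer than to a mutable str.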


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Nick Coghlan
On 13 Feb 2013 07:08, Maciej Fijalkowski fij...@gmail.com wrote:

 Hi

 We recently encountered a performance issue in stdlib for pypy. It
 turned out that someone committed a performance fix that uses += for
 strings instead of .join() that was there before.

 Now this hurts pypy (we can mitigate it to some degree though) and
 possibly Jython and IronPython too.

 How do people feel about generally not having += on long strings in
 stdlib (since the refcount = 1 thing is a hack)?

 What about other performance improvements in stdlib that are
 problematic for pypy or others?

 Personally I would like cleaner code in stdlib vs speeding up CPython.

For the specific case of "Don't rely on the fragile refcounting hack in
CPython's string concatenation", I strongly agree. However, as a general
principle, I can't agree until speed.python.org is a going concern and we
can get a reasonable overview of any resulting performance implications.

Regards,
Nick.

 Typically that also helps pypy so I'm not unbiased.

 Cheers,
 fijal


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Alexandre Vassalotti
On Tue, Feb 12, 2013 at 1:44 PM, Antoine Pitrou solip...@pitrou.net wrote:

 It's idiomatic because strings are immutable (by design, not because of
 an optimization detail) and therefore concatenation *has* to imply
 building a new string from scratch.


Not necessarily. It is entirely possible to implement strings such that they
are immutable and concatenation takes O(1) time: ropes are the canonical
example of this.
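
To make the idea concrete, here is a minimal sketch of a rope. It assumes nothing about how any real interpreter stores strings, and the Rope class and its attribute names are invented for this example. Concatenation only allocates one node that points at both operands, so no characters are copied until the text is flattened:

```python
class Rope:
    """String-like node that is never mutated after construction:
    either a leaf holding text or a concatenation of two ropes."""

    __slots__ = ("leaf", "left", "right", "length")

    def __init__(self, leaf="", left=None, right=None):
        self.leaf = leaf
        self.left = left
        self.right = right
        # Cache the length so len() stays O(1) for inner nodes too
        self.length = len(leaf) if left is None else left.length + right.length

    def __add__(self, other):
        # O(1): one new node referencing both operands, no copying
        return Rope(left=self, right=other)

    def __len__(self):
        return self.length

    def __str__(self):
        # Flattening is O(n): visit the leaves left to right with an explicit stack
        parts, stack = [], [self]
        while stack:
            node = stack.pop()
            if node.left is None:
                parts.append(node.leaf)
            else:
                stack.append(node.right)
                stack.append(node.left)
        return "".join(parts)


rope = Rope("")
for piece in ("im", "mut", "able"):
    rope = rope + Rope(piece)
flat = str(rope)
```

The cost moves elsewhere: reading the text back needs a traversal, and this naive version does no rebalancing.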


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Christian Tismer

On 12.02.13 22:03, Maciej Fijalkowski wrote:

Hi

We recently encountered a performance issue in stdlib for pypy. It
turned out that someone committed a performance fix that uses += for
strings instead of the .join() that was there before.

Now this hurts pypy (we can mitigate it to some degree though) and
possibly Jython and IronPython too.

How do people feel about generally not having += on long strings in
stdlib (since the refcount = 1 thing is a hack)?

What about other performance improvements in stdlib that are
problematic for pypy or others?

Personally I would like cleaner code in stdlib vs speeding up CPython.
Typically that also helps pypy so I'm not unbiased.

Cheers,
fijal


Howdy.

Funny coincidence that this issue came up an hour after I asked about
string_concat optimization absence on the pypy channel.

I did not read email while writing the efficient string concatenation
re-iteration.

Maybe we should use the time machine, go backwards and undo the
patch, although it still makes a lot of sense and is fastest, opcode-wise,
at least on CPython.

Which will not matter so much for PyPy of course because _that_ goes away.

Alas, the damage to the mindsets already has happened, and the cure
will probably be as hard as the eviction of the print statement, after all.

But since I'm a complete Python 3.3 convert (with consequent changes
to my projects, which were not so trivial),
I think I will also start saying publicly, from 2013 on, that s += t is a
pattern that should not be used in the gigabyte domain.

Actually a tad, because it contradicted normal programming patterns
in an appealing way. Way too sexy...

But let's toss it. Keep the past eight years in good memories as an
exceptional period of liberal abuse. Maybe we should add it to the
Zen of Python:
There are obviously good things, but obvious is the finest liar.

--
Christian Tismer :^)   mailto:tis...@stackless.com
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Terry Reedy

On 2/12/2013 6:20 PM, MRAB wrote:


Tuples are much like immutable lists; sets were added, and then frozensets;
should we be adding mutable strings too (a bit like C#'s StringBuilder)?
(Just wondering...)


StringIO is effectively a mutable string with a file interface.
sio.write('abc') is the equivalent of lis.extend(['a', 'b', 'c']).
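
A small sketch of that comparison, using only io.StringIO from the standard library (the variable names are arbitrary). The file interface gives both appending and overwriting in place:

```python
import io

sio = io.StringIO()
sio.write("abc")          # appends at the current position
sio.write("def")
sio.seek(1)               # move the file position back into the text
sio.write("X")            # overwrites one character in place
result = sio.getvalue()   # snapshot of the whole buffer as an ordinary str

# The list counterpart mentioned above
lis = []
lis.extend(["a", "b", "c"])
joined = "".join(lis)
```

getvalue() returns the whole buffer regardless of the current file position.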


--
Terry Jan Reedy



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Terry Reedy

On 2/12/2013 4:03 PM, Maciej Fijalkowski wrote:

Hi

We recently encountered a performance issue in stdlib for pypy. It
turned out that someone committed a performance fix that uses += for
strings instead of the .join() that was there before.

Now this hurts pypy (we can mitigate it to some degree though) and
possibly Jython and IronPython too.

How do people feel about generally not having += on long strings in
stdlib (since the refcount = 1 thing is a hack)?

What about other performance improvements in stdlib that are
problematic for pypy or others?

Personally I would like cleaner code in stdlib vs speeding up CPython.
Typically that also helps pypy so I'm not unbiased.


I agree. sum() refuses to sum strings specifically to encourage .join().

>>> sum(('x', 'b'), '')
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(('x', 'b'), '')
TypeError: sum() can't sum strings [use ''.join(seq) instead]

The doc entry for sum says the same thing.
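
For reference, the two patterns under discussion look roughly like this. This is only a sketch and the function names are made up. Both return the same string; the difference is that the += loop is fast only where the interpreter can resize the string in place, as CPython does when the refcount is 1:

```python
def concat_with_join(pieces):
    """Collect the pieces first, then build the result in one pass."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)


def concat_with_plus(pieces):
    """Same result, but each += may copy everything built so far."""
    s = ""
    for piece in pieces:
        s += piece
    return s


data = ["x", "b", "c"]
same = concat_with_join(data) == concat_with_plus(data)
```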

--
Terry Jan Reedy



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Wed, 13 Feb 2013 00:28:15 +0100
Christian Heimes christ...@python.org wrote:
 Am 12.02.2013 22:32, schrieb Antoine Pitrou:
  For the record, io.StringIO should be quite fast in 3.3.
  (except for the method call overhead that Guido is complaining
  about :-))
 
 AFAIK it's not the actual *call* of the method that is slow, but rather
 attribute lookup and creation of bound method objects.

Take a look at http://bugs.python.org/issue17170
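
To illustrate the attribute-lookup point quoted above, a common idiom is to look the method up once outside the loop. This is a sketch only: write_all is a made-up name, and whether it helps at all depends on the interpreter and should be measured:

```python
import io


def write_all(items):
    buf = io.StringIO()
    write = buf.write      # attribute lookup and bound-method creation happen once
    for item in items:
        write(item)        # the loop body now pays only for the call itself
    return buf.getvalue()


text = write_all(["a", "b", "c"])
```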



Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Wed, 13 Feb 2013 09:39:23 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 On 13 Feb 2013 07:08, Maciej Fijalkowski fij...@gmail.com wrote:
 
  Hi
 
  We recently encountered a performance issue in stdlib for pypy. It
  turned out that someone committed a performance fix that uses += for
  strings instead of the .join() that was there before.
 
  Now this hurts pypy (we can mitigate it to some degree though) and
  possibly Jython and IronPython too.
 
  How do people feel about generally not having += on long strings in
  stdlib (since the refcount = 1 thing is a hack)?
 
  What about other performance improvements in stdlib that are
  problematic for pypy or others?
 
  Personally I would like cleaner code in stdlib vs speeding up CPython.
 
 For the specific case of "Don't rely on the fragile refcounting hack in
 CPython's string concatenation", I strongly agree. However, as a general
 principle, I can't agree until speed.python.org is a going concern and we
 can get a reasonable overview of any resulting performance implications.

Anybody can run the benchmark suite for himself, speed.p.o is
(fortunately) not a roadblock:
http://bugs.python.org/issue17170

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Antoine Pitrou
On Wed, 13 Feb 2013 08:16:21 +0100
Antoine Pitrou solip...@pitrou.net wrote:
 On Wed, 13 Feb 2013 09:39:23 +1000
 Nick Coghlan ncogh...@gmail.com wrote:
  On 13 Feb 2013 07:08, Maciej Fijalkowski fij...@gmail.com wrote:
  
   Hi
  
   We recently encountered a performance issue in stdlib for pypy. It
   turned out that someone committed a performance fix that uses += for
   strings instead of the .join() that was there before.
  
   Now this hurts pypy (we can mitigate it to some degree though) and
   possibly Jython and IronPython too.
  
   How do people feel about generally not having += on long strings in
   stdlib (since the refcount = 1 thing is a hack)?
  
   What about other performance improvements in stdlib that are
   problematic for pypy or others?
  
   Personally I would like cleaner code in stdlib vs speeding up CPython.
  
  For the specific case of "Don't rely on the fragile refcounting hack in
  CPython's string concatenation", I strongly agree. However, as a general
  principle, I can't agree until speed.python.org is a going concern and we
  can get a reasonable overview of any resulting performance implications.
 
 Anybody can run the benchmark suite for himself, speed.p.o is
 (fortunately) not a roadblock:
 http://bugs.python.org/issue17170

And I meant to paste the repo URL actually:
http://hg.python.org/benchmarks/

Regards

Antoine.




Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Alexandre Vassalotti
On Tue, Feb 12, 2013 at 5:25 PM, Christian Tismer tis...@stackless.com wrote:

 Would ropes be an answer (and a simple way to cope with string mutation
 patterns) as an alternative implementation, and therefore still justify
 the usage of that pattern?


I don't think so. Ropes are really useful when you work with gigabytes of
data, but unfortunately they don't make good general-purpose strings.
Monolithic arrays are much more efficient and simple for the typical
use-cases we have in Python.


Re: [Python-Dev] Usage of += on strings in loops in stdlib

2013-02-12 Thread Nick Coghlan
On Wed, Feb 13, 2013 at 5:42 PM, Alexandre Vassalotti
alexan...@peadrop.com wrote:
 On Tue, Feb 12, 2013 at 5:25 PM, Christian Tismer tis...@stackless.com
 wrote:

 Would ropes be an answer (and a simple way to cope with string mutation
 patterns) as an alternative implementation, and therefore still justify
 the usage of that pattern?


 I don't think so. Ropes are really useful when you work with gigabytes of
 data, but unfortunately they don't make good general-purpose strings.
 Monolithic arrays are much more efficient and simple for the typical
 use-cases we have in Python.

If I recall correctly, io.StringIO and io.BytesIO have been updated to
use ropes internally in 3.3. Writing to one of those and then calling
getvalue() at the end is the main alternative to the list+join trick
(when concatenating many small strings, the memory saving relative to
a list can be notable since strings have a fairly large per-instance
overhead).
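
A sketch of the two approaches side by side. It relies only on the public io.StringIO interface and assumes nothing about how StringIO stores its data internally. sys.getsizeof is used just to show the fixed per-object cost of a str, and that call is CPython-specific:

```python
import io
import sys

pieces = [str(i) for i in range(1000)]

# list+join: the list keeps a reference to every small str until the join
parts = []
for piece in pieces:
    parts.append(piece)
joined = "".join(parts)

# StringIO: write each piece, then take the final value once at the end
buf = io.StringIO()
for piece in pieces:
    buf.write(piece)
written = buf.getvalue()

# Fixed cost of a str object before any characters are stored (CPython only)
per_str_overhead = sys.getsizeof("")
```

Both produce the same text; which one uses less memory for many small pieces depends on the implementation and is worth measuring rather than assuming.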

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia