Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-30 Serhiy Storchaka
On 30.05.12 14:26, Victor Stinner wrote:
> I implemented something like that, and it was inefficient and very complex. See for example the (incomplete) patch for str%args attached to issue #14687: http://bugs.python.org/file25413/pyunicode_format-2.patch

I have seen and commented on this p

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-30 Victor Stinner
>> The "two steps" method is not promising: parsing the format string
>> twice is slower than other methods.
>
> The "1.5 steps" method is more promising -- first parse the format string in
> an efficient internal representation, and then allocate the output string
> and then write characters (or e
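To make the "1.5 steps" idea concrete, here is a minimal Python sketch. It assumes a toy formatter that only understands `%s` and `%%` (a hypothetical simplification, not the str%args code from the patch), and `parse_format` / `format_15_steps` are hypothetical names. The format string is parsed once into literal pieces and field markers; that parsed form is then reused both to compute the output length and maximum character and to write characters into a buffer allocated once. `array` typecodes stand in for the 1-, 2- and 4-byte PEP 393 kinds.

```python
from array import array


def parse_format(fmt):
    """Parse fmt once into literal strings and None markers for "%s" fields."""
    pieces, literal, i = [], [], 0
    while i < len(fmt):
        if fmt[i] != "%":
            literal.append(fmt[i])
            i += 1
            continue
        spec = fmt[i + 1:i + 2]  # "" if "%" is the last character
        if spec == "%":
            literal.append("%")
        elif spec == "s":
            if literal:
                pieces.append("".join(literal))
                literal = []
            pieces.append(None)  # a field to be filled from args
        else:
            raise ValueError("unsupported format: %r" % fmt[i:i + 2])
        i += 2
    if literal:
        pieces.append("".join(literal))
    return pieces


def format_15_steps(fmt, args):
    pieces = parse_format(fmt)  # the only pass over the format string
    if pieces.count(None) != len(args):
        raise TypeError("wrong number of arguments for format string")
    it = iter(args)
    parts = [str(next(it)) if p is None else p for p in pieces]
    # The "half step": length and maximum character come from the parsed
    # pieces, so the format string is not scanned a second time.
    length = sum(len(p) for p in parts)
    maxchar = max((ord(c) for p in parts for c in p), default=0)
    # Allocate the output once with the narrowest width that fits:
    # "B" = 1 byte, "H" = 2 bytes, "L" = at least 4 bytes per character.
    typecode = "B" if maxchar < 0x100 else "H" if maxchar < 0x10000 else "L"
    out = array(typecode, [0]) * length
    pos = 0
    for p in parts:  # write characters directly into the final buffer
        for c in p:
            out[pos] = ord(c)
            pos += 1
    return "".join(map(chr, out))
```

For example, `format_15_steps("%s = %s%%", ("x", 42))` gives `"x = 42%"`. The "two steps" method would instead scan `fmt` a second time to size the output before writing it.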

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-30 Serhiy Storchaka
On 30.05.12 01:44, Victor Stinner wrote:
> The "two steps" method is not promising: parsing the format string twice is slower than other methods.

The "1.5 steps" method is more promising -- first parse the format string in an efficient internal representation, and then allocate the output strin

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-29 Glenn Linderman
On 5/29/2012 3:51 PM, Nick Coghlan wrote:
> On Wed, May 30, 2012 at 8:44 AM, Victor Stinner wrote:
>> I also compared str%args and str.format() with Python 2.7 (byte strings), 3.2 (UTF-16 or UCS-4) and 3.3 (PEP 393): Python 3.3 is as fast as Python 2.7 and sometimes faster! (Whereas Python 3.2 is

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-29 Nick Coghlan
On Wed, May 30, 2012 at 8:44 AM, Victor Stinner wrote:
> I also compared str%args and str.format() with Python 2.7 (byte
> strings), 3.2 (UTF-16 or UCS-4) and 3.3 (PEP 393): Python 3.3 is as
> fast as Python 2.7 and sometimes faster! (Whereas Python 3.2 is 10 to
> 30% slower than Python 2 in gene

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-29 Victor Stinner
Hi,

>  * Use a Py_UCS4 buffer and then convert to the canonical form (ASCII,
> UCS1 or UCS2). Approach taken by io.StringIO. io.StringIO is not only
> used to write, but also to read and so a Py_UCS4 buffer is a good
> compromise.
>  * PyAccu API: optimized version of chunks=[]; for ...: ...
> chu
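The first strategy (write into a Py_UCS4 buffer, convert to the canonical form at the end) can be modelled roughly as follows. This is a sketch under stated assumptions: `UCS4Writer` is a hypothetical class, not the io.StringIO implementation, and Python `array` objects stand in for C buffers. Every character goes into a 4-byte-per-character buffer while writing; only `finish()` looks for the maximum character and copies the data into the narrowest PEP 393 form.

```python
from array import array


class UCS4Writer:
    """Hypothetical writer: 4-byte buffer while writing, narrowed at the end."""

    def __init__(self):
        # "L" is at least 4 bytes, enough for any code point up to U+10FFFF.
        self.buf = array("L")

    def write(self, text):
        # No width checks while writing: every character fits.
        self.buf.extend(map(ord, text))

    def finish(self):
        maxchar = max(self.buf, default=0)
        # Pick the canonical PEP 393 form for the final string.
        if maxchar < 0x80:
            kind, typecode = "ASCII", "B"
        elif maxchar < 0x100:
            kind, typecode = "UCS1", "B"
        elif maxchar < 0x10000:
            kind, typecode = "UCS2", "H"
        else:
            kind, typecode = "UCS4", "L"
        # One final copy into the narrow representation.
        narrow = array(typecode, self.buf.tolist())
        return kind, "".join(map(chr, narrow))
```

The trade-off sketched here is one extra scan and copy at the end in exchange for never widening mid-write, plus four bytes per character of temporary memory even for pure ASCII output.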

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-04 Serhiy Storchaka
04.05.12 02:45, Victor Stinner wrote:
> * Two steps: compute the length and maximum character of the output string, allocate the output string and then write characters. str%args was using it.
> * Optimistic approach. Start with an ASCII buffer, enlarge and widen (to UCS2 and then UCS4) the
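A minimal sketch of the optimistic approach follows, assuming hypothetical names (`OptimisticWriter`, `_typecode_for`) and Python `array` objects in place of C buffers. Writing starts in a 1-byte-per-character buffer, and the buffer is widened to 2 and then 4 bytes per character only when a wider character actually arrives. ASCII and UCS1 share the 1-byte typecode here, since both use one byte per character under PEP 393.

```python
from array import array


def _typecode_for(maxchar):
    """Narrowest array typecode that can hold code points up to maxchar."""
    if maxchar < 0x100:
        return "B"  # ASCII / UCS1: 1 byte per character
    if maxchar < 0x10000:
        return "H"  # UCS2: 2 bytes per character
    return "L"      # UCS4: at least 4 bytes per character


class OptimisticWriter:
    """Hypothetical writer: start narrow, widen only when needed."""

    def __init__(self, initial_capacity=16):
        self.buf = array("B", bytes(initial_capacity))
        self.length = 0
        self.maxchar = 0

    def _widen(self, maxchar):
        typecode = _typecode_for(maxchar)
        if typecode != self.buf.typecode:
            # Copy everything written so far into a wider buffer.
            self.buf = array(typecode, self.buf.tolist())
        self.maxchar = maxchar

    def write(self, text):
        codes = [ord(c) for c in text]
        if codes and max(codes) > self.maxchar:
            self._widen(max(codes))
        needed = self.length + len(codes)
        if needed > len(self.buf):
            # Over-allocate (at least double) so appends stay amortized O(1).
            grow = max(needed, 2 * len(self.buf)) - len(self.buf)
            self.buf.extend([0] * grow)
        self.buf[self.length:needed] = array(self.buf.typecode, codes)
        self.length = needed

    def getvalue(self):
        return "".join(map(chr, self.buf[:self.length]))
```

For mostly-ASCII output the buffer is never widened, which is where this approach is expected to pay off; each widening costs one copy of everything written so far.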

Re: [Python-Dev] Optimize Unicode strings in Python 3.3

2012-05-03 martin
Various notes:
* PyUnicode_READ() is slower than reading a Py_UNICODE array.
* Some decoders unroll the main loop to process 4 or 8 bytes (on 32- or 64-bit CPUs) at each step.

I would like to know if you have other tricks to optimize Unicode strings in Python, or if you are interested in working on this to
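The unrolling mentioned in the second note usually means testing a whole machine word per iteration instead of one byte. The Python sketch below only illustrates the bit trick, not the speed (in pure Python it is not faster than a byte loop). It assumes a 64-bit word: each 8-byte chunk is tested against a mask holding the high bit of every byte, and the remaining tail is checked byte by byte. `is_ascii_wordwise` and `ASCII_MASK` are hypothetical names.

```python
# High bit of each of the 8 bytes in a 64-bit word (assumed word size).
ASCII_MASK = 0x8080808080808080


def is_ascii_wordwise(data: bytes) -> bool:
    """Return True if every byte in data is < 0x80, testing 8 bytes per step."""
    end = len(data) - len(data) % 8
    # Main loop: one mask test covers 8 bytes at once.
    for i in range(0, end, 8):
        word = int.from_bytes(data[i:i + 8], "little")
        if word & ASCII_MASK:
            return False
    # Tail: the remaining 0 to 7 bytes are checked one at a time.
    return all(b < 0x80 for b in data[end:])
```

In C, the same loop would typically read aligned `size_t` words, which is where the saving over a per-byte loop comes from.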