[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-09 Thread STINNER Victor

STINNER Victor added the comment:

Buildbots still like this new API :-) (no test failure recently)

I reworked the API a little bit to make it simpler to use in Unicode encoders. 
I have started opening new issues to use this new API in more functions that 
produce byte strings.

I consider that this issue can now be closed. I'm happy, the API looks good to 
me and the modified code is faster.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread STINNER Victor

STINNER Victor added the comment:

Oh, I was surprised to see the same or worse performance for 
UTF-8/backslashreplace. In fact, I forgot to enable overallocation. With 
overallocation, it is now faster ;-)
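
A rough way to time the UTF-8/backslashreplace case from Python is sketched 
below. This is not the bench.py script attached to the tracker: the helper 
name time_encode() and the value of "length" are assumptions made only for 
illustration, and the absolute numbers depend on the machine and the build.

```python
import timeit


def time_encode(text, errors, number=1000):
    """Return the best average time, in seconds, of one text.encode() call."""
    timer = timeit.Timer(lambda: text.encode("utf-8", errors))
    return min(timer.repeat(repeat=3, number=number)) / number


# Assumed value: the thread does not say which length the benchmark used.
length = 100
sample = ("a" * 99 + "\udcff" * 99) * length

seconds = time_encode(sample, "backslashreplace", number=100)
print(f"utf-8/backslashreplace: {seconds * 1e6:.1f} us per call")
```

Running the same snippet on builds before and after a change gives a 
comparison similar in spirit to the before/after columns of the benchmark 
tables in this thread.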

I modified the API to put the "stack buffer" directly inside _PyBytesWriter. 
I also reworked _PyBytesWriter_Alloc() to call _PyBytesWriter_Prepare(), so 
_PyBytesWriter_Alloc() now supports overallocation as well. Supporting 
overallocation at the first allocation (_PyBytesWriter_Alloc) was part of the 
_PyBytesWriter design; that's why we have both _PyBytesWriter_Alloc() *and* 
_PyBytesWriter_Init(): it's possible to set overallocate=1 between init and 
alloc.

I pushed my change since it didn't kill performance. It's only a little bit 
slower, and only on very short encodes (less than 500 ns). In other cases, 
performance is the same or better.

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 59f4806a5add by Victor Stinner in branch 'default':
Optimize backslashreplace error handler
https://hg.python.org/cpython/rev/59f4806a5add

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1a2175149c5e by Victor Stinner in branch 'default':
Issue #25318: Add _PyBytesWriter API
https://hg.python.org/cpython/rev/1a2175149c5e

--
nosy: +python-dev




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset e9c1404d6bd9 by Victor Stinner in branch 'default':
Issue #25318: Fix compilation error
https://hg.python.org/cpython/rev/e9c1404d6bd9

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 9cf89366bbcb by Victor Stinner in branch 'default':
Issue #25318: Avoid sprintf() in backslashreplace()
https://hg.python.org/cpython/rev/9cf89366bbcb

New changeset 0a522f68d275 by Victor Stinner in branch 'default':
Issue #25318: Fix backslashreplace()
https://hg.python.org/cpython/rev/0a522f68d275

New changeset c53dcf1d6967 by Victor Stinner in branch 'default':
Issue #25318: cleanup code _PyBytesWriter
https://hg.python.org/cpython/rev/c53dcf1d6967

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread Roundup Robot

Roundup Robot added the comment:

New changeset c134eddcb347 by Victor Stinner in branch 'default':
Issue #25318: Move _PyBytesWriter to bytesobject.c
https://hg.python.org/cpython/rev/c134eddcb347

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread STINNER Victor

STINNER Victor added the comment:

I created the issue #25349 "Use _PyBytesWriter for bytes%args".

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-08 Thread STINNER Victor

STINNER Victor added the comment:

The FreeBSD 9.x buildbot is grumpy.

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.x/builds/3495/steps/test/logs/stdio

Assertion failed: (start[writer->allocated] == 0), function _PyBytesWriter_CheckConsistency, file Objects/bytesobject.c, line 3809.
Fatal Python error: Aborted

Current thread 0x000801807400 (most recent call first):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/test/test_pep277.py", line 150 in test_listdir
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 600 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/case.py", line 648 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 122 in run
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/suite.py", line 84 in __call__
  File "/usr/home/buildbot/python/3.x.koobs-freebsd9/build/Lib/unittest/runner.py", line 176 in run
...

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-05 Thread STINNER Victor

STINNER Victor added the comment:

Results of bench.py attached to issue #25227 (ASCII and Latin1 encoders): 
attached bench_ucs1_result.txt file.

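--------+-------------+-----------
Summary | ucs1_before | ucs1_after
--------+-------------+-----------
ascii   | 1.69 ms (*) |    1.69 ms
latin1  |  1.7 ms (*) |    1.69 ms
--------+-------------+-----------
Total   | 3.39 ms (*) |    3.39 ms
--------+-------------+-----------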

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-05 Thread STINNER Victor

STINNER Victor added the comment:

Result of bench.py attached to issue #25267: attached bench_utf8_result.txt.

------------------------------------------------------+-------------+---------------
Summary                                               | utf8_before |     utf8_after
------------------------------------------------------+-------------+---------------
ignore: "\udcff" * length                             | 7.63 us (*) |        7.91 us
ignore: "a" * length + "\udcff"                       | 10.7 us (*) |        10.8 us
ignore: ("a" * 99 + "\udcff" * 99) * length           | 2.17 ms (*) |        2.16 ms
ignore: ("\udcff" * 99 + "a") * length                |  843 us (*) |         866 us
ignore: "\udcff" + "a" * length                       | 10.7 us (*) |          11 us
replace: "\udcff" * length                            | 7.87 us (*) |  8.43 us (+7%)
replace: "a" * length + "\udcff"                      | 10.8 us (*) |        10.9 us
replace: ("a" * 99 + "\udcff" * 99) * length          | 2.46 ms (*) |        2.46 ms
replace: ("\udcff" * 99 + "a") * length               |  907 us (*) |         939 us
replace: "\udcff" + "a" * length                      | 10.9 us (*) |          11 us
surrogateescape: "\udcff" * length                    | 14.2 us (*) | 17.2 us (+21%)
surrogateescape: "a" * length + "\udcff"              | 10.6 us (*) |        10.7 us
surrogateescape: ("a" * 99 + "\udcff" * 99) * length  | 3.19 ms (*) |   3.4 ms (+7%)
surrogateescape: ("\udcff" * 99 + "a") * length       | 1.64 ms (*) | 1.87 ms (+13%)
surrogateescape: "\udcff" + "a" * length              | 10.6 us (*) |        10.7 us
surrogatepass: "\udcff" * length                      | 23.1 us (*) |        23.5 us
surrogatepass: "a" * length + "\udcff"                | 10.7 us (*) |        10.8 us
surrogatepass: ("a" * 99 + "\udcff" * 99) * length    | 4.39 ms (*) |        4.44 ms
surrogatepass: ("\udcff" * 99 + "a") * length         | 2.43 ms (*) |        2.47 ms
surrogatepass: "\udcff" + "a" * length                | 10.6 us (*) |        10.8 us
backslashreplace: "\udcff" * length                   | 65.7 us (*) |        64.3 us
backslashreplace: "a" * length + "\udcff"             | 15.7 us (*) |          15 us
backslashreplace: ("a" * 99 + "\udcff" * 99) * length |   12 ms (*) | 15.9 ms (+32%)
backslashreplace: ("\udcff" * 99 + "a") * length      | 11.1 ms (*) | 13.5 ms (+22%)
backslashreplace: "\udcff" + "a" * length             | 16.4 us (*) |  15.1 us (-8%)
------------------------------------------------------+-------------+---------------
Total                                                 | 41.4 ms (*) | 48.3 ms (+17%)
------------------------------------------------------+-------------+---------------
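
For readers less familiar with the error handlers named in the rows above, 
the short snippet below shows what each one produces when the UTF-8 encoder 
meets the lone surrogate "\udcff". The sample string is chosen only for 
illustration; it is not one of the benchmark inputs.

```python
text = "a\udcff"  # one ASCII character followed by a lone surrogate

handlers = ("ignore", "replace", "surrogateescape",
            "surrogatepass", "backslashreplace")
results = {handler: text.encode("utf-8", handler) for handler in handlers}

for handler, encoded in results.items():
    print(f"{handler:17} {encoded!r}")

# ignore            b'a'              (the surrogate is dropped)
# replace           b'a?'             (replaced by a question mark)
# surrogateescape   b'a\xff'          (U+DCFF maps back to byte 0xFF)
# surrogatepass     b'a\xed\xb3\xbf'  (encoded like a regular code point)
# backslashreplace  b'a\\udcff'       (replaced by a 6-byte escape sequence)
```

The handlers differ in how many bytes they emit per invalid character, 
between 0 and 6 in this example, so the encoder cannot know the final size 
up front when an error handler is involved.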

--
Added file: http://bugs.python.org/file40683/bench_utf8_result.txt




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-05 Thread STINNER Victor

New submission from STINNER Victor:

The attached patch is the first step to optimize Unicode encoders: it adds a 
_PyBytesWriter API. This API is responsible for using the most efficient buffer 
depending on the need:

* it's possible to use a small buffer directly allocated on the C stack
* otherwise a Python bytes object is allocated
* it's possible to overallocate the bytes object to reduce the number of calls 
to _PyBytes_Resize()

The patch only adds the new API, so don't expect any speedup. I just added a 
small optimization: overallocation is disabled in the UCS1 encoder (ASCII and 
Latin1) for the last write.
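
The effect of overallocation on the number of resize calls can be 
illustrated with a toy model in Python. This is only a sketch under stated 
assumptions, not the C code of the patch: the function name count_resizes() 
and the 25% growth factor are hypothetical and chosen for illustration.

```python
def count_resizes(chunk_sizes, overallocate):
    """Count how often a growing output buffer must be resized.

    Toy model only: the 25% growth factor is an assumption made for
    illustration, not a value taken from the patch.
    """
    allocated = 0   # current capacity of the buffer, in bytes
    used = 0        # bytes written so far
    resizes = 0
    for size in chunk_sizes:
        needed = used + size
        if needed > allocated:
            allocated = needed
            if overallocate:
                # Reserve extra room so that later writes fit without resizing.
                allocated += allocated // 4
            resizes += 1
        used = needed
    return resizes


# 1000 writes of 6 bytes each, the size of an escape such as b"\\udcff".
chunks = [6] * 1000
exact = count_resizes(chunks, overallocate=False)
padded = count_resizes(chunks, overallocate=True)
print(f"resizes without overallocation: {exact}")
print(f"resizes with overallocation:    {padded}")
```

In this model every write triggers a resize when the buffer is sized 
exactly, while a proportional growth factor keeps the number of resizes 
logarithmic in the output size. The price is unused space at the end of the 
buffer, which is presumably why turning overallocation off for the last 
write is worthwhile.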

--
components: Unicode
messages: 252322
nosy: ezio.melotti, haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Add _PyBytesWriter API to optimize Unicode encoders
type: performance
versions: Python 3.6




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-05 Thread STINNER Victor

Changes by STINNER Victor :


--
keywords: +patch
Added file: http://bugs.python.org/file40685/bytes_writer.patch




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-05 Thread STINNER Victor

Changes by STINNER Victor :


Added file: http://bugs.python.org/file40684/bench_ucs1_result.txt




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-05 Thread STINNER Victor

STINNER Victor added the comment:

A few months ago, I wrote a previous implementation of the _PyBytesWriter API 
which embedded the "current pointer" inside _PyBytesWriter. The problem was 
that GCC produced less efficient code than expected for the hotspot of the 
encoder.

In the new implementation (attached patch), the "current pointer" is unchanged: 
it's still a variable local to the encoder function. Instead, the current 
pointer became a *parameter* to all _PyBytesWriter *functions*.

I expect not to affect the performance of encoders for strings that encode 
without errors (when the code calling error handlers is not used), which is 
important since we have very good performance here.

_PyBytesWriter is not restricted to the code to allocate the buffer.

--

bytes_writer.patch:

+char stackbuf[256];

Oh, I forgot to mention this other small optimization. I also added a small 
buffer allocated on the C stack for the UCS1 encoder (ASCII, Latin1). It may 
slightly speed up encoding when the output string is smaller than 256 bytes 
and the error handler is used.

The optimization comes from the very efficient UTF-8 encoder.

--




[issue25318] Add _PyBytesWriter API to optimize Unicode encoders

2015-10-05 Thread STINNER Victor

STINNER Victor added the comment:

My previous abandoned attempt was issue #17742.

"Add _PyBytesWriter API to optimize Unicode encoders"

Oh, I forgot to mention that it may also be used to optimize bytes % args. More 
generally, it can help any code generating a bytes object whose length is 
unknown in advance. Said differently: _PyBytesWriter can be used when 
precomputing the output length is more expensive.
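
To make the bytes % args case concrete, here is a small Python example of 
bytes formatting where the size of the result depends on the argument 
values. The function name format_record() and the format string are 
hypothetical, chosen only for illustration.

```python
def format_record(name: bytes, value: int) -> bytes:
    """Format a key/value pair with bytes % args."""
    return b"%s=%d;" % (name, value)


short = format_record(b"x", 7)
long_ = format_record(b"content-length", 1234567)
print(short, len(short))
print(long_, len(long_))
```

The same format string yields outputs of different lengths, so code 
building the result either has to scan the arguments first to compute the 
size or has to grow its buffer while writing.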

str % args now uses _PyUnicodeWriter, but building a Unicode string is even 
more complex because of the different Unicode "kinds": 1, 2 or 4 bytes per 
character.

--
