New submission from Kristján Valur Jónsson <krist...@ccpgames.com>:

In a recent email exchange on python-dev, Antoine Pitrou mentioned that slicing 
memoryview objects (lazy slices) wasn't necessarily very efficient when dealing 
with short slices.  The data he posted was:


$ ./python -m timeit -s "x = b'x'*10000" "x[:100]"
10000000 loops, best of 3: 0.134 usec per loop
$ ./python -m timeit -s "x = memoryview(b'x'*10000)" "x[:100]"
10000000 loops, best of 3: 0.151 usec per loop

Actually, this is not a fair comparison.  A more realistic alternative to the 
memoryview is the bytearray, a mutable buffer.  My local tests gave these 
numbers:

python.exe -m timeit -n 10000000 -s "x = ((b'x'*10000))" "x[:100]"
10000000 loops, best of 3: 0.14 usec per loop

python.exe -m timeit -n 10000000 -s "x = (bytearray(b'x'*10000))" "x[:100]"
10000000 loops, best of 3: 0.215 usec per loop

python.exe -m timeit -n 10000000 -s "x = memoryview(bytearray(b'x'*10000))" 
"x[:100]"
10000000 loops, best of 3: 0.163 usec per loop

In this case, lazy slicing is indeed faster than greedy slicing.  However, I 
was intrigued by how much these cases differ.  Why was slicing bytes objects so 
much faster?  Each should just result in the generation of a single object.

It turns out that the slicing operation for strings (and sequences is very 
streamlined in the core.  To address this to some extent I provide a patch with 
three main components:

1) There is now a single object cache of slice objects.  These are generated by 
the core when slicing and immediately released.  Reusing them if possible is 
very beneficial.
2) The PySlice_GetIndicesEx couldn't be optimized because of aliasing.  Fixing 
that function sped it up considerably.
3) Creating a new api to create a memory view from a base memory view and a 
slice is much faster.  The old way would do two copies of a Py_buffer with 
adverse effects on cache performance.

Applying this patch provides the following figures:
python.exe -m timeit -n 10000000 -s "x = ((b'x'*10000))" "x[:100]"
10000000 loops, best of 3: 0.125 usec per loop

python.exe -m timeit -n 10000000 -s "x = (bytearray(b'x'*10000))" "x[:100]"
10000000 loops, best of 3: 0.202 usec per loop

python.exe -m timeit -n 10000000 -s "x = memoryview(bytearray(b'x'*10000))" 
"x[:100]"
10000000 loops, best of 3: 0.138 usec per loop

in memoryobject.c there was a comment stating that there should be an API for 
this.  Now there is, only internal.

----------
components: Interpreter Core
keywords: needs review, patch
messages: 119872
nosy: krisvale, pitrou
priority: normal
severity: normal
status: open
title: Improve performance of MemoryView slicing
type: performance
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10227>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to