[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-21 Thread Steven D'Aprano
On Fri, Feb 21, 2020 at 02:42:16PM +0200, Serhiy Storchaka wrote:
> 21.02.20 10:36, Steven D'Aprano wrote:
> >On my machine, at least, constructing a bytes object first followed by
> >an array is significantly faster than the alternative:
> >
> >[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
> >100 loops, best of 5: 1.71 msec per loop
> >
> >[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
> >50 loops, best of 5: 7.48 msec per loop
> >
> >That surprises me and I cannot explain it.
> 
> The second one allocates and copies 4 times more memory.

I completely misunderstood what the first would do. I expected it to 
create an array of 500,000 zeroes, but it only created an array of 
125,000. That's nuts! The docstring says:

Return a new array whose items are restricted by typecode, and
initialized from the optional initializer value, which must be a
list, string or iterable over elements of the appropriate type.

A bytes object is an iterable over integers, so I expected these two to 
be equivalent:

array('i', bytes(50))
array('i', [0]*50)

I never would have predicted that a bytes iterable and a list iterable 
behave differently. Oh, you have to read the documentation on the 
website:

https://docs.python.org/3/library/array.html#array.array

Okay, let's try that again:


[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000*4))"
20 loops, best of 5: 12.6 msec per loop

compared to 7.65 milliseconds for the version using multiplication. That 
makes more sense to me.
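
The difference between the two initializers can be checked directly. What follows is only a minimal sketch; it assumes the item size of 'i' divides 1000 (it is usually 4 bytes), and the variable names are illustrative.

```python
from array import array

n = 1000                          # a multiple of the usual item sizes
itemsize = array('i').itemsize    # platform dependent, typically 4

# A bytes initializer is treated as raw machine values (as with
# frombytes()), so n zero bytes give n // itemsize items.
from_bytes = array('i', bytes(n))

# A list initializer is consumed item by item, so it gives n items.
from_list = array('i', [0] * n)

print(len(from_bytes), len(from_list))

# To get n items from bytes, the buffer must hold n * itemsize bytes.
matching = array('i', bytes(n * itemsize))
```

So a like-for-like timing has to scale the bytes length by the item size, which is what the second run above does.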

Okay, I'm starting to come around to giving array an alternate 
constructor:

array.zeroes(typecode, size [, *, value=None])

If keyword-only argument value is given and is not None, it is used as 
the initial value instead of zero.
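
To make the proposed signature concrete, here is a rough pure-Python sketch. The function zeroes() below is hypothetical and is not part of the array module; it assumes a numeric typecode and simply wraps the multiplication idiom mentioned earlier in the thread.

```python
from array import array

def zeroes(typecode, size, *, value=None):
    """Hypothetical helper mirroring the proposed signature (not a real API)."""
    fill = 0 if value is None else value
    # Repeating a one-item array should avoid building a full-size
    # intermediate list or bytes object first.
    return array(typecode, [fill]) * size

ints = zeroes('i', 5)                    # five zero-valued C ints
halves = zeroes('d', 3, value=0.5)       # three doubles set to 0.5
```

A real constructor would presumably live in C and could skip even the one-item temporary, but the observable result should be the same.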



-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/P4BY6PTQUWWY634262MHMZGOY4OBRGES/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-21 Thread Stephan Hoyer
On Fri, Feb 21, 2020 at 12:43 AM Steven D'Aprano wrote:

> On Thu, Feb 20, 2020 at 02:19:13PM -0800, Stephan Hoyer wrote:
>
> > > > Strong +1 for an array.zeros() constructor, and/or a lower level
> > > > array.empty() which doesn't pre-fill values.
> > >
> > > So it'd be a shorthand for something like this?
> > >
> > > >>> array.array("i", bytes(64))
> > > array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> > >
> > > It'd be convenient to specify the size as a number of array elements
> > > rather than bytes. But I'm not a heavy user of array.array() so I
> > > won't say either way as to whether this is needed.
> >
> >
> > Yes, exactly.
> >
> > The main problem with array.array("i", bytes(64)) is that memory gets
> > allocated twice, first to create the bytes() object and then to make the
> > array(). This makes it unsuitable for high performance applications.
>
> Got some actual measurements to demonstrate that initialising the array
> is a bottleneck? Especially for something as small as 64, it seems
> unlikely. If it were 64MB, that might be another story.
>

That's right, the real use-case is quickly deserializing large amounts of
data (e.g., 100s of MB) from a wire format into a form suitable for fast
analysis with NumPy or pandas. Unfortunately I can't share an actual code
example, but this is a pretty common scenario in the data processing world,
e.g., reminiscent of the use-cases for PEP 574 (
https://www.python.org/dev/peps/pep-0574/).

The concern is not just speed (which I agree is probably not impacted too
poorly by an extra copy) but also memory overhead. If the resulting array
is 500 MB and deserialization can be done in a streaming fashion, I don't
want to wastefully allocate another 500 MB just to do a memory copy.
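
As an illustration of the streaming case with today's API, the sketch below preallocates an array once and then reads directly into its memory. The function read_ints() and its chunk_items parameter are made up for this example, and it assumes the stream carries C ints in native byte order.

```python
import io
from array import array

def read_ints(stream, count, chunk_items=65536):
    """Fill a preallocated array('i') from a binary stream, chunk by chunk."""
    out = array('i', [0]) * count          # single full-size allocation
    if count == 0:
        return out                         # empty views cannot be cast
    step = chunk_items * out.itemsize
    # View the array's memory as plain bytes so readinto() can write to it.
    with memoryview(out).cast('B') as view:
        pos = 0
        while pos < len(view):
            got = stream.readinto(view[pos:pos + step])
            if not got:
                raise EOFError("stream ended before the array was filled")
            pos += got
    return out

payload = array('i', range(10)).tobytes()
result = read_ints(io.BytesIO(payload), 10, chunk_items=3)
```

This still zero-fills the array before overwriting it, which is the part a lower-level array.empty() would avoid.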
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/HCXGQEOG5HWTQLY6T6A7JTKBJJXN2AZE/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-21 Thread Serhiy Storchaka

21.02.20 10:36, Steven D'Aprano wrote:

> On my machine, at least, constructing a bytes object first followed by
> an array is significantly faster than the alternative:
>
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
> 100 loops, best of 5: 1.71 msec per loop
>
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
> 50 loops, best of 5: 7.48 msec per loop
>
> That surprises me and I cannot explain it.


The second one allocates and copies 4 times more memory.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/STQPZQEI6TMJAIZZXZ3PZPQJQ3OJHIT4/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-21 Thread Andrew Barnert via Python-ideas
On Feb 21, 2020, at 00:46, Steven D'Aprano  wrote:
> 
> 
> Got some actual measurements to demonstrate that initialising the array 
> is a bottleneck? Especially for something as small as 64, it seems 
> unlikely. If it were 64MB, that might be another story.
> 
> What's wrong with `array.array("i", [0])*64` or equivalent?
> 
> On my machine, at least, constructing a bytes object first followed by 
> an array is significantly faster than the alternative:
> 
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
> 100 loops, best of 5: 1.71 msec per loop
> 
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
> 50 loops, best of 5: 7.48 msec per loop
> 
> That surprises me and I cannot explain it.

Without reading the code, I can guess. The first one does two 500K allocations 
and a 500K memcpy; the second only does one 500K allocation but does 125K 
separate 4-byte copies, and the added cost of that loop and of not moving as 
many bytes at a time as possible is higher than the savings of a 500K 
allocation.
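
Guesses like this are easy to test. The harness below is only a sketch for rerunning the comparison on one's own machine; the labels and the N constant are illustrative, and no particular timings are implied. It also prints the resulting item counts, since the candidates are only comparable when they build arrays of the same length.

```python
import timeit
from array import array

N = 500_000
ITEMSIZE = array('i').itemsize

candidates = {
    "array('i', bytes(N))": lambda: array('i', bytes(N)),
    "array('i', bytes(N * itemsize))": lambda: array('i', bytes(N * ITEMSIZE)),
    "array('i', [0]) * N": lambda: array('i', [0]) * N,
    "array('i', [0] * N)": lambda: array('i', [0] * N),
}

for label, func in candidates.items():
    # Best of 3 repeats of 10 calls each, reported per call.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{label}: {best * 1e3:.3f} msec, {len(func())} items")
```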
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/NIWFO5ARSQZWX6KOLAAXOWOKIMF7W3O3/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-21 Thread Steven D'Aprano
On Thu, Feb 20, 2020 at 02:19:13PM -0800, Stephan Hoyer wrote:

> > > Strong +1 for an array.zeros() constructor, and/or a lower level
> > > array.empty() which doesn't pre-fill values.
> >
> > So it'd be a shorthand for something like this?
> >
> > >>> array.array("i", bytes(64))
> > array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> >
> > It'd be convenient to specify the size as a number of array elements
> > rather than bytes. But I'm not a heavy user of array.array() so I
> > won't say either way as to whether this is needed.
> 
> 
> Yes, exactly.
> 
> The main problem with array.array("i", bytes(64)) is that memory gets
> allocated twice, first to create the bytes() object and then to make the
> array(). This makes it unsuitable for high performance applications.

Got some actual measurements to demonstrate that initialising the array 
is a bottleneck? Especially for something as small as 64, it seems 
unlikely. If it were 64MB, that might be another story.

What's wrong with `array.array("i", [0])*64` or equivalent?

On my machine, at least, constructing a bytes object first followed by 
an array is significantly faster than the alternative:

[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
100 loops, best of 5: 1.71 msec per loop

[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
50 loops, best of 5: 7.48 msec per loop

That surprises me and I cannot explain it.



-- 
Steven
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/OFUQN3H4D5UEFNEFHUZSKIB7UR3X2N65/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Christopher Barker
On Thu, Feb 20, 2020 at 12:43 PM Steve Jorgensen  wrote:

>
> > But frankly, it would be a rare case where this would be noticeable.
> > -CHB
>
> Maybe uncommon, but I don't know about rare. Let's say you want to perform
> list-wise computations, making new lists with results of operations on
> existing lists (similar to numpy, but maybe trying to do something numpy is
> unsuitable for)? You would want to pre-allocate the new array to the size
> of the operand arrays.


Not rare that you’d have a use case, but rare that the performance would be
an issue. In past experiments, I’ve found the array re-allocation scheme is
remarkably performant.

On the other hand, all the methods suggested in this thread require at
least a double allocation, which may not be noticeable in many
applications, but it’s also a fairly light lift to add a
single constructor for a pre-allocated array.

And as Stephan pointed out, it would help in some high performance
situations.

One thing to keep in mind is that array.array is useful for use from
C/Cython, when you don’t want the overhead of numpy.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/WR6HJ2C6BQSB22KLHX5DHTBYC4AAG5XS/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Stephan Hoyer
On Thu, Feb 20, 2020 at 2:11 PM Chris Angelico  wrote:

> On Fri, Feb 21, 2020 at 8:52 AM Stephan Hoyer  wrote:
> >
> > On Thu, Feb 20, 2020 at 12:41 PM Steve Jorgensen wrote:
> >>
> >> Christopher Barker wrote:
> >> ...
> >> > > Perhaps the OP wanted the internal array size initialized, but not used.
> >> > Currently the internal array will automatically be reallocated to grow as
> >> > needed. Which could be a performance hit if you know it’s going to grow
> >> > large.
> >> > But frankly, it would be a rare case where this would be noticeable.
> >> > -CHB
> >>
> >> Maybe uncommon, but I don't know about rare. Let's say you want to perform
> >> list-wise computations, making new lists with results of operations on
> >> existing lists (similar to numpy, but maybe trying to do something numpy is
> >> unsuitable for)? You would want to pre-allocate the new array to the size of
> >> the operand arrays.
> >
> >
> > Strong +1 for an array.zeros() constructor, and/or a lower level
> > array.empty() which doesn't pre-fill values.
> >
>
> So it'd be a shorthand for something like this?
>
> >>> array.array("i", bytes(64))
> array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>
> It'd be convenient to specify the size as a number of array elements
> rather than bytes. But I'm not a heavy user of array.array() so I
> won't say either way as to whether this is needed.


Yes, exactly.

The main problem with array.array("i", bytes(64)) is that memory gets
allocated twice, first to create the bytes() object and then to make the
array(). This makes it unsuitable for high performance applications.
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/3PTSYLNOQ3I3FLNDS7MOJVCN6SFN2QGE/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Chris Angelico
On Fri, Feb 21, 2020 at 8:52 AM Stephan Hoyer  wrote:
>
> On Thu, Feb 20, 2020 at 12:41 PM Steve Jorgensen  wrote:
>>
>> Christopher Barker wrote:
>> ...
>> > > Perhaps the OP wanted the internal array size initialized, but not used.
>> > Currently the internal array will automatically be reallocated to grow as
>> > needed. Which could be a performance hit if you know it’s going to grow
>> > large.
>> > But frankly, it would be a rare case where this would be noticeable.
>> > -CHB
>>
>> Maybe uncommon, but I don't know about rare. Let's say you want to perform 
>> list-wise computations, making new lists with results of operations on 
>> existing lists (similar to numpy, but maybe trying to do something numpy is 
>> unsuitable for)? You would want to pre-allocate the new array to the size of 
>> the operand arrays.
>
>
> Strong +1 for an array.zeros() constructor, and/or a lower level 
> array.empty() which doesn't pre-fill values.
>

So it'd be a shorthand for something like this?

>>> array.array("i", bytes(64))
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

It'd be convenient to specify the size as a number of array elements
rather than bytes. But I'm not a heavy user of array.array() so I
won't say either way as to whether this is needed.

ChrisA
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/L3SZD4UIKS2BTBG5VLPV3QGI6PLW7EKI/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Stephan Hoyer
On Thu, Feb 20, 2020 at 12:41 PM Steve Jorgensen  wrote:

> Christopher Barker wrote:
> ...
> > > Perhaps the OP wanted the internal array size initialized, but not used.
> > Currently the internal array will automatically be reallocated to grow as
> > needed. Which could be a performance hit if you know it’s going to grow
> > large.
> > But frankly, it would be a rare case where this would be noticeable.
> > -CHB
>
> Maybe uncommon, but I don't know about rare. Let's say you want to perform
> list-wise computations, making new lists with results of operations on
> existing lists (similar to numpy, but maybe trying to do something numpy is
> unsuitable for)? You would want to pre-allocate the new array to the size
> of the operand arrays.
>

Strong +1 for an array.zeros() constructor, and/or a lower level
array.empty() which doesn't pre-fill values.

A use case that came up for me recently is efficiently allocating and
filling an object that satisfies the buffer protocol from C/C++ without
requiring a NumPy dependency. As far as I can tell, there is no easy way to
do this currently.
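
The Python-visible side of this can be sketched as follows; the C/C++ side is not shown, and the names are illustrative. It assumes only that array.array exports a writable buffer, which memoryview() and struct.pack_into() both rely on.

```python
import struct
from array import array

buf = array('i', [0]) * 4          # preallocated, zero-filled storage

# memoryview() only succeeds for objects that support the buffer protocol.
with memoryview(buf) as view:
    info = (view.format, view.itemsize, view.nbytes, view.readonly)
    view[0] = 7                    # writes land directly in the array's memory

# struct.pack_into() accepts any writable buffer; the offset is in bytes.
struct.pack_into('@2i', buf, 2 * buf.itemsize, 8, 9)

print(info, buf.tolist())
```

What is missing, as noted above, is a way to get the storage without first initializing it from another object.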
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/CJMX4EVEBLC23SVVVRSQHFZW7QYDBDRQ/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Steve Jorgensen
Christopher Barker wrote:
...
> > Perhaps the OP wanted the internal array size initialized, but not used.
> Currently the internal array will automatically be reallocated to grow as
> needed. Which could be a performance hit if you know it’s going to grow
> large.
> But frankly, it would be a rare case where this would be noticeable.
> -CHB

Maybe uncommon, but I don't know about rare. Let's say you want to perform 
list-wise computations, making new lists with results of operations on existing 
lists (similar to numpy, but maybe trying to do something numpy is unsuitable 
for)? You would want to pre-allocate the new array to the size of the operand 
arrays.
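
A small sketch of that pattern using what array.array offers today. The add_arrays() function is hypothetical, and it assumes numeric typecodes so that 0 is a valid fill value.

```python
from array import array

def add_arrays(a, b):
    """Element-wise sum written into a result preallocated to the operand size."""
    if a.typecode != b.typecode or len(a) != len(b):
        raise ValueError("operands must share typecode and length")
    out = array(a.typecode, [0]) * len(a)   # preallocate, then fill by index
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

total = add_arrays(array('i', [1, 2, 3]), array('i', [10, 20, 30]))
```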
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/2A2LOYFPBGGMOJGRLPTSLU6MSBWJPKV4/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Christopher Barker
On Thu, Feb 20, 2020 at 7:34 AM Serhiy Storchaka wrote:

> 20.07.11 23:48, Sven Rahmann wrote:
> > What's missing is the possibility to specify the final size of the
> > array (number of items), especially for large arrays.
>
>  array.array(typecode, [fillvalue]) * n


Perhaps the OP wanted the internal array size initialized, but not used.
Currently the internal array will automatically be reallocated to grow as
needed. Which could be a performance hit if you know it’s going to grow
large.

But frankly, it would be a rare case where this would be noticeable.

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/G4P7TWB4BZNOGZ6RDW2UJXSQHNZNCD36/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Serhiy Storchaka

20.07.11 23:48, Sven Rahmann wrote:

> At the moment, the array module of the standard library allows one to
> create arrays of different numeric types and to initialize them from
> an iterable (e.g., another array).
> What's missing is the possibility to specify the final size of the
> array (number of items), especially for large arrays.


array.array(typecode, [fillvalue]) * n
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/56YLYLTWBOPEJ2GD4VFSTDZXR27Y6K4E/


[Python-ideas] Re: Specify number of items to allocate for array.array() constructor

2020-02-20 Thread Steve Jorgensen
I discovered that same trick. It would be nice to have that specifically 
indicated in the documentation until/unless a length argument is added to the 
constructor.

It would be nice for the supported operators to be documented at all, actually. 
I didn't realize that array.array had multiplication operator support at all 
until I got around to dir-ing an instance.
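
For reference, a short sketch of the sequence operators in question, as they behave in current CPython; the variable names are illustrative.

```python
from array import array

a = array('i', [1, 2])
b = array('i', [3])

repeated = a * 3      # array('i', [1, 2, 1, 2, 1, 2])
joined = a + b        # array('i', [1, 2, 3])

c = array('i', [0])
c *= 4                # in-place repetition: four zeros
c += b                # in-place concatenation appends the 3

# The same discovery route as dir-ing an instance, limited to the operators above.
wanted = ("__add__", "__iadd__", "__mul__", "__rmul__", "__imul__")
ops = [name for name in dir(array) if name in wanted]
print(ops)
```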
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/P74LYDOFELXWCKLLI34YEZIZLKSLBF7A/