[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Fri, Feb 21, 2020 at 02:42:16PM +0200, Serhiy Storchaka wrote:
> 21.02.20 10:36, Steven D'Aprano wrote:
> > On my machine, at least, constructing a bytes object first followed by
> > an array is significantly faster than the alternative:
> >
> > [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
> > 100 loops, best of 5: 1.71 msec per loop
> >
> > [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
> > 50 loops, best of 5: 7.48 msec per loop
> >
> > That surprises me and I cannot explain it.
>
> The second one allocates and copies 4 times more memory.

I completely misunderstood what the first would do. I expected it to
create an array of 500,000 zeroes, but it only created an array of
125,000.

That's nuts! The docstring says:

    Return a new array whose items are restricted by typecode, and
    initialized from the optional initializer value, which must be a
    list, string or iterable over elements of the appropriate type.

A bytes object is an iterable over integers, so I expected these two to
be equivalent:

    array('i', bytes(500000))
    array('i', [0]*500000)

I never would have predicted that a bytes iterable and a list iterable
behave differently.

Oh, you have to read the documentation on the website:

https://docs.python.org/3/library/array.html#array.array

Okay, let's try that again:

[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000*4))"
20 loops, best of 5: 12.6 msec per loop

compared to 7.65 milliseconds for the version using multiplication.
That makes more sense to me.

Okay, I'm starting to come around to giving array an alternate
constructor:

    array.zeroes(typecode, size [, *, value=None])

If keyword-only argument value is given and is not None, it is used as
the initial value instead of zero.
--
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/P4BY6PTQUWWY634262MHMZGOY4OBRGES/
Code of Conduct: http://python.org/psf/codeofconduct/
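The difference between a bytes initializer and a list initializer can be checked directly. The following is a minimal sketch, not part of the original message: the `zeros` helper is a hypothetical pure-Python stand-in for the proposed constructor (no such function exists in the array module), and the item size of 'i' is platform dependent.

```python
from array import array


def zeros(typecode, size, *, value=None):
    """Hypothetical stand-in for the proposed array.zeroes(); not a real API.

    Builds the result with the existing multiplication idiom.
    """
    fill = 0 if value is None else value
    return array(typecode, [fill]) * size


n = 1000
item = array('i').itemsize  # bytes per item; platform dependent (commonly 4)

# A bytes initializer is read as raw machine values, not iterated as ints,
# so it takes n * itemsize bytes to produce n items.
from_bytes = array('i', bytes(n * item))
from_list = array('i', [0] * n)

assert len(from_bytes) == n
assert from_bytes == from_list == zeros('i', n)
print(len(array('i', bytes(10 * item))))  # 10
```

With the size expressed in items rather than bytes, the helper avoids the factor-of-itemsize surprise described above.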
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Fri, Feb 21, 2020 at 12:43 AM Steven D'Aprano wrote:
> On Thu, Feb 20, 2020 at 02:19:13PM -0800, Stephan Hoyer wrote:
> > > > Strong +1 for an array.zeros() constructor, and/or a lower level
> > > > array.empty() which doesn't pre-fill values.
> > >
> > > So it'd be a shorthand for something like this?
> > >
> > > >>> array.array("i", bytes(64))
> > > array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> > >
> > > It'd be convenient to specify the size as a number of array elements
> > > rather than bytes. But I'm not a heavy user of array.array() so I
> > > won't say either way as to whether this is needed.
> >
> > Yes, exactly.
> >
> > The main problem with array.array("i", bytes(64)) is that memory gets
> > allocated twice, first to create the bytes() object and then to make the
> > array(). This makes it unsuitable for high performance applications.
>
> Got some actual measurements to demonstrate that initialising the array
> is a bottleneck? Especially for something as small as 64, it seems
> unlikely. If it were 64MB, that might be another story.

That's right, the real use-case is quickly deserializing large amounts
of data (e.g., 100s of MB) from a wire format into a form suitable for
fast analysis with NumPy or pandas. Unfortunately I can't share an
actual code example, but this is a pretty common scenario in the data
processing world, e.g., reminiscent of the use-cases for PEP 574
(https://www.python.org/dev/peps/pep-0574/).

The concern is not just speed (which I agree is probably not impacted
too poorly by an extra copy) but also memory overhead. If the resulting
array is 500 MB and deserialization can be done in a streaming fashion,
I don't want to wastefully allocate another 500 MB just to do a memory
copy.
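A rough sketch of the streaming scenario, using only what the standard library offers today. The in-memory `payload` and the chunk size are made up for illustration; a real producer would be a socket or file, and the wire format is assumed to be raw native doubles.

```python
import io
import struct
from array import array

# Stand-in for a wire payload: n native doubles (hypothetical data).
n = 1000
payload = struct.pack(f'{n}d', *(float(i) for i in range(n)))
stream = io.BytesIO(payload)

# Allocate the destination once, using the multiplication idiom.
out = array('d', [0.0]) * n

# View the array as raw bytes and let the stream fill it chunk by chunk,
# so no second full-size buffer is created on the receiving side.
view = memoryview(out).cast('B')
chunk = 4096  # arbitrary chunk size in bytes
pos = 0
while pos < len(view):
    got = stream.readinto(view[pos:pos + chunk])
    if not got:
        break  # stream ended early
    pos += got

print(pos, out[0], out[-1])
```

The array still has to be zero-filled before it is overwritten, which is the cost a lower-level array.empty() would avoid.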
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
21.02.20 10:36, Steven D'Aprano wrote:
> On my machine, at least, constructing a bytes object first followed by
> an array is significantly faster than the alternative:
>
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
> 100 loops, best of 5: 1.71 msec per loop
>
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
> 50 loops, best of 5: 7.48 msec per loop
>
> That surprises me and I cannot explain it.

The second one allocates and copies 4 times more memory.
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Feb 21, 2020, at 00:46, Steven D'Aprano wrote:
>
> Got some actual measurements to demonstrate that initialising the array
> is a bottleneck? Especially for something as small as 64, it seems
> unlikely. If it were 64MB, that might be another story.
>
> What's wrong with `array.array("i", [0])*64` or equivalent?
>
> On my machine, at least, constructing a bytes object first followed by
> an array is significantly faster than the alternative:
>
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
> 100 loops, best of 5: 1.71 msec per loop
>
> [steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
> 50 loops, best of 5: 7.48 msec per loop
>
> That surprises me and I cannot explain it.

Without reading the code, I can guess. The first one does two 500K
allocations and a 500K memcpy; the second only does one 500K allocation
but does 125K separate 4-byte copies, and the added cost of that loop
and of not moving as many bytes at a time as possible is higher than the
savings of a 500K allocation.
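To compare the two constructions on equal terms, both have to produce the same number of items. The harness below is one possible way to measure that; the item count follows the thread, while the repeat counts are arbitrary and the absolute timings will differ from machine to machine.

```python
import timeit
from array import array

n = 500_000                 # number of items
size = array('i').itemsize  # bytes per item on this platform

candidates = {
    "bytes initializer": lambda: array('i', bytes(n * size)),
    "multiplication": lambda: array('i', [0]) * n,
}

for label, build in candidates.items():
    # Best of 5 runs of 20 calls each, reported per call.
    best = min(timeit.repeat(build, number=20, repeat=5)) / 20
    print(f"{label}: {best * 1e3:.3f} msec per call")
```

Both callables return equal arrays, so any timing difference reflects the construction path rather than the amount of data.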
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Thu, Feb 20, 2020 at 02:19:13PM -0800, Stephan Hoyer wrote:
> > > Strong +1 for an array.zeros() constructor, and/or a lower level
> > > array.empty() which doesn't pre-fill values.
> >
> > So it'd be a shorthand for something like this?
> >
> > >>> array.array("i", bytes(64))
> > array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> >
> > It'd be convenient to specify the size as a number of array elements
> > rather than bytes. But I'm not a heavy user of array.array() so I
> > won't say either way as to whether this is needed.
>
> Yes, exactly.
>
> The main problem with array.array("i", bytes(64)) is that memory gets
> allocated twice, first to create the bytes() object and then to make the
> array(). This makes it unsuitable for high performance applications.

Got some actual measurements to demonstrate that initialising the array
is a bottleneck? Especially for something as small as 64, it seems
unlikely. If it were 64MB, that might be another story.

What's wrong with `array.array("i", [0])*64` or equivalent?

On my machine, at least, constructing a bytes object first followed by
an array is significantly faster than the alternative:

[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', bytes(500000))"
100 loops, best of 5: 1.71 msec per loop

[steve@ando cpython]$ ./python -m timeit -s "from array import array" "array('i', [0])*500000"
50 loops, best of 5: 7.48 msec per loop

That surprises me and I cannot explain it.

--
Steven
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Thu, Feb 20, 2020 at 12:43 PM Steve Jorgensen wrote:
> > But frankly, it would be a rare case where this would be noticeable.
> > -CHB
>
> Maybe uncommon, but I don't know about rare. Let's say you want to perform
> list-wise computations, making new lists with results of operations on
> existing lists (similar to numpy, but maybe trying to do something numpy is
> unsuitable for)? You would want to pre-allocate the new array to the size
> of the operand arrays.

Not rare that you’d have a use case, but rare that the performance would
be an issue. In past experiments, I’ve found the array re-allocation
scheme is remarkably performant.

On the other hand, all the methods suggested in this thread require at
least a double allocation, which may not be noticeable in many
applications, but it’s also a fairly light lift to make a single
constructor for a pre-allocated array.

And as Stephan pointed out, it would help in some high performance
situations.

One thing to keep in mind is that array.array is useful for use from
C/Cython, when you don’t want the overhead of numpy.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
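The two approaches being weighed here, growing by append versus pre-allocating and assigning by index, might look like the sketch below. The function names and the squaring operation are invented for illustration; neither version is claimed to be faster, which is the point under discussion.

```python
from array import array


def squares_by_append(src):
    # Start empty and let the array grow; CPython over-allocates on growth,
    # so most appends do not trigger a reallocation.
    out = array(src.typecode)
    for x in src:
        out.append(x * x)
    return out


def squares_preallocated(src):
    # Allocate the final size once, then assign by index.
    out = array(src.typecode, [0]) * len(src)
    for i, x in enumerate(src):
        out[i] = x * x
    return out


src = array('l', range(1000))
assert squares_by_append(src) == squares_preallocated(src)
```

Timing both with timeit on realistic sizes would show whether the reallocation cost is noticeable for a given workload.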
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Thu, Feb 20, 2020 at 2:11 PM Chris Angelico wrote:
> On Fri, Feb 21, 2020 at 8:52 AM Stephan Hoyer wrote:
> >
> > On Thu, Feb 20, 2020 at 12:41 PM Steve Jorgensen wrote:
> >>
> >> Christopher Barker wrote:
> >> ...
> >> > Perhaps the OP wanted the internal array size initialized, but not used.
> >> > Currently the internal array will automatically be reallocated to grow as
> >> > needed. Which could be a performance hit if you know it’s going to grow
> >> > large.
> >> > But frankly, it would be a rare case where this would be noticeable.
> >> > -CHB
> >>
> >> Maybe uncommon, but I don't know about rare. Let's say you want to perform
> >> list-wise computations, making new lists with results of operations on
> >> existing lists (similar to numpy, but maybe trying to do something numpy
> >> is unsuitable for)? You would want to pre-allocate the new array to the
> >> size of the operand arrays.
> >
> > Strong +1 for an array.zeros() constructor, and/or a lower level
> > array.empty() which doesn't pre-fill values.
>
> So it'd be a shorthand for something like this?
>
> >>> array.array("i", bytes(64))
> array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>
> It'd be convenient to specify the size as a number of array elements
> rather than bytes. But I'm not a heavy user of array.array() so I
> won't say either way as to whether this is needed.

Yes, exactly.

The main problem with array.array("i", bytes(64)) is that memory gets
allocated twice, first to create the bytes() object and then to make the
array(). This makes it unsuitable for high performance applications.
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Fri, Feb 21, 2020 at 8:52 AM Stephan Hoyer wrote:
>
> On Thu, Feb 20, 2020 at 12:41 PM Steve Jorgensen wrote:
>>
>> Christopher Barker wrote:
>> ...
>> > Perhaps the OP wanted the internal array size initialized, but not used.
>> > Currently the internal array will automatically be reallocated to grow as
>> > needed. Which could be a performance hit if you know it’s going to grow
>> > large.
>> > But frankly, it would be a rare case where this would be noticeable.
>> > -CHB
>>
>> Maybe uncommon, but I don't know about rare. Let's say you want to perform
>> list-wise computations, making new lists with results of operations on
>> existing lists (similar to numpy, but maybe trying to do something numpy is
>> unsuitable for)? You would want to pre-allocate the new array to the size of
>> the operand arrays.
>
> Strong +1 for an array.zeros() constructor, and/or a lower level
> array.empty() which doesn't pre-fill values.

So it'd be a shorthand for something like this?

>>> array.array("i", bytes(64))
array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

It'd be convenient to specify the size as a number of array elements
rather than bytes. But I'm not a heavy user of array.array() so I
won't say either way as to whether this is needed.

ChrisA
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Thu, Feb 20, 2020 at 12:41 PM Steve Jorgensen wrote:
> Christopher Barker wrote:
> ...
> > Perhaps the OP wanted the internal array size initialized, but not used.
> > Currently the internal array will automatically be reallocated to grow as
> > needed. Which could be a performance hit if you know it’s going to grow
> > large.
> > But frankly, it would be a rare case where this would be noticeable.
> > -CHB
>
> Maybe uncommon, but I don't know about rare. Let's say you want to perform
> list-wise computations, making new lists with results of operations on
> existing lists (similar to numpy, but maybe trying to do something numpy is
> unsuitable for)? You would want to pre-allocate the new array to the size
> of the operand arrays.

Strong +1 for an array.zeros() constructor, and/or a lower level
array.empty() which doesn't pre-fill values.

A use case that came up for me recently is efficiently allocating and
filling an object that satisfies the buffer protocol from C/C++ without
requiring a NumPy dependency. As far as I can tell, there is no easy way
to do this currently.
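For context, array.array already exports a writable buffer, so buffer-aware code can fill it in place once it has been sized. The sketch below shows this from Python, with struct.pack_into standing in for a C/C++ producer (an assumption for illustration; the use case above concerns doing the allocation from C/C++ itself).

```python
import struct
from array import array

# Pre-size the array with the multiplication idiom (zero filled).
buf = array('i', [0]) * 4

# struct's native 'i' and array's 'i' are both the platform's C int.
assert struct.calcsize('i') == buf.itemsize

# Write two ints in place, starting at byte offset of item 1.
struct.pack_into('2i', buf, buf.itemsize, 7, 9)

view = memoryview(buf)
print(view.format, view.itemsize, view.nbytes, view.readonly)
print(buf)  # array('i', [0, 7, 9, 0])
```

What is missing today is a way to obtain the sized buffer without first filling it, which is what the proposed array.empty() would provide.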
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
Christopher Barker wrote:
...
> Perhaps the OP wanted the internal array size initialized, but not used.
> Currently the internal array will automatically be reallocated to grow as
> needed. Which could be a performance hit if you know it’s going to grow
> large.
> But frankly, it would be a rare case where this would be noticeable.
> -CHB

Maybe uncommon, but I don't know about rare. Let's say you want to
perform list-wise computations, making new lists with results of
operations on existing lists (similar to numpy, but maybe trying to do
something numpy is unsuitable for)? You would want to pre-allocate the
new array to the size of the operand arrays.
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
On Thu, Feb 20, 2020 at 7:34 AM Serhiy Storchaka wrote:
> 20.07.11 23:48, Sven Rahmann wrote:
> > What's missing is the possibility to specify the final size of the
> > array (number of items), especially for large arrays.
>
> array.array(typecode, [fillvalue]) * n

Perhaps the OP wanted the internal array size initialized, but not used.

Currently the internal array will automatically be reallocated to grow
as needed. Which could be a performance hit if you know it’s going to
grow large.

But frankly, it would be a rare case where this would be noticeable.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
20.07.11 23:48, Sven Rahmann wrote:
> At the moment, the array module of the standard library allows creating
> arrays of different numeric types and initializing them from an iterable
> (e.g., another array). What's missing is the possibility to specify the
> final size of the array (number of items), especially for large arrays.

array.array(typecode, [fillvalue]) * n
[Python-ideas] Re: Specify number of items to allocate for array.array() constructor
I discovered that same trick. It would be nice to have that specifically
indicated in the documentation until/unless a length argument is added
to the constructor.

It would be nice for the supported operators to be documented at all,
actually. I didn't realize that array.array had multiplication operator
support at all until I got around to dir-ing an instance.
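As a quick reference, array.array supports the usual sequence operators. The example below is a small illustration of repetition and concatenation, in both their plain and in-place forms; the values are arbitrary.

```python
from array import array

a = array('i', [1, 2])

b = a * 3                 # repetition: a new array with the items repeated
c = a + array('i', [3])   # concatenation (typecodes must match)
a *= 2                    # in-place repetition
a += array('i', [9])      # in-place concatenation

print(b)  # array('i', [1, 2, 1, 2, 1, 2])
print(c)  # array('i', [1, 2, 3])
print(a)  # array('i', [1, 2, 1, 2, 9])
print(a[1:3], 9 in a, len(a))  # array('i', [2, 1]) True 5
```

Running dir(array('i')) lists the corresponding special methods such as `__mul__`, `__imul__`, `__add__` and `__iadd__`.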