Re: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)
On 3/6/13 7:42 PM, Kurt Smith wrote:
And regarding performance, doing simple timings shows a 30%-ish slowdown for unaligned operations:

In [36]: %timeit packed_arr['b']**2
100 loops, best of 3: 2.48 ms per loop

In [37]: %timeit aligned_arr['b']**2
1000 loops, best of 3: 1.9 ms per loop

Hmm, that clearly depends on the architecture. On my machine:

In [1]: import numpy as np
In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)
In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt)
In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt)
In [6]: baligned = aligned_arr['b']
In [7]: bpacked = packed_arr['b']

In [8]: %timeit baligned**2
1000 loops, best of 3: 1.96 ms per loop

In [9]: %timeit bpacked**2
100 loops, best of 3: 7.84 ms per loop

That is, the unaligned column is 4x slower (!). numexpr allows somewhat better results:

In [11]: %timeit numexpr.evaluate('baligned**2')
1000 loops, best of 3: 1.13 ms per loop

In [12]: %timeit numexpr.evaluate('bpacked**2')
1000 loops, best of 3: 865 us per loop

Yes, in this case the unaligned array goes faster (by as much as 30%). I think the reason is that numexpr optimizes unaligned access by copying the different chunks into internal buffers that fit in L1 cache. Apparently this is very beneficial in this case (not sure why, though).

Whereas summing shows just a 10%-ish slowdown:

In [38]: %timeit packed_arr['b'].sum()
1000 loops, best of 3: 1.29 ms per loop

In [39]: %timeit aligned_arr['b'].sum()
1000 loops, best of 3: 1.14 ms per loop

On my machine:

In [14]: %timeit baligned.sum()
1000 loops, best of 3: 1.03 ms per loop

In [15]: %timeit bpacked.sum()
100 loops, best of 3: 3.79 ms per loop

Again, the 4x slowdown is here.
Using numexpr:

In [16]: %timeit numexpr.evaluate('sum(baligned)')
100 loops, best of 3: 2.16 ms per loop

In [17]: %timeit numexpr.evaluate('sum(bpacked)')
100 loops, best of 3: 2.08 ms per loop

Again, the unaligned case is (slightly) better. In this case numexpr is a bit slower than NumPy because sum() is not parallelized internally.

Hmm, given that, I'm wondering if some internal copies to L1 in NumPy could help improve unaligned performance. Worth a try?

-- Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
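Francesc's explanation is that numexpr copies chunks of the unaligned column into small internal buffers that fit in L1 cache and computes on those. The following is a minimal pure-NumPy sketch of that data flow, not numexpr's or NumPy's actual code. The helper name `square_in_chunks` and the 4096-element chunk size (32 KiB of int64, assumed to be roughly L1-data-cache sized) are illustrative assumptions, and a Python-level loop like this is not expected to beat the direct ufunc call; it only shows the copy-then-compute pattern.

```python
import numpy as np

# Same layout as in the thread: a 1-byte field followed by an 8-byte field.
packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
packed_arr = np.ones((10**6,), dtype=packed_dt)
bpacked = packed_arr['b']  # strided view, stride 9 bytes -> not aligned


def square_in_chunks(col, chunk=4096):
    """Hypothetical helper: square a 1-D (possibly unaligned) column by first
    copying fixed-size chunks into a small, freshly allocated (aligned) buffer."""
    out = np.empty(col.shape, dtype=col.dtype)
    buf = np.empty(chunk, dtype=col.dtype)  # scratch buffer, reused per chunk
    for start in range(0, col.shape[0], chunk):
        stop = min(start + chunk, col.shape[0])
        n = stop - start
        buf[:n] = col[start:stop]                # unaligned -> aligned copy
        np.square(buf[:n], out=out[start:stop])  # compute on the aligned copy
    return out


result = square_in_chunks(bpacked)
```

The chunk size is the main tuning knob in this sketch: too small and the per-chunk Python overhead dominates, too large and the scratch buffer no longer stays in cache.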
Re: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)
On Thu, Mar 7, 2013 at 11:47 AM, Francesc Alted franc...@continuum.io wrote:

Very interesting -- thanks for sharing.
[Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)
On Wed, Mar 6, 2013 at 12:12 PM, Kurt Smith kwmsm...@gmail.com wrote:
On Wed, Mar 6, 2013 at 4:29 AM, Francesc Alted franc...@continuum.io wrote:
I would not run too much. The example above takes 9 bytes to host the structure, while `align=True` will take 16 bytes. I'd rather leave the default as it is, and in case performance is critical, you can always copy the unaligned field to a new (homogeneous) array.

Yes, I can absolutely see the case you're making here, and I made my vote with the understanding that `align=False` will almost certainly stay the default. Adding `align=True` is simple for me to do, so no harm done. My case is based on what's the least surprising behavior: C structs / all C compilers, the builtin `struct` module, and ctypes `Structure` subclasses all use padding to ensure aligned fields by default. You can turn this off to get packed structures, but the default behavior in these other places is alignment, which is why I was surprised when I first saw that NumPy structured dtypes are packed by default.

Some surprises with aligned / unaligned arrays:

#-
import numpy as np

packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)
packed_arr = np.ones((10**6,), dtype=packed_dt)
aligned_arr = np.ones((10**6,), dtype=aligned_dt)

print(all(packed_arr['a'] == aligned_arr['a']), np.all(packed_arr['a'] == aligned_arr['a']))  # True
print(all(packed_arr['b'] == aligned_arr['b']), np.all(packed_arr['b'] == aligned_arr['b']))  # True
print(all(packed_arr == aligned_arr), np.all(packed_arr == aligned_arr))  # False (!!)
#-

I can understand what's likely going on under the covers that makes these arrays not compare equal, but I'd expect that if all columns of two structured arrays are everywhere equal, then the arrays themselves would be everywhere equal. Bug?
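The 9-byte versus 16-byte figures Francesc mentions can be checked directly from the dtype objects. This short sketch prints each layout's record size, the byte offset of each field, and whether NumPy marks the dtype as an aligned (C-style padded) struct; it uses only standard `np.dtype` attributes.

```python
import numpy as np

packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)

for name, dt in [('packed', packed_dt), ('aligned', aligned_dt)]:
    # dt.fields maps field name -> (field dtype, byte offset)
    offsets = {f: dt.fields[f][1] for f in dt.names}
    print(name, dt.itemsize, offsets, dt.isalignedstruct)

# packed 9 {'a': 0, 'b': 1} False
# aligned 16 {'a': 0, 'b': 8} True
```

The aligned layout inserts 7 padding bytes after `a` so that `b` starts on an 8-byte boundary, which is the same padding a C compiler would add to the equivalent struct by default.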
And regarding performance, doing simple timings shows a 30%-ish slowdown for unaligned operations:

In [36]: %timeit packed_arr['b']**2
100 loops, best of 3: 2.48 ms per loop

In [37]: %timeit aligned_arr['b']**2
1000 loops, best of 3: 1.9 ms per loop

Whereas summing shows just a 10%-ish slowdown:

In [38]: %timeit packed_arr['b'].sum()
1000 loops, best of 3: 1.29 ms per loop

In [39]: %timeit aligned_arr['b'].sum()
1000 loops, best of 3: 1.14 ms per loop
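These timings can be reproduced outside IPython with the standard-library `timeit` module. The sketch below also tries Francesc's suggestion (quoted above) of copying the unaligned field into a new homogeneous array, here via `np.ascontiguousarray`, and prints each column's `flags.aligned`. The repeat counts are arbitrary choices, and the absolute numbers, as this thread shows, will vary with the machine.

```python
import timeit

import numpy as np

packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)
packed_arr = np.ones((10**6,), dtype=packed_dt)
aligned_arr = np.ones((10**6,), dtype=aligned_dt)

bpacked = packed_arr['b']               # view with a 9-byte stride: not aligned
baligned = aligned_arr['b']             # view with a 16-byte stride: aligned
bcopy = np.ascontiguousarray(bpacked)   # homogeneous, contiguous copy of the field

for label, col in [('packed', bpacked), ('aligned', baligned), ('copy', bcopy)]:
    # Best of 3 runs, each timing 10 evaluations of col**2.
    best = min(timeit.repeat(lambda: col**2, number=10, repeat=3)) / 10
    print(f"{label:8s} aligned={col.flags.aligned!s:5s} {best * 1e3:.2f} ms per loop")
```

The one-time cost of the copy is not included in the loop above, so the copy only pays off when the same column is used in several operations.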
Re: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)
On Wed, 2013-03-06 at 12:42 -0600, Kurt Smith wrote:
I can understand what's likely going on under the covers that makes these arrays not compare equal, but I'd expect that if all columns of two structured arrays are everywhere equal, then the arrays themselves would be everywhere equal. Bug?

Yes and no...
`equal` for structured types does not seem to be implemented; you get the same (wrong) False even with (packed_arr == packed_arr). But if the types are equivalent and np.equal is simply not implemented, just returning False is a bit dangerous, I agree. Not sure what the solution is exactly; I think the == operator should probably raise an error instead of swallowing them all, though...

- Sebastian
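Since `==` on whole structured arrays returned a misleading False here, one workaround is to do what Kurt did by hand and compare column by column. The helper below is a hypothetical sketch, not a NumPy API, and it assumes flat (non-nested) fields: a nested field would itself be a structured array and would send `np.array_equal` back through the same `==` path.

```python
import numpy as np


def struct_arrays_equal(x, y):
    """Hypothetical helper: compare two flat structured arrays field by field.

    Only field names and values are compared, so padding bytes and the
    packed vs. aligned layout do not affect the result.
    """
    if x.dtype.names is None or y.dtype.names is None:
        raise TypeError("both arguments must be structured arrays")
    if x.dtype.names != y.dtype.names:
        return False
    return all(np.array_equal(x[name], y[name]) for name in x.dtype.names)


packed_arr = np.ones((10**6,), dtype=np.dtype([('a', 'u1'), ('b', 'u8')], align=False))
aligned_arr = np.ones((10**6,), dtype=np.dtype([('a', 'u1'), ('b', 'u8')], align=True))
print(struct_arrays_equal(packed_arr, aligned_arr))  # True
```

Requiring identical field names (in the same order) is a deliberate choice in this sketch; relaxing it would need a rule for matching fields that the thread does not discuss.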