Re: [Numpy-discussion] About the npz format
2014-04-16 20:26 GMT+02:00 onefire onefire.mys...@gmail.com:

> Hi all,
>
> I have been playing with the idea of using NumPy's binary format as a
> lightweight alternative to HDF5 (which I believe is the right way to go
> if one does not have a problem with the dependency). I am pretty happy
> with the npy format, but the npz format seems to be broken as far as
> performance is concerned (or I am missing something obvious!). The
> following ipython session illustrates the issue:
>
>     In [1]: import numpy as np
>
>     In [2]: x = np.linspace(1, 10, 5000)
>
>     In [3]: %time np.save("x.npy", x)
>     CPU times: user 40 ms, sys: 230 ms, total: 270 ms
>     Wall time: 488 ms
>
>     In [4]: %time np.savez("x.npz", data=x)
>     CPU times: user 657 ms, sys: 707 ms, total: 1.36 s
>     Wall time: 7.7 s

Hi,

In my case (python-2.7.3, numpy-1.6.1):

    In [23]: %time save("xx.npy", x)
    CPU times: user 0.00 s, sys: 0.23 s, total: 0.23 s
    Wall time: 4.07 s

    In [24]: %time savez("xx.npz", data=x)
    CPU times: user 0.42 s, sys: 0.61 s, total: 1.02 s
    Wall time: 4.26 s

In my case I don't see the unbelievable amount of overhead of the npz format.

Best
Re: [Numpy-discussion] About the npz format
On 17 Apr 2014 01:57, onefire onefire.mys...@gmail.com wrote:

> What I cannot understand is why savez takes more than 10 times longer
> than saving the data to a npy file. The only reason that I could come
> up with was the computation of the crc32.

We can all make guesses but the solution is just to profile it :-). %prun
in ipython (and then, if you need more granularity, installing
line_profiler is useful).

-n
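For reference, here is a minimal sketch of the suggested profiling approach
using the standard-library cProfile outside of ipython (the array size and
file name are placeholders, not taken from the thread):

    import cProfile
    import pstats

    import numpy as np

    x = np.linspace(1, 10, 5000000)  # placeholder size

    pr = cProfile.Profile()
    pr.enable()
    np.savez("x.npz", data=x)
    pr.disable()

    # Show the ten most expensive calls by internal time,
    # equivalent in spirit to "%prun -l 10" in ipython.
    pstats.Stats(pr).sort_stats("tottime").print_stats(10)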
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
On Wed, Apr 16, 2014 at 4:17 PM, R Hattersley rhatters...@gmail.com wrote:

> For some reason the Python issue 21223 didn't show any activity until I
> logged in to post my patch. At which point I saw that haypo had already
> submitted pretty much exactly the same patch. *sigh* That was pretty
> much a waste of time then. :-|

Oh, that sucks :-(. I knew that there was a patch posted there, but I was
travelling yesterday when you posted :-/.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Re: [Numpy-discussion] ImportError: /usr/local/lib/python2.7/site-packages/numpy-1.8.0-py2.7-linux-x86_64.egg/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString
No. I didn't rebuild numpy after rebuilding python. I searched online about
this error. It said that this error might be caused by building python with
UCS-4. Is there a way to check whether the python was built with UCS-4 or
UCS-2? Will rebuilding python with UCS-2 work? I'm really reluctant to
recompile the entire scipy stack.
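For what it's worth, on Python 2 the Unicode width of an interpreter can be
checked without recompiling anything:

    import sys

    # A "wide" (UCS-4) build reports 1114111 (0x10FFFF);
    # a "narrow" (UCS-2) build reports 65535 (0xFFFF).
    print(sys.maxunicode)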
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
> On the one hand it would be nice to actually know whether posix_memalign
> is important, before making api decisions on this basis.

FWIW: On the lightweight IBM cores that the extremely popular BlueGene
machines were based on, accessing unaligned memory raised system faults.
The default behavior of these machines was to terminate the program if more
than 1000 such errors occurred on a given process, and an environment
variable allowed you to terminate the program if *any* unaligned memory
access occurred. This is because unaligned memory accesses were 15x (or
more) slower than aligned memory access.

The newer /Q chips seem to be a little more forgiving of this, but I think
one can in general expect allocated memory alignment to be an important
performance technique for future high performance computing architectures.

A

On Thu, Apr 17, 2014 at 9:09 AM, Nathaniel Smith n...@pobox.com wrote:
> [earlier message on the Python issue 21223 patch quoted in full; snipped]
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
On 17 Apr 2014 15:09, Aron Ahmadia a...@ahmadia.net wrote:

> FWIW: On the lightweight IBM cores that the extremely popular BlueGene
> machines were based on, accessing unaligned memory raised system faults.
> [...] This is because unaligned memory accesses were 15x (or more)
> slower than aligned memory access.

Right, this much is true on lots of architectures, and so malloc is careful
to always return values with sufficient alignment (e.g. 8 bytes) to make
sure that any standard operation can succeed.

The question here is whether it will be important to have *even more*
alignment than malloc gives us by default. A 16 or 32 byte wide SIMD
instruction might prefer that data have 16 or 32 byte alignment, even if
normal memory access for the types being operated on only requires 4 or 8
byte alignment.

-n
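As a quick illustration of the distinction being drawn here, you can inspect
the alignment that numpy's default allocation actually gives an array (a
small sketch; the 32-byte boundary is just an example SIMD width):

    import numpy as np

    x = np.zeros(1000)
    print(x.ctypes.data % 8)    # 0: aligned for float64, as malloc guarantees
    print(x.ctypes.data % 32)   # may be nonzero: not necessarily SIMD-aligned
    print(x.flags.aligned)      # True: "aligned" here means aligned for the dtype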
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
Hmnn, I wasn't being clear :)

The default malloc on BlueGene/Q only returns 8 byte alignment, but the
SIMD units need 32-byte alignment for loads, stores, and operations or
performance suffers. On the /P the required alignment was 16 bytes, but
malloc only gave you 8, and trying to perform vectorized loads/stores
generated alignment exceptions on unaligned memory.

See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and
https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slide 14 for
an overview, slide 15 for the effective performance difference between the
unaligned/aligned code) for some notes on this.

A

On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith n...@pobox.com wrote:
> [earlier message quoted in full; snipped]
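For context, a user-level trick that can be used today to get over-aligned
arrays out of numpy is to over-allocate and slice at an aligned offset. A
minimal sketch (the helper name is mine, not numpy API):

    import numpy as np

    def aligned_zeros(n, dtype=np.float64, alignment=32):
        """Return a zeroed 1-D array whose data pointer lies on an
        `alignment`-byte boundary (hypothetical helper, not numpy API)."""
        itemsize = np.dtype(dtype).itemsize
        # Over-allocate by `alignment` bytes, then slice at an offset
        # that lands the data pointer on the requested boundary.
        buf = np.zeros(n * itemsize + alignment, dtype=np.uint8)
        offset = (-buf.ctypes.data) % alignment
        return buf[offset:offset + n * itemsize].view(dtype)

    a = aligned_zeros(1000, alignment=32)
    assert a.ctypes.data % 32 == 0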
[Numpy-discussion] ANN: Bokeh 0.4.4 released
I am happy to announce the release of Bokeh version 0.4.4!

Bokeh is a Python library for visualizing large and realtime datasets on
the web. Its goal is to provide elegant, concise construction of novel
graphics in the style of Protovis/D3, while delivering high-performance
interactivity to thin clients. Bokeh includes its own Javascript library
(BokehJS) that implements a reactive scenegraph representation of the plot,
and renders efficiently to HTML5 Canvas. Bokeh works well with IPython
Notebook, but can generate standalone graphics that embed into regular
HTML. If you are a Matplotlib user, you can just use %bokeh magic to start
interacting with your plots in the notebook immediately!

Check out the full documentation, interactive gallery, and tutorial at
http://bokeh.pydata.org

If you are using Anaconda, you can install with conda:

    conda install bokeh

Alternatively, you can install with pip:

    pip install bokeh

We are still working on some bigger features but want to get new fixes and
functionality out to users as soon as we can. Some notable features of this
release are:

* Additional Matplotlib, ggplot, and Seaborn compatibility (styling, more
  examples)
* TravisCI testing integration at https://travis-ci.org/ContinuumIO/bokeh
* Tool enhancements, constrained pan/zoom, more hover glyphs
* Server remote data and downsampling examples
* Initial work for Bokeh app concept

Also, we've made lots of little bug fixes and enhancements - see the
CHANGELOG for full details.

BokehJS is also available by CDN for use in standalone javascript
applications:

http://cdn.pydata.org/bokeh-0.4.4.js
http://cdn.pydata.org/bokeh-0.4.4.css
http://cdn.pydata.org/bokeh-0.4.4.min.js
http://cdn.pydata.org/bokeh-0.4.4.min.css

Some examples of BokehJS use can be found on the Bokeh JSFiddle page:
http://jsfiddle.net/user/bokeh/fiddles/

The release of Bokeh 0.5 is planned for early May. Some notable features we
plan to include are:

* Abstract Rendering for semantically meaningful downsampling of large
  datasets
* Better grid-based layout system, using Cassowary.js
* More MPL/Seaborn/ggplot.py compatibility and examples, using MPLExporter
* Additional tools, improved interactions, and better plot frame
* Touch support

Issues, enhancement requests, and pull requests can be made on the Bokeh
Github page: https://github.com/continuumio/bokeh

Questions can be directed to the Bokeh mailing list: bo...@continuum.io

If you have interest in helping to develop Bokeh, please get involved!

Special thanks to recent contributors: Amy Troschinetz and Gerald Dalley

Bryan Van de Ven
Continuum Analytics
http://continuum.io
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
Uh, 15x slower for unaligned access is quite a lot. But Intel (and AMD)
architectures are much more tolerant in this aspect (and improving). For
example, with a Xeon(R) CPU E5-2670 (2 years old) I get:

    In [1]: import numpy as np

    In [2]: shape = (1, 1)

    In [3]: x_aligned = np.zeros(shape,
                dtype=[('x', np.float64), ('y', np.int64)])['x']

    In [4]: x_unaligned = np.zeros(shape,
                dtype=[('y1', np.int8), ('x', np.float64), ('y2', np.int8, (7,))])['x']

    In [5]: %timeit res = x_aligned ** 2
    1 loops, best of 3: 289 ms per loop

    In [6]: %timeit res = x_unaligned ** 2
    1 loops, best of 3: 664 ms per loop

so the added cost in this case is just a bit more than 2x. But you can also
alleviate this overhead if you do a copy that fits in cache prior to doing
the computations. numexpr does this:

https://github.com/pydata/numexpr/blob/master/numexpr/interp_body.cpp#L203

and the results are pretty good:

    In [8]: import numexpr as ne

    In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
    10 loops, best of 3: 133 ms per loop

    In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
    10 loops, best of 3: 134 ms per loop

i.e. there is not a significant difference between aligned and unaligned
access to data. I wonder if the same technique could be applied to NumPy.

Francesc

On 17/04/14 16:26, Aron Ahmadia wrote:
> Hmnn, I wasn't being clear :)
> [earlier message quoted in full; snipped]

--
Francesc Alted
[Numpy-discussion] min depth to nonzero in 3d array
Given an array A of shape m x n x n (i.e., a stack of square matrices), I
want an n x n array that gives the minimum depth to a nonzero element.
E.g., the 0,0 element of the result is

    np.flatnonzero(A[:, 0, 0])[0]

Can this be vectorized? (Assuming a nonzero element exists is OK, but
dealing nicely with its absence is even better.)

Thanks,
Alan Isaac
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
On 17.04.2014 18:06, Francesc Alted wrote:

> In [4]: x_unaligned = np.zeros(shape,
>            dtype=[('y1', np.int8), ('x', np.float64), ('y2', np.int8, (7,))])['x']

On arrays of this size you won't see alignment issues; you are dominated by
memory bandwidth. If at all, you will only see it if the data fits into the
cache. It's also about alignment relative to SIMD vectors, not to basic
types, but that doesn't matter anymore on modern x86 CPUs. I guess for
array data, cache line splits should also not be a big concern.

Aligned allocators are not the only allocators which might be useful in
numpy. Modern CPUs also support larger pages than 4K (huge pages, up to 1GB
in size), which reduces TLB cache misses. Memory of this type typically
needs to be allocated with special mmap flags, though newer kernel versions
can now also provide this memory via transparent anonymous pages (normal
non-file mmaps).

> In [8]: import numexpr as ne
>
> In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
> 10 loops, best of 3: 133 ms per loop
>
> In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
> 10 loops, best of 3: 134 ms per loop
>
> i.e. there is not a significant difference between aligned and unaligned
> access to data. I wonder if the same technique could be applied to NumPy.

You already can do so with relatively simple means:
http://nbviewer.ipython.org/gist/anonymous/10942132

If you change the blocking function to take a function as input and use
in-place operations, numpy can even beat numexpr (though I used the numexpr
Ubuntu package, which might not be compiled optimally). This type of
transformation can probably be applied on the AST quite easily.

[earlier messages quoted in full; snipped]
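To make the blocking idea concrete, here is a minimal sketch of block-wise
evaluation of a compound expression, so that temporaries stay cache-sized
(the block size and the expression are illustrative, not taken from the
notebook linked above):

    import numpy as np

    def blocked_eval(x, out, block=16384):
        """Evaluate out[:] = x*x + 2*x one cache-sized block at a time."""
        for start in range(0, x.size, block):
            stop = min(start + block, x.size)
            b = x[start:stop]
            tmp = b * b        # temporary is only `block` elements long
            tmp += 2 * b       # in-place: avoids a second large temporary
            out[start:stop] = tmp

    x = np.linspace(1, 10, 1000000)
    out = np.empty_like(x)
    blocked_eval(x, out)
    assert np.allclose(out, x * x + 2 * x)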
Re: [Numpy-discussion] min depth to nonzero in 3d array
Hi Alan,

You can abuse np.argmax to calculate the first nonzero element in a
vectorized manner:

    import numpy as np
    A = (2 * np.random.rand(100, 50, 50)).astype(int)

Compare:

    np.argmax(A != 0, axis=0)
    np.array([[np.flatnonzero(A[:, i, j])[0] for j in range(50)]
              for i in range(50)])

You'll also want to check for all-zero arrays with np.all:

    np.all(A == 0, axis=0)

Cheers,
Stephan

On Thu, Apr 17, 2014 at 9:32 AM, Alan G Isaac alan.is...@gmail.com wrote:
> [question quoted in full; snipped]
Re: [Numpy-discussion] min depth to nonzero in 3d array
I agree; argmax would be the best option here, though I would hardly call
it abuse. It seems perfectly readable and idiomatic to me. Though the !=
comparison requires an extra pass over the array, that's the kind of
tradeoff you make in using numpy.

On Thu, Apr 17, 2014 at 7:45 PM, Stephan Hoyer sho...@gmail.com wrote:
> [solution quoted in full; snipped]
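Putting the two suggestions together, a small sketch that returns the
minimum depth and also handles columns with no nonzero entry (the choice of
-1 as the "absent" marker is arbitrary):

    import numpy as np

    A = (2 * np.random.rand(100, 50, 50)).astype(int)

    depth = np.argmax(A != 0, axis=0)      # index of first nonzero along axis 0
    absent = np.all(A == 0, axis=0)        # True where the column is all zeros
    depth = np.where(absent, -1, depth)    # mark "no nonzero element" with -1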
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
On 17/04/14 19:28, Julian Taylor wrote:

> On arrays of this size you won't see alignment issues; you are dominated
> by memory bandwidth. If at all, you will only see it if the data fits
> into the cache. It's also about alignment relative to SIMD vectors, not
> to basic types, but that doesn't matter anymore on modern x86 CPUs. I
> guess for array data, cache line splits should also not be a big concern.

Yes, that was my point, that in x86 CPUs this is not such a big problem.
But still, a factor of 2 is significant, even for CPU-intensive tasks. For
example, computing sin() is affected similarly (sin() is using SIMD,
right?):

    In [6]: shape = (1, 1)

    In [7]: x_aligned = np.zeros(shape,
                dtype=[('x', np.float64), ('y', np.int64)])['x']

    In [8]: x_unaligned = np.zeros(shape,
                dtype=[('y1', np.int8), ('x', np.float64), ('y2', np.int8, (7,))])['x']

    In [9]: %timeit res = np.sin(x_aligned)
    1 loops, best of 3: 654 ms per loop

    In [10]: %timeit res = np.sin(x_unaligned)
    1 loops, best of 3: 1.08 s per loop

and again, numexpr can deal with that pretty well (using 8 physical cores
here):

    In [6]: %timeit res = ne.evaluate('sin(x_aligned)')
    10 loops, best of 3: 149 ms per loop

    In [7]: %timeit res = ne.evaluate('sin(x_unaligned)')
    10 loops, best of 3: 151 ms per loop

> Aligned allocators are not the only allocators which might be useful in
> numpy. Modern CPUs also support larger pages than 4K (huge pages, up to
> 1GB in size), which reduces TLB cache misses. Memory of this type
> typically needs to be allocated with special mmap flags, though newer
> kernel versions can now also provide this memory via transparent
> anonymous pages (normal non-file mmaps).

That's interesting. In which scenarios do you think that could improve
performance?

> You already can do so with relatively simple means:
> http://nbviewer.ipython.org/gist/anonymous/10942132
>
> If you change the blocking function to take a function as input and use
> in-place operations, numpy can even beat numexpr (though I used the
> numexpr Ubuntu package, which might not be compiled optimally). This
> type of transformation can probably be applied on the AST quite easily.

That's smart. Yeah, I don't see a reason why numexpr would be performing
badly on Ubuntu. But I am not getting your performance for blocked_thread
on my AMI linux vbox:
http://nbviewer.ipython.org/gist/anonymous/11000524

oh well, threads are always tricky beasts.

By the way, apparently the optimal block size for my machine is something
like 1 MB, not 128 KB, although the difference is not big:
http://nbviewer.ipython.org/gist/anonymous/11002751

(thanks to Stefan Van der Walt for the script).

--
Francesc Alted
Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed
On 17.04.2014 20:30, Francesc Alted wrote:

> Yes, that was my point, that in x86 CPUs this is not such a big problem.
> But still, a factor of 2 is significant, even for CPU-intensive tasks.
> For example, computing sin() is affected similarly (sin() is using SIMD,
> right?):
>
>     In [9]: %timeit res = np.sin(x_aligned)
>     1 loops, best of 3: 654 ms per loop
>
>     In [10]: %timeit res = np.sin(x_unaligned)
>     1 loops, best of 3: 1.08 s per loop

In this case the unaligned access triggers a strided memcpy calling loop to
copy the data into an aligned buffer, which is terrible for performance,
even compared to the expensive sin call. numexpr handles this well, as it
allows the compiler to replace the memcpy with inline assembly (a mov
instruction). We could fix that in numpy, though I don't consider it very
important; you usually always have base-type-aligned memory. (sin is not a
SIMD-using function unless you use a vector math library, which numpy does
not support directly yet.)

> That's interesting. In which scenarios do you think that could improve
> performance?

It might improve all numpy operations dealing with big arrays. Big arrays
trigger many large temporaries, meaning glibc uses mmap, meaning lots of
moving of address space between the kernel and userspace. But I haven't
benchmarked it, so it could also be completely irrelevant.

Also, memory fragments really fast, so after a few hours of operation you
often can't allocate any huge pages anymore; you need to reserve space for
them, which requires special setup of machines.

Another possibility for special allocators are NUMA allocators that ensure
you get memory local to a specific compute node regardless of the system
NUMA policy. But again, it's probably not very important, as python has
poor thread scalability anyway. These are just examples for keeping the
flexibility of our allocators in numpy and not binding us to what python
does.

> That's smart. Yeah, I don't see a reason why numexpr would be performing
> badly on Ubuntu. But I am not getting your performance for blocked_thread
> on my AMI linux vbox:
> http://nbviewer.ipython.org/gist/anonymous/11000524

my numexpr amd64 package does not make use of SIMD, e.g. sqrt, which is
vectorized in numpy:

numexpr:

     1.30 │ 4638:   sqrtss (%r14),%xmm0
     0.01 │         ucomis %xmm0,%xmm0
     0.00 │       ↓ jp     11ec4
     4.14 │ 4646:   movss  %xmm0,(%r15,%r12,1)
          │         add    %rbp,%r14
          │         add    $0x4,%r12

(unrolled a couple of times)

vs numpy:

    83.25 │190:   sqrtps (%rbx,%r12,4),%xmm0
     0.52 │       movaps %xmm0,0x0(%rbp,%r12,4)
    14.63 │       add    $0x4,%r12
     1.60 │       cmp    %rdx,%r12
          │     ↑ jb     190

(note the "ps" vs "ss" suffix, packed vs scalar)
Re: [Numpy-discussion] About the npz format
Hi Nathaniel,

Thanks for the suggestion. I did profile the program before, just not using
Python. But following your suggestion, I used %prun. Here's (part of) the
output (when I use savez):

    195503 function calls in 4.466 seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
         2    2.284    1.142    2.284    1.142  {method 'close' of '_io.BufferedWriter' objects}
         1    0.918    0.918    0.918    0.918  {built-in method remove}
     48841    0.568    0.000    0.568    0.000  {method 'write' of '_io.BufferedWriter' objects}
     48829    0.379    0.000    0.379    0.000  {built-in method crc32}
     48830    0.148    0.000    0.148    0.000  {method 'read' of '_io.BufferedReader' objects}
         1    0.090    0.090    0.993    0.993  zipfile.py:1315(write)
         1    0.072    0.072    0.072    0.072  {method 'tostring' of 'numpy.ndarray' objects}
     48848    0.005    0.000    0.005    0.000  {built-in method len}
         1    0.001    0.001    0.270    0.270  format.py:362(write_array)
         3    0.000    0.000    0.000    0.000  {built-in method open}
         1    0.000    0.000    4.466    4.466  npyio.py:560(_savez)
         2    0.000    0.000    0.000    0.000  zipfile.py:1459(close)
         1    0.000    0.000    4.466    4.466  {built-in method exec}

Here's the output when I use save to save to a npy file:

    39 function calls in 0.266 seconds

    Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
         4    0.196    0.049    0.196    0.049  {method 'write' of '_io.BufferedWriter' objects}
         1    0.069    0.069    0.069    0.069  {method 'tostring' of 'numpy.ndarray' objects}
         1    0.001    0.001    0.266    0.266  format.py:362(write_array)
         1    0.000    0.000    0.000    0.000  {built-in method open}
         1    0.000    0.000    0.266    0.266  npyio.py:406(save)
         1    0.000    0.000    0.000    0.000  format.py:261(write_array_header_1_0)
         1    0.000    0.000    0.000    0.000  {method 'close' of '_io.BufferedWriter' objects}
         1    0.000    0.000    0.266    0.266  {built-in method exec}
         1    0.000    0.000    0.000    0.000  format.py:154(magic)
         1    0.000    0.000    0.000    0.000  format.py:233(header_data_from_array_1_0)
         1    0.000    0.000    0.266    0.266  <string>:1(<module>)
         1    0.000    0.000    0.000    0.000  numeric.py:462(asanyarray)
         1    0.000    0.000    0.000    0.000  py3k.py:28(asbytes)

The calls to close and the built-in method remove seem to be responsible
for the inefficiency of the Numpy implementation (compared to the Julia
package that I mentioned before). This was tested using Python 3.4 and
Numpy 1.8.1. However, if I do the tests with Python 3.3.5 and Numpy 1.8.0,
savez becomes much faster, so I think there is something wrong with the
combination Python 3.4/Numpy 1.8.1.

Also, if I use Python 2.4 and Numpy 1.2 (from my school's cluster) I get
that np.save takes about 3.5 seconds and np.savez takes about 7 seconds, so
all these timings seem to be hugely dependent on the system/version (maybe
this explains David Palao's results?). However, they all point out that a
significant amount of time is spent computing the crc32. Notice that prun
reports that it takes 0.379 seconds to compute the crc32 of an array that
takes 0.2 seconds to save to a npy file. I believe this is too much! And it
gets worse if you try to save bigger arrays.

On Thu, Apr 17, 2014 at 5:23 AM, Nathaniel Smith n...@pobox.com wrote:
> We can all make guesses but the solution is just to profile it :-).
> %prun in ipython (and then, if you need more granularity, installing
> line_profiler is useful).
Re: [Numpy-discussion] About the npz format
On 17.04.2014 21:30, onefire wrote:

> Hi Nathaniel,
>
> Thanks for the suggestion. I did profile the program before, just not
> using Python.

One problem of npz is that the zipfile module does not support streaming
data in (or if it does now, we aren't using it). So numpy writes the file
uncompressed to disk and then zips it, which is horrible for performance
and disk usage.

It would be nice if we could add support for different compression modules
like gzip or xz, which allow streaming data directly into a file without an
intermediate.
Re: [Numpy-discussion] About the npz format
Hi again,

* David Palao dpalao.pyt...@gmail.com [2014-04-17]:

> In my case I don't see the unbelievable amount of overhead of the npz
> thing.

When profiling IO operations, there are many factors that can influence
measurements. In my experience on Linux, these may include: the filesystem
cache, the CPU governor, the system load, the type of hard drive and how it
is connected, any power-saving features (e.g. laptop-mode tools), and any
cron jobs that might be running (e.g. updating the locate DB).

So, for example, when measuring the time it takes to write something to
disk on Linux, I always at least include a call to ``sync``, which will
ensure that all kernel filesystem buffers are written to disk. Even then,
you may still have a lot of variability. As part of bloscpack.sysutil I
have wrapped this to be available from Python (needs root though).

So, to re-run the benchmarks, doing each one twice:

    In [1]: import numpy as np

    In [2]: import bloscpack.sysutil as bps

    In [3]: x = np.linspace(1, 10, 5000)

    In [4]: %time np.save("x.npy", x)
    CPU times: user 12 ms, sys: 356 ms, total: 368 ms
    Wall time: 1.41 s

    In [5]: %time np.save("x.npy", x)
    CPU times: user 0 ns, sys: 368 ms, total: 368 ms
    Wall time: 811 ms

    In [6]: %time np.savez("x.npz", data=x)
    CPU times: user 540 ms, sys: 864 ms, total: 1.4 s
    Wall time: 4.74 s

    In [7]: %time np.savez("x.npz", data=x)
    CPU times: user 580 ms, sys: 808 ms, total: 1.39 s
    Wall time: 9.47 s

    In [8]: bps.sync()

    In [9]: %time np.save("x.npy", x) ; bps.sync()
    CPU times: user 0 ns, sys: 368 ms, total: 368 ms
    Wall time: 2.2 s

    In [10]: %time np.save("x.npy", x) ; bps.sync()
    CPU times: user 0 ns, sys: 356 ms, total: 356 ms
    Wall time: 2.16 s

    In [11]: bps.sync()

    In [12]: %time np.savez("x.npz", x) ; bps.sync()
    CPU times: user 564 ms, sys: 816 ms, total: 1.38 s
    Wall time: 8.21 s

    In [13]: %time np.savez("x.npz", x) ; bps.sync()
    CPU times: user 588 ms, sys: 772 ms, total: 1.36 s
    Wall time: 6.83 s

As you can see, even when using ``sync`` the values might vary, so in
addition it might be worth using %timeit, which will at least run it three
times and select the best one in its default setting:

    In [14]: %timeit np.save("x.npy", x)
    1 loops, best of 3: 2.4 s per loop

    In [15]: %timeit np.savez("x.npz", x)
    1 loops, best of 3: 7.1 s per loop

    In [16]: %timeit np.save("x.npy", x) ; bps.sync()
    1 loops, best of 3: 3.11 s per loop

    In [17]: %timeit np.savez("x.npz", x) ; bps.sync()
    1 loops, best of 3: 7.36 s per loop

So, anyway, given these readings, I would tend to support the claim that
there is something slowing down writing when using plain NPZ w/o
compression.

FYI: when reading, the kernel keeps files that were recently read in the
filesystem buffers, and so when measuring reads I tend to drop those caches
using ``drop_caches()`` from bloscpack.sysutil (which wraps the linux proc
fs).

best,

V-
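For readers without bloscpack installed, minimal stand-ins for the two
helpers used above might look like this (the names mirror bloscpack.sysutil
but these are sketches, not its actual implementation; drop_caches needs
root):

    import os

    def sync():
        """Flush dirty kernel filesystem buffers out to disk."""
        os.system('sync')

    def drop_caches():
        """Drop the page cache so subsequent reads hit the disk.
        Writing '3' drops page cache plus dentries/inodes (requires root)."""
        with open('/proc/sys/vm/drop_caches', 'w') as f:
            f.write('3\n')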
Re: [Numpy-discussion] About the npz format
Hi,

* Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]:

> one problem of npz is that the zipfile module does not support streaming
> data in (or if it does now, we aren't using it). So numpy writes the
> file uncompressed to disk and then zips it, which is horrible for
> performance and disk usage.

As a workaround it may also be possible to write the temporary NPY files to
cStringIO instances and then use ``ZipFile.writestr`` with the
``getvalue()`` of the cStringIO object. However, that approach may require
some memory. In python 2.7, for each array: one copy inside the cStringIO
instance and then another copy when calling getvalue on the cStringIO, I
believe.

best,

V-
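A rough sketch of that workaround, using numpy's own format writer (Python
2; the function name is hypothetical and error handling is omitted):

    import zipfile
    from cStringIO import StringIO

    import numpy as np
    from numpy.lib import format

    def savez_no_temp(filename, **arrays):
        """Write arrays into an uncompressed .npz without temporary files
        (hypothetical helper, not numpy API)."""
        zf = zipfile.ZipFile(filename, mode='w', allowZip64=True)
        try:
            for name, arr in arrays.items():
                sio = StringIO()
                format.write_array(sio, np.asanyarray(arr))
                # getvalue() makes the second in-memory copy noted above
                zf.writestr(name + '.npy', sio.getvalue())
        finally:
            zf.close()

    savez_no_temp('x.npz', data=np.linspace(1, 10, 5000))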
Re: [Numpy-discussion] About the npz format
* Valentin Haenel valen...@haenel.co [2014-04-17]:

> As a workaround it may also be possible to write the temporary NPY files
> to cStringIO instances and then use ``ZipFile.writestr`` with the
> ``getvalue()`` of the cStringIO object.

There is a proof-of-concept implementation here:
https://github.com/esc/numpy/compare/feature;npz_no_temp_file

Here are the timings, again using ``sync()`` from bloscpack (but it's just
an ``os.system('sync')``, in case you want to run your own benchmarks):

    In [1]: import numpy as np

    In [2]: import bloscpack.sysutil as bps

    In [3]: x = np.linspace(1, 10, 5000)

    In [4]: %timeit np.save("x.npy", x) ; bps.sync()
    1 loops, best of 3: 1.93 s per loop

    In [5]: %timeit np.savez("x.npz", x) ; bps.sync()
    1 loops, best of 3: 7.88 s per loop

    In [6]: %timeit np._savez_no_temp("x.npy", [x], {}, False) ; bps.sync()
    1 loops, best of 3: 3.22 s per loop

Not too bad, but still slower than plain NPY; memory copies would be my
guess.

PS: Running Python 2.7.6 :: Anaconda 1.9.2 (64-bit) and Numpy master

Also, in case you were wondering, here is the profiler output:

    In [2]: %prun -l 10 np._savez_no_temp("x.npy", [x], {}, False)

    943 function calls (917 primitive calls) in 1.139 seconds

    Ordered by: internal time
    List reduced from 99 to 10 due to restriction <10>

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
         1    0.386    0.386    0.386    0.386  {zlib.crc32}
         8    0.234    0.029    0.234    0.029  {method 'write' of 'file' objects}
        27    0.162    0.006    0.162    0.006  {method 'write' of 'cStringIO.StringO' objects}
         1    0.158    0.158    0.158    0.158  {method 'getvalue' of 'cStringIO.StringO' objects}
         1    0.091    0.091    0.091    0.091  {method 'close' of 'file' objects}
        24    0.064    0.003    0.064    0.003  {method 'tobytes' of 'numpy.ndarray' objects}
         1    0.022    0.022    1.119    1.119  npyio.py:608(_savez_no_temp)
         1    0.019    0.019    1.139    1.139  <string>:1(<module>)
         1    0.002    0.002    0.227    0.227  format.py:362(write_array)
         1    0.001    0.001    0.001    0.001  zipfile.py:433(_GenerateCRCTable)

V-
Re: [Numpy-discussion] About the npz format
Hi,

* Valentin Haenel valen...@haenel.co [2014-04-17]:

> Not too bad, but still slower than plain NPY; memory copies would be my
> guess.

And, to shed some more light on this, the kernprof (line-by-line) output of
a slightly modified version:

    zsh» cat mp.py
    import numpy as np
    x = np.linspace(1, 10, 5000)
    np._savez_no_temp("x.npy", [x], {}, False)

    zsh» ./kernprof.py -v -l mp.py
    Wrote profile results to mp.py.lprof
    Timer unit: 1e-06 s

    File: numpy/lib/npyio.py
    Function: _savez_no_temp at line 608
    Total time: 1.16438 s

    Line #   Hits    Time  Per Hit  % Time  Line Contents
    ======================================================
       608                                  @profile
       609                                  def _savez_no_temp(file, args, kwds, compress):
       610                                      # Import is postponed to here since zipfile depends on gzip, an
       611                                      # optional component of the so-called standard library.
       612      1    5655   5655.0     0.5      import zipfile
       613
       614      1       6      6.0     0.0      from cStringIO import StringIO
       615
       616      1       2      2.0     0.0      if isinstance(file, basestring):
       617      1       2      2.0     0.0          if not file.endswith('.npz'):
       618      1       1      1.0     0.0              file = file + '.npz'
       619
       620      1       1      1.0     0.0      namedict = kwds
       621      2       4      2.0     0.0      for i, val in enumerate(args):
       622      1       6      6.0     0.0          key = 'arr_%d' % i
       623      1       1      1.0     0.0          if key in namedict.keys():
       624                                              raise ValueError(
       625                                                  "Cannot use un-named variables and keyword %s" % key)
       626      1       1      1.0     0.0          namedict[key] = val
       627
       628      1    [...]
Re: [Numpy-discussion] About the npz format
Hello,

* Valentin Haenel valen...@haenel.co [2014-04-17]:

> As part of bloscpack.sysutil I have wrapped this to be available from
> Python (needs root though).
>
> So, to re-run the benchmarks, doing each one twice:

Actually, I just realized that doing a ``sync`` doesn't require root.

my bad,

V-
Re: [Numpy-discussion] About the npz format
Interesting! Using sync() as you suggested makes every write slower, and it
decreases the time difference between save and savez, so maybe I was
observing the 10 times difference because the file system buffers were
being flushed immediately after a call to savez, but not right after a call
to np.save.

I think your workaround might help, but a better solution would be to not
use Python's zipfile module at all. This would make it possible to, say,
let the user choose the checksum algorithm or to turn that off. Or maybe
the compression stuff makes this route too complicated to be worth the
trouble? (after all, the zip format is not that hard to understand)

Gilberto

On Thu, Apr 17, 2014 at 6:45 PM, Valentin Haenel valen...@haenel.co wrote:
> Actually, I just realized that doing a ``sync`` doesn't require root.
Re: [Numpy-discussion] About the npz format
I found this github issue (https://github.com/numpy/numpy/pull/3465) where
someone mentions the idea of forking the zip library.

Gilberto

On Thu, Apr 17, 2014 at 8:09 PM, onefire onefire.mys...@gmail.com wrote:
> I think your workaround might help, but a better solution would be to
> not use Python's zipfile module at all. [...]