Re: [Numpy-discussion] About the npz format

2014-04-17 Thread David Palao
2014-04-16 20:26 GMT+02:00 onefire onefire.mys...@gmail.com: Hi all, I have been playing with the idea of using Numpy's binary format as a lightweight alternative to HDF5 (which I believe is the right way to go if one does not have a problem with the dependency). I am pretty happy with the

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Nathaniel Smith
On 17 Apr 2014 01:57, onefire onefire.mys...@gmail.com wrote: What I cannot understand is why savez takes more than 10 times longer than saving the data to a npy file. The only reason that I could come up with was the computation of the crc32. We can all make guesses but the solution is just to
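Before guessing at causes such as crc32, the first step is simply to time the two calls side by side. A minimal sketch, assuming a temporary directory and a best-of-N wall-clock measurement (the helper name `time_save_vs_savez` is hypothetical). Note that it does not control for OS write caching, which comes up later in this thread:

```python
import os
import tempfile
import time

import numpy as np


def time_save_vs_savez(arr, repeats=3):
    """Return the best wall-clock time (seconds) of np.save and np.savez for `arr`."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        writers = {
            "save": (os.path.join(tmp, "data.npy"), lambda p: np.save(p, arr)),
            "savez": (os.path.join(tmp, "data.npz"), lambda p: np.savez(p, arr=arr)),
        }
        for name, (path, write) in writers.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                write(path)
                best = min(best, time.perf_counter() - start)
            results[name] = best
    return results
```

For example, `print(time_save_vs_savez(np.random.rand(1000, 1000)))` prints the two timings; the array size is an arbitrary choice.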

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Nathaniel Smith
On Wed, Apr 16, 2014 at 4:17 PM, R Hattersley rhatters...@gmail.com wrote: For some reason the Python issue 21223 didn't show any activity until I logged in to post my patch. At which point I saw that haypo had already submitted pretty much exactly the same patch. *sigh* That was pretty much a

Re: [Numpy-discussion] ImportError: /usr/local/lib/python2.7/site-packages/numpy-1.8.0-py2.7-linux-x86_64.egg/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString

2014-04-17 Thread jaylene
No. I didn't rebuild numpy after rebuilding python. I searched online about this error. It said that this error might be caused by building python with UCS-4. Is there a way to check whether python was built with UCS-4 or UCS-2? Will rebuilding python with UCS-2 work? I'm really reluctant to
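On the UCS-2/UCS-4 question: on Python 2, `sys.maxunicode` distinguishes the two builds (65535 for a "narrow" UCS-2 build, 1114111 for a "wide" UCS-4 build). The missing `PyUnicodeUCS2_AsASCIIString` symbol suggests the numpy extension was compiled against a UCS-2 interpreter while the running one is UCS-4, which fits with numpy not having been rebuilt. A minimal check (the function name is hypothetical):

```python
import sys


def unicode_build_width():
    """Guess the internal Unicode width of the running interpreter.

    On Python 2 (and 3.0-3.2) a narrow UCS-2 build has sys.maxunicode == 0xFFFF,
    a wide UCS-4 build has 0x10FFFF. From Python 3.3 on (PEP 393) the build
    option no longer exists and sys.maxunicode is always 0x10FFFF.
    """
    if sys.maxunicode == 0xFFFF:
        return "UCS-2"
    return "UCS-4"


print(unicode_build_width())
```

The same information is available from the shell with `python -c "import sys; print(sys.maxunicode)"`.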

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Aron Ahmadia
On the one hand it would be nice to actually know whether posix_memalign is important, before making api decisions on this basis. FWIW: On the lightweight IBM cores that the extremely popular BlueGene machines were based on, accessing unaligned memory raised system faults. The default behavior

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Nathaniel Smith
On 17 Apr 2014 15:09, Aron Ahmadia a...@ahmadia.net wrote: On the one hand it would be nice to actually know whether posix_memalign is important, before making api decisions on this basis. FWIW: On the lightweight IBM cores that the extremely popular BlueGene machines were based on, accessing

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Aron Ahmadia
Hmnn, I wasn't being clear :) The default malloc on BlueGene/Q only returns 8-byte alignment, but the SIMD units need 32-byte alignment for loads, stores, and operations, or performance suffers. On the /P the required alignment was 16 bytes, but malloc only gave you 8, and trying to perform
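When the allocator only guarantees 8-byte alignment, a common Python-level workaround is to over-allocate a byte buffer and slice it at the first suitably aligned address. A sketch, assuming a 32-byte target like the BlueGene/Q SIMD requirement described above; `aligned_empty` is a hypothetical helper, not a numpy API:

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, alignment=32):
    """Return an uninitialized array whose data pointer is a multiple of `alignment` bytes.

    Over-allocates a uint8 buffer by `alignment` bytes and slices it at the first
    aligned address; the returned array keeps the buffer alive through `.base`.
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment  # bytes to skip to reach alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)
```

The cost is at most `alignment - 1` wasted bytes per array, which is why this trick is usually acceptable for large numerical buffers.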

[Numpy-discussion] ANN: Bokeh 0.4.4 released

2014-04-17 Thread Bryan Van de Ven
I am happy to announce the release of Bokeh version 0.4.4! Bokeh is a Python library for visualizing large and realtime datasets on the web. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity to

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Francesc Alted
Uh, 15x slower for unaligned access is quite a lot. But Intel (and AMD) architectures are much more tolerant in this aspect (and improving). For example, with a Xeon(R) CPU E5-2670 (2 years old) I get: In [1]: import numpy as np In [2]: shape = (1, 1) In [3]: x_aligned =
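The unaligned array in this benchmark is built with a packed structured dtype (the expression is quoted in the reply below). A sketch of the setup, with hypothetical helper names; the original shape is lost from this archive snippet, so callers choose their own:

```python
import timeit

import numpy as np


def make_aligned_and_unaligned(shape):
    """Return (aligned, unaligned) zero-filled float64 arrays of the given shape."""
    x_aligned = np.zeros(shape, dtype=np.float64)
    # Packed record: int8 + float64 + 7 x int8 = 16 bytes. The float64 field 'x'
    # starts at byte offset 1, so (given an 8-byte-aligned allocation, as typical
    # malloc provides) none of its elements are 8-byte aligned.
    records = np.zeros(
        shape, dtype=[("y1", np.int8), ("x", np.float64), ("y2", np.int8, (7,))]
    )
    return x_aligned, records["x"]


def time_multiply(x, number=10):
    """Seconds taken to compute x * 2.0, `number` times."""
    return timeit.timeit(lambda: x * 2.0, number=number)
```

`a, u = make_aligned_and_unaligned((2000, 2000)); print(a.flags.aligned, u.flags.aligned)` prints `True False`. Keep in mind that the unaligned view is also strided (16 bytes per element), so a plain timing comparison mixes alignment with stride effects.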

[Numpy-discussion] min depth to nonzero in 3d array

2014-04-17 Thread Alan G Isaac
Given an array A of shape m x n x n (i.e., a stack of square matrices), I want an n x n array that gives the minimum depth to a nonzero element. E.g., the 0,0 element of the result is np.flatnonzero(A[:,0,0])[0] Can this be vectorized? (Assuming a nonzero element exists is ok, but dealing nicely
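A straightforward loop version of this definition is useful as a reference to check any vectorized answer against. A sketch; returning `-1` for an all-zero column is an assumed convention for the "dealing nicely" case, and `min_depth_loop` is a hypothetical name:

```python
import numpy as np


def min_depth_loop(A, missing=-1):
    """For each (i, j), the index of the first nonzero element of A[:, i, j].

    Columns that are entirely zero get `missing` (an assumed sentinel).
    """
    _, n, n2 = A.shape
    out = np.full((n, n2), missing, dtype=np.intp)
    for i in range(n):
        for j in range(n2):
            nz = np.flatnonzero(A[:, i, j])
            if nz.size:
                out[i, j] = nz[0]
    return out
```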

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Julian Taylor
On 17.04.2014 18:06, Francesc Alted wrote: In [4]: x_unaligned = np.zeros(shape, dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x'] On arrays of this size you won't see alignment issues; you are dominated by memory bandwidth. If at all, you will only see it if the data fits into
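One way to test the memory-bandwidth argument is to measure the unaligned/aligned slowdown for a small, cache-resident array and for a large one. A sketch under these assumptions: both layouts are 16-byte records so the two float64 views have the same stride and only the field offset (8 vs 1) differs; the dtype and function names are hypothetical:

```python
import timeit

import numpy as np

# Both records are 16 bytes; 'x' sits at offset 8 (aligned) or offset 1 (unaligned).
ALIGNED_REC = np.dtype([("pad", np.int8, (8,)), ("x", np.float64)])
UNALIGNED_REC = np.dtype([("y1", np.int8), ("x", np.float64), ("y2", np.int8, (7,))])


def unaligned_slowdown(n, number=20):
    """time(sum over unaligned view) / time(sum over aligned view) for n elements."""
    x_aligned = np.zeros(n, dtype=ALIGNED_REC)["x"]
    x_unaligned = np.zeros(n, dtype=UNALIGNED_REC)["x"]
    t_aligned = timeit.timeit(x_aligned.sum, number=number)
    t_unaligned = timeit.timeit(x_unaligned.sum, number=number)
    return t_unaligned / t_aligned
```

Comparing, say, `unaligned_slowdown(4_000)` (64 KB of records) with `unaligned_slowdown(4_000_000)` (64 MB) shows whether the penalty only appears when the data fits in cache; the sizes are arbitrary and results depend on the CPU.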

Re: [Numpy-discussion] min depth to nonzero in 3d array

2014-04-17 Thread Stephan Hoyer
Hi Alan, You can abuse np.argmax to calculate the first nonzero element in a vectorized manner: import numpy as np A = (2 * np.random.rand(100, 50, 50)).astype(int) Compare: np.argmax(A != 0, axis=0) np.array([[np.flatnonzero(A[:,i,j])[0] for j in range(50)] for i in range(50)]) You'll also
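The argmax trick works because, for a boolean array, `np.argmax` returns the index of the first `True`. It returns 0 when a column has no nonzero element at all, so a mask from `.any()` can separate that case. A sketch; the `-1` sentinel and the name `min_depth` are assumptions:

```python
import numpy as np


def min_depth(A, missing=-1):
    """Vectorized index of the first nonzero along axis 0; `missing` for all-zero columns."""
    mask = A != 0
    depth = np.argmax(mask, axis=0)  # first True along axis 0 (0 if the column has none)
    return np.where(mask.any(axis=0), depth, missing)


rng = np.random.RandomState(0)
A = (2 * rng.rand(100, 50, 50)).astype(int)
loop = np.array([[np.flatnonzero(A[:, i, j])[0] for j in range(50)] for i in range(50)])
assert (min_depth(A) == loop).all()
```

The extra `.any()` is another pass over the boolean mask; it can be dropped when every column is known to contain a nonzero element.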

Re: [Numpy-discussion] min depth to nonzero in 3d array

2014-04-17 Thread Eelco Hoogendoorn
I agree; argmax would be the best option here, though I would hardly call it abuse. It seems perfectly readable and idiomatic to me. Though the != comparison requires an extra pass over the array, that's the kind of tradeoff you make in using numpy. On Thu, Apr 17, 2014 at 7:45 PM, Stephan Hoyer

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Francesc Alted
On 17/04/14 19:28, Julian Taylor wrote: On 17.04.2014 18:06, Francesc Alted wrote: In [4]: x_unaligned = np.zeros(shape, dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x'] On arrays of this size you won't see alignment issues; you are dominated by memory bandwidth. If at

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Julian Taylor
On 17.04.2014 20:30, Francesc Alted wrote: On 17/04/14 19:28, Julian Taylor wrote: On 17.04.2014 18:06, Francesc Alted wrote: In [4]: x_unaligned = np.zeros(shape, dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x'] On arrays of this size you won't see alignment issues; you

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread onefire
Hi Nathaniel, Thanks for the suggestion. I did profile the program before, just not using Python. But following your suggestion, I used %prun. Here's (part of) the output (when I use savez): 195503 function calls in 4.466 seconds Ordered by: internal time ncalls tottime percall
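Outside IPython, the same kind of report that `%prun` prints can be produced with the standard `cProfile` and `pstats` modules; sorting by `"tottime"` gives the "Ordered by: internal time" listing. A sketch, with a hypothetical helper name and a temporary output file:

```python
import cProfile
import io
import os
import pstats
import tempfile

import numpy as np


def profile_savez(arrays, top=10):
    """Profile one np.savez call; return the report sorted by internal time."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "profiled.npz")
        profiler = cProfile.Profile()
        profiler.enable()
        np.savez(path, **arrays)
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(top)
    return out.getvalue()


print(profile_savez({"a": np.random.rand(1000, 1000)}))
```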

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Julian Taylor
On 17.04.2014 21:30, onefire wrote: Hi Nathaniel, Thanks for the suggestion. I did profile the program before, just not using Python. one problem of npz is that the zipfile module does not support streaming data in (or if it does now we aren't using it). So numpy writes the file uncompressed
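On Python 3.6 and later, `zipfile.ZipFile.open(name, mode="w")` can stream data directly into an archive member, which avoids the temporary uncompressed file described here. A sketch of an `.npz`-compatible writer built on that and on `numpy.lib.format.write_array`; `savez_streaming` is a hypothetical name, not a numpy function:

```python
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def savez_streaming(path, **arrays):
    """Write an .npz-compatible archive, streaming each array into its zip member.

    Requires Python 3.6+ for ZipFile.open(..., mode="w"). Members are stored
    uncompressed, matching np.savez (not np.savez_compressed).
    """
    with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, arr in arrays.items():
            # force_zip64 lets a member grow past 2 GiB even though its size is unknown upfront
            with zf.open(name + ".npy", mode="w", force_zip64=True) as member:
                npy_format.write_array(member, np.asanyarray(arr), allow_pickle=False)
```

The result can be read back with `np.load(path)` like any other `.npz` file.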

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hi again, * David Palao dpalao.pyt...@gmail.com [2014-04-17]: 2014-04-16 20:26 GMT+02:00 onefire onefire.mys...@gmail.com: Hi all, I have been playing with the idea of using Numpy's binary format as a lightweight alternative to HDF5 (which I believe is the right way to do if one does

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hi, * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Hi Nathaniel, Thanks for the suggestion. I did profile the program before, just not using Python. one problem of npz is that the zipfile module does not support streaming data in (or

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
* Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: Hi, * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Hi Nathaniel, Thanks for the suggestion. I did profile the program

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hi, * Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: Hi, * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Hi Nathaniel,

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hello, * Valentin Haenel valen...@haenel.co [2014-04-17]: As part of bloscpack.sysutil I have wrapped this to be available from Python (needs root though). So, to re-run the benchmarks, doing each one twice: Actually, I just realized that doing a ``sync`` doesn't require root. My bad, V-
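From plain Python (3.3+, Unix only), `os.sync()` issues the same system-wide flush without root. A sketch of a benchmark loop that syncs before each run, so that dirty buffers left by a previous write are not charged to the next measurement (`timed_write` is a hypothetical helper; this does not use bloscpack's wrapper):

```python
import os
import time


def timed_write(write, path, repeats=2):
    """Time write(path) `repeats` times, calling os.sync() before each run."""
    times = []
    for _ in range(repeats):
        os.sync()  # flush pending writes from earlier runs (Unix, no root needed)
        start = time.perf_counter()
        write(path)
        times.append(time.perf_counter() - start)
    return times
```

For example, `timed_write(lambda p: np.savez(p, a=arr), "out.npz")` and `timed_write(lambda p: np.save(p, arr), "out.npy")` give comparable numbers for the two formats.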

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread onefire
Interesting! Using sync() as you suggested makes every write slower, and it decreases the time difference between save and savez, so maybe I was observing the 10 times difference because the file system buffers were being flushed immediately after a call to savez, but not right after a call to
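To compare the two formats without the page cache hiding part of the cost, the flush can be included inside the timed region by calling `os.fsync` on the output file. A sketch, with hypothetical helper names and file names:

```python
import os
import time

import numpy as np


def time_to_disk(save_func, path, arr):
    """Time save_func(fileobj, arr), including flushing the file to stable storage."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        save_func(f, arr)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write this file's data to disk
    return time.perf_counter() - start


def compare_save_savez(arr, directory):
    """Disk-inclusive timings of np.save and np.savez for the same array."""
    return {
        "save": time_to_disk(np.save, os.path.join(directory, "a.npy"), arr),
        "savez": time_to_disk(
            lambda f, a: np.savez(f, arr=a), os.path.join(directory, "a.npz"), arr
        ),
    }
```

Unlike a global `sync`, `fsync` only waits for the file being written, so other processes' pending I/O does not leak into the measurement.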

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread onefire
I found this GitHub pull request (https://github.com/numpy/numpy/pull/3465) where someone mentions the idea of forking the zip library. Gilberto On Thu, Apr 17, 2014 at 8:09 PM, onefire onefire.mys...@gmail.com wrote: Interesting! Using sync() as you suggested makes every write slower, and it