Re: [Numpy-discussion] About the npz format

2014-07-06 Thread Sturla Molden
There is no os.mkfifo on Windows. Sturla Valentin Haenel valen...@haenel.co wrote: sorry, for the top-post, but should we add this as an issue on the github tracker? I'd like to revisit it this summer. V- * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-18]: On 18.04.2014 18:29,

Re: [Numpy-discussion] About the npz format

2014-07-04 Thread Valentin Haenel
sorry, for the top-post, but should we add this as an issue on the github tracker? I'd like to revisit it this summer. V- * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-18]: On 18.04.2014 18:29, Valentin Haenel wrote: Hi, * Valentin Haenel valen...@haenel.co [2014-04-17]: *

Re: [Numpy-discussion] About the npz format

2014-04-18 Thread Valentin Haenel
Hi Gilberto, * onefire onefire.mys...@gmail.com [2014-04-18]: Interesting! Using sync() as you suggested makes every write slower, and it decreases the time difference between save and savez, so maybe I was observing the 10 times difference because the file system buffers were being flushed

Re: [Numpy-discussion] About the npz format

2014-04-18 Thread Valentin Haenel
Hi again, * onefire onefire.mys...@gmail.com [2014-04-18]: I think your workaround might help, but a better solution would be to not use Python's zipfile module at all. This would make it possible to, say, let the user choose the checksum algorithm or to turn that off. Or maybe the

Re: [Numpy-discussion] About the npz format

2014-04-18 Thread Francesc Alted
El 18/04/14 13:01, Valentin Haenel ha escrit: Hi again, * onefire onefire.mys...@gmail.com [2014-04-18]: I think your workaround might help, but a better solution would be to not use Python's zipfile module at all. This would make it possible to, say, let the user choose the checksum

Re: [Numpy-discussion] About the npz format

2014-04-18 Thread Valentin Haenel
Hi, * Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Thanks for the suggestion. I did profile the program before, just not using

Re: [Numpy-discussion] About the npz format

2014-04-18 Thread Julian Taylor
On 18.04.2014 18:29, Valentin Haenel wrote: Hi, * Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Thanks for the suggestion. I did profile the

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread David Palao
2014-04-16 20:26 GMT+02:00 onefire onefire.mys...@gmail.com: Hi all, I have been playing with the idea of using Numpy's binary format as a lightweight alternative to HDF5 (which I believe is the right way to do if one does not have a problem with the dependency). I am pretty happy with the

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Nathaniel Smith
On 17 Apr 2014 01:57, onefire onefire.mys...@gmail.com wrote: What I cannot understand is why savez takes more than 10 times longer than saving the data to a npy file. The only reason that I could come up with was the computation of the crc32. We can all make guesses but the solution is just to

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread onefire
Hi Nathaniel, Thanks for the suggestion. I did profile the program before, just not using Python. But following your suggestion, I used %prun. Here's (part of) the output (when I use savez): 195503 function calls in 4.466 seconds Ordered by: internal time ncalls tottime percall
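
For readers who want a comparable report outside IPython: `%prun` is a front end to Python's profiler, so roughly the same table can be produced with `cProfile` and `pstats`. The sketch below is only an illustration under stated assumptions. It profiles a hypothetical `write_archive` helper that stores random bytes in an in-memory zip, not the `savez` call from the thread, and the 4 MiB payload size is arbitrary.

```python
import cProfile
import io
import os
import pstats
import zipfile


def write_archive(data):
    """Hypothetical stand-in workload: store `data` as one uncompressed zip member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("payload.bin", data)
    return buffer.getvalue()


# Arbitrary 4 MiB payload, chosen only for illustration.
payload = os.urandom(4 * 1024 * 1024)

profiler = cProfile.Profile()
profiler.enable()
archive_bytes = write_archive(payload)
profiler.disable()

# Sorting by "tottime" yields the "Ordered by: internal time" table.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("tottime").print_stats(10)
print(report.getvalue())
```

Replacing the body of `write_archive` with the real save call should show where the time goes in that case.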

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Julian Taylor
On 17.04.2014 21:30, onefire wrote: Hi Nathaniel, Thanks for the suggestion. I did profile the program before, just not using Python. one problem of npz is that the zipfile module does not support streaming data in (or if it does now we aren't using it). So numpy writes the file uncompressed
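
On the "or if it does now" remark: since Python 3.6, which postdates this thread, `ZipFile.open(name, mode="w")` returns a writable file object, so data can be streamed into an archive member chunk by chunk. The following is a minimal sketch of that call only. The `stream_into_zip` helper and the member name are hypothetical, and this is not a description of how NumPy writes npz files.

```python
import io
import zipfile


def stream_into_zip(archive, member_name, chunks):
    """Write an iterable of byte chunks into a single stored zip member."""
    with zipfile.ZipFile(archive, mode="w", compression=zipfile.ZIP_STORED) as zf:
        # mode="w" on ZipFile.open requires Python 3.6 or later.
        with zf.open(member_name, mode="w") as member:
            for chunk in chunks:
                member.write(chunk)


buffer = io.BytesIO()
chunks = [b"a" * 1024, b"b" * 1024, b"c" * 1024]
stream_into_zip(buffer, "data.bin", chunks)

# Read the member back to confirm the chunks were concatenated in order.
with zipfile.ZipFile(buffer) as zf:
    restored = zf.read("data.bin")
```

Because the member is written through a file object, no intermediate file on disk is needed for the data being added.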

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hi again, * David Palao dpalao.pyt...@gmail.com [2014-04-17]: 2014-04-16 20:26 GMT+02:00 onefire onefire.mys...@gmail.com: Hi all, I have been playing with the idea of using Numpy's binary format as a lightweight alternative to HDF5 (which I believe is the right way to do if one does

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hi, * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Hi Nathaniel, Thanks for the suggestion. I did profile the program before, just not using Python. one problem of npz is that the zipfile module does not support streaming data in (or

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
* Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: Hi, * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Hi Nathaniel, Thanks for the suggestion. I did profile the program

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hi, * Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: * Valentin Haenel valen...@haenel.co [2014-04-17]: Hi, * Julian Taylor jtaylor.deb...@googlemail.com [2014-04-17]: On 17.04.2014 21:30, onefire wrote: Hi Nathaniel,

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread Valentin Haenel
Hello, * Valentin Haenel valen...@haenel.co [2014-04-17]: As part of bloscpack.sysutil I have wrapped this to be available from Python (needs root though). So, to re-run the benchmarks, doing each one twice: Actually, I just realized that doing a ``sync`` doesn't require root. my bad, V-

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread onefire
Interesting! Using sync() as you suggested makes every write slower, and it decreases the time difference between save and savez, so maybe I was observing the 10 times difference because the file system buffers were being flushed immediately after a call to savez, but not right after a call to
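
The buffering effect described here can be reproduced in isolation. The sketch below times the same write with and without forcing the data to disk. As an assumption for portability, it uses `os.fsync` on the single file rather than the system-wide `sync()` discussed in the thread. The `timed_write` helper and the 4 MiB payload are hypothetical.

```python
import os
import tempfile
import time


def timed_write(path, data, flush_to_disk):
    """Return the seconds taken to write `data`, optionally forcing it to disk."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if flush_to_disk:
            f.flush()             # empty Python's own buffer
            os.fsync(f.fileno())  # ask the OS to write its buffers for this file
    return time.perf_counter() - start


data = os.urandom(4 * 1024 * 1024)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.bin")
    t_buffered = timed_write(path, data, flush_to_disk=False)
    t_synced = timed_write(path, data, flush_to_disk=True)
    size_on_disk = os.path.getsize(path)

print(f"buffered: {t_buffered:.4f} s")
print(f"synced:   {t_synced:.4f} s")
```

If the synced write is much slower, a benchmark that omits the flush is mostly measuring how fast data lands in the file system cache.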

Re: [Numpy-discussion] About the npz format

2014-04-17 Thread onefire
I found this github issue (https://github.com/numpy/numpy/pull/3465) where someone mentions the idea of forking the zip library. Gilberto On Thu, Apr 17, 2014 at 8:09 PM, onefire onefire.mys...@gmail.com wrote: Interesting! Using sync() as you suggested makes every write slower, and it

[Numpy-discussion] About the npz format

2014-04-16 Thread onefire
Hi all, I have been playing with the idea of using Numpy's binary format as a lightweight alternative to HDF5 (which I believe is the right way to do if one does not have a problem with the dependency). I am pretty happy with the npy format, but the npz format seems to be broken as far as

Re: [Numpy-discussion] About the npz format

2014-04-16 Thread Valentin Haenel
Hi Gilberto, * onefire onefire.mys...@gmail.com [2014-04-16]: I have been playing with the idea of using Numpy's binary format as a lightweight alternative to HDF5 (which I believe is the right way to do if one does not have a problem with the dependency). I am pretty happy with the npy

Re: [Numpy-discussion] About the npz format

2014-04-16 Thread Nathaniel Smith
crc32 is extremely fast, and I think zip might use adler32 instead which is even faster. OTOH compression is incredibly slow, unless you're using one of the 'just a little bit of compression' formats like blosc or lzo1. If your npz files are compressed then this is certainly the culprit. The zip
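
The relative cost of checksumming and compression is easy to measure with the standard library. For reference, the ZIP format records a CRC-32 per member, which `zipfile` exposes as `ZipInfo.CRC`. The sketch below times `zlib.crc32`, `zlib.adler32` and `zlib.compress` on one buffer. The `time_call` helper and the 8 MiB size are hypothetical, and random bytes barely compress, so real array data may behave differently.

```python
import os
import time
import zlib


def time_call(func, data):
    """Return (result, elapsed seconds) for a single call of func(data)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


# 8 MiB of random bytes; the size is arbitrary and chosen for illustration.
payload = os.urandom(8 * 1024 * 1024)

crc, t_crc = time_call(zlib.crc32, payload)
adler, t_adler = time_call(zlib.adler32, payload)
packed, t_deflate = time_call(zlib.compress, payload)

print(f"crc32:    {t_crc:.4f} s")
print(f"adler32:  {t_adler:.4f} s")
print(f"compress: {t_deflate:.4f} s")
```

A single call per function gives only a rough picture. Repeating each measurement several times would give more stable numbers.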

Re: [Numpy-discussion] About the npz format

2014-04-16 Thread onefire
Valentin Haenel, Bloscpack definitely looks interesting but I need to take a careful look first. I will let you know if I like it. Thanks for the suggestion! I think you and Nathaniel Smith misunderstood my questions (my fault, since I did not explain myself well!). First, Numpy's savez will not