On Thursday 25 March 2010 17:32:51, Francesc Alted wrote:
> [Hi Jakob. It seems that you sent this from unsubscribed address. Please
> subscribe first before sending to mailing list. Thanks.]
>
> -------------------------------------------------------------------------
> Re: 64-bit bug in PyTables/Numpy?
> From: Jakob van Santen <vansan...@wisc.edu>
> To: pytables-users@lists.sourceforge.net
> Date: Tuesday 23:19:36
>
> Hello,
>
> a follow-up on this issue:
>
> This turned out to be due to the way the file was written: 64-bit integers
> were being written as 8-byte integers with 32-bit precision. The HDF5
> library noticed that the type only had 4 significant bytes and so only
> wrote out the lower word in H5TBread_records(). Since PyTables prepares
> the data area with numpy.empty and not numpy.zeros, the memory is not
> zeroed. This is fine as long as types always have precision==8*width, but
> it breaks otherwise.
>
> This is more of a pseudo-bug in HDF5; it would seem more logical to pad out
> the field with zeroes than simply leave the padding bytes unwritten.
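To make the failure mode described in the quoted message concrete, here is a minimal NumPy-only sketch. It does not call HDF5 or PyTables at all: the helper `read_low_word_only` is a hypothetical stand-in for a reader that writes only the 4 significant bytes of each 8-byte field, and a buffer pre-filled with 0xFF bytes stands in for the arbitrary contents that numpy.empty() may return. A little-endian layout is assumed and forced through the explicit "<i8" dtype.

```python
import numpy as np


def read_low_word_only(dest, values):
    """Hypothetical reader: fill only the low 4 bytes of each 8-byte field.

    The upper 4 bytes of every field in `dest` are left exactly as they were.
    """
    raw = dest.view(np.uint8).reshape(-1, 8)       # byte view of each int64 field
    src = np.asarray(values, dtype="<u4").view(np.uint8).reshape(-1, 4)
    raw[:, :4] = src                               # low word only (little-endian)
    return dest


values = [1, 2, 3]

# Stand-in for an uninitialized buffer: every byte is 0xFF before the read.
dirty = np.full(3, -1, dtype="<i8")
# Zero-initialized buffer, as numpy.zeros() would provide.
zeroed = np.zeros(3, dtype="<i8")

dirty_result = read_low_word_only(dirty, values)
zeroed_result = read_low_word_only(zeroed, values)

print("pre-filled with 0xFF:", dirty_result.tolist())
print("pre-filled with zeros:", zeroed_result.tolist())
```

With the zeroed buffer the values come back intact, while the stale upper bytes in the other buffer turn them into large negative numbers. That is the kind of corruption one would expect when precision is smaller than 8*width and the padding is never written.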
Hairy subject indeed! Well, I'd happily replace numpy.empty() with
numpy.zeros() if performance were not affected. It is true that, for small
arrays, performance is nearly the same:

  In [1]: dt = np.dtype("f4,f8,i4")

  In [2]: timeit np.empty((40), dtype=dt)
  1000000 loops, best of 3: 1.32 µs per loop

  In [3]: timeit np.zeros((40), dtype=dt)
  1000000 loops, best of 3: 1.27 µs per loop

but, for somewhat larger arrays (~64 KB), the situation changes radically:

  In [4]: timeit np.empty((4000), dtype=dt)
  1000000 loops, best of 3: 1.36 µs per loop

  In [5]: timeit np.zeros((4000), dtype=dt)
  100000 loops, best of 3: 5.08 µs per loop

The above shows that empty() takes about the same time to get the container
regardless of its size, while the cost of zeros() grows with the number of
elements in the array. It is true that the underlying memset/bzero call is
very fast (17 GB/s on my machine), but almost 4 µs more for creating a 64 KB
container can be critical in some scenarios.

Mmh, one possibility would be to add a new parameter when opening the file to
zero all the containers. But I'd like to raise this on the HDF5 mailing list
first, just to see whether it can be considered a bug (although I'm afraid
that this is not the case).

-- 
Francesc Alted
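The opt-in zeroing idea mentioned above could look roughly like the sketch below. This is only an illustration: `make_read_buffer` and its `zero_fill` flag are hypothetical names, not an existing PyTables parameter, and the timing loop simply repeats the empty() versus zeros() comparison from the session above with the standard timeit module. Absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np


def make_read_buffer(nrows, dtype, zero_fill=False):
    """Hypothetical helper: allocate the container that a read would fill.

    zero_fill=False keeps the fast numpy.empty() path (contents undefined);
    zero_fill=True pays for numpy.zeros() so unwritten padding reads as 0.
    """
    if zero_fill:
        return np.zeros(nrows, dtype=dtype)
    return np.empty(nrows, dtype=dtype)


dt = np.dtype("f4,f8,i4")            # 16 bytes per row, as in the session above
fast = make_read_buffer(4000, dt)    # about 64 KB, uninitialized
safe = make_read_buffer(4000, dt, zero_fill=True)

# Rough per-call cost of each path for the ~64 KB container.
for flag in (False, True):
    total = timeit.timeit(
        lambda: make_read_buffer(4000, dt, zero_fill=flag), number=10000
    )
    print(f"zero_fill={flag}: {total / 10000 * 1e6:.2f} µs per call")
```

Keeping the default at `zero_fill=False` would preserve the current speed for files whose types have full precision, and only users who hit the padding issue would pay the extra cost.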