On Wednesday 23 March 2011 19:53:43 Adriano Vilela Barbosa wrote:
> >From: FrancescAlted <fal...@pytables.org>
> >To: Discussion list for PyTables
> ><pytables-users@lists.sourceforge.net>
> >Sent: Wed, March 23, 2011 10:57:06 AM
> >Subject: Re: [Pytables-users] Problem writing strings to a CArray.
> >Could this be a bug?
> >
> >2011/3/23 Adriano Vilela Barbosa <adriano.vil...@yahoo.com>
> >
> >This is not a bug, but rather a feature of NumPy. Look at this:
> >
> >>>>> import numpy as np
> >>>>> a = np.array(['aa\x00\x00'])
> >>>>> a[0]
> >>'aa'  # hey! where have my trailing 0's gone?
> >>>>> a.data[:]
> >>'aa\x00\x00'  # yeah, they are still in the data area of the array
> >>
> >>I'd recommend using the byte ('i1') type to achieve what you want:
> >>
> >>>>> a.view('i1')
> >>array([97, 97, 0, 0], dtype=int8)
> >>
> >>Thank you very much for your explanation, but I still don't get it.
> >>
> >>Let's forget numpy for a moment and just say I want to store the
> >>string 'aa\x00\x00' in a CArray. Each element of the CArray is a
> >>4-element string.
> >>
> >>First, I create the CArray:
> >>
> >>>>> import tables
> >>>>> fid = tables.openFile("carray_test.hdf","w")
> >>>>> fid.createGroup("/", 'table', 'Binary table')
> >>>>> array_atom = tables.StringAtom(itemsize=4)
> >>>>> array_shape = (1,)
> >>>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
> >>
> >>Now, I store the string 'aa\x00\x00' in the first row (which is the
> >>only row in this example) of the CArray:
> >>
> >>>>> fid.root.table.bin_table[0] = 'aa\x00\x00'
> >>
> >>Now, I do
> >>
> >>>>> fid.root.table.bin_table[0].data[:]
> >>'aa'
> >>
> >>So, it looks to me that the trailing \x00 elements of the string
> >>are not being stored in the CArray. From my side, there's no numpy
> >>involved; I'm just trying to store a string. What am I missing?
> >
> > You cannot avoid NumPy because PyTables uses NumPy behind the scenes
> > as an intermediate buffer area. What you are seeing is probably a
> > secondary effect caused by the 'feature' I mentioned before. Any
> > reason why you don't want to use a byte type instead of a string?
> >
> >FrancescAlted
>
> Hi again,
>
> I'm happy to use bytes instead of strings. The reason I was using
> strings is that, as someone new to Python and numpy, I thought
> strings were the only way of dealing with individual bytes. Also,
> because of this problem I'm having with strings, I tried storing the
> numpy arrays directly in the HDF file, but the performance was
> considerably worse and the file considerably larger.
>
> So, going back to my previous example, I guess the only things I need
> to change are the Atom object used to construct the CArray and the
> numpy array method I call: view() instead of tostring().
>
> >>> import numpy
> >>> import tables
> >>> fid = tables.openFile("carray_test.hdf","w")
> >>> fid.createGroup("/", 'table', 'Binary table')
> >>> array_atom = tables.Atom.from_dtype(numpy.dtype((numpy.byte, (4,))))
> >>> array_shape = (1,)
> >>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
> >>> a = numpy.array(['aa\x00\x00'])
> >>> fid.root.table.bin_table[0] = a.view('b')
> >>> fid.root.table.bin_table[0].data[:]
> 'aa\x00\x00'

Ah! I see where the problem was now (the assignment). Thanks for
pointing it out.

> Is this right, or there's a more efficient way of doing it?

Well, I don't fully understand why you are converting to strings
before saving the data into the CArray, because the CArray should
support compression and be pretty fast too. Are you really getting a
significant speed-up by converting to strings?

If you want to continue down the conversion path, I'd also try a
VLArray whose elements have been previously compressed using the
blosc package (https://github.com/FrancescAlted/python-blosc).

Good luck!

-- 
Francesc Alted
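
For reference, the NumPy behaviour discussed in this thread can be reproduced without PyTables. The following is only a minimal sketch, assuming Python 3 and a reasonably recent NumPy; it therefore uses a bytes literal, an explicit 'S4' dtype and tobytes() rather than the Python 2 str and .data[:] shown in the quoted sessions. The variable names are purely illustrative.

```python
import numpy as np

# Fixed-width byte-string array: one item, 4 bytes wide.
# (The explicit dtype is an assumption made here for clarity.)
a = np.array([b'aa\x00\x00'], dtype='S4')

# Indexing returns a scalar with the trailing NUL bytes stripped.
item = a[0]

# The underlying buffer still holds all four bytes.
raw = a.tobytes()

# Viewing the same buffer as int8 exposes every byte, zeros included.
as_bytes = a.view('i1')

print(repr(item))
print(repr(raw))
print(as_bytes)
```

An int8 view like this is what the second quoted session assigns to the CArray row, which is why the trailing zeros are preserved in that version.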