> From: Francesc Alted <fal...@pytables.org>
> To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
> Sent: Wed, March 23, 2011 10:57:06 AM
> Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this be a bug?
>
> 2011/3/23 Adriano Vilela Barbosa <adriano.vil...@yahoo.com>
>
>> This is not a bug, but rather a feature of NumPy. Look at this:
>>
>> >>> import numpy as np
>> >>> a = np.array(['aa\x00\x00'])
>> >>> a[0]
>> 'aa'  # hey! where have my trailing 0's gone?
>> >>> a.data[:]
>> 'aa\x00\x00'  # yeah, they are still in the data area of the array
>>
>> I'd recommend using the byte ('i1') type to achieve what you want:
>>
>> >>> a.view('i1')
>> array([97, 97, 0, 0], dtype=int8)
>>
>> Thank you very much for your explanation, but I still don't get it.
>>
>> Let's forget numpy for a moment and just say I want to store the string
>> 'aa\x00\x00' in a CArray. Each element of the CArray is a 4-element string.
>> First, I create the CArray:
>>
>> >>> import tables
>> >>> fid = tables.openFile("carray_test.hdf","w")
>> >>> fid.createGroup("/", 'table', 'Binary table')
>> >>> array_atom = tables.StringAtom(itemsize=4)
>> >>> array_shape = (1,)
>> >>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
>>
>> Now, I store the string 'aa\x00\x00' in the first row (which is the only
>> row in this example) of the CArray:
>>
>> >>> fid.root.table.bin_table[0] = 'aa\x00\x00'
>>
>> Now, I do
>>
>> >>> fid.root.table.bin_table[0].data[:]
>> 'aa'
>>
>> So, it looks to me as if the trailing \x00 elements of the string are not
>> being stored in the CArray. From my side, there's no numpy involved; I'm
>> just trying to store a string. What am I missing?
>>
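A minimal sketch of the NumPy behaviour described in the quoted message, written for Python 3 and a current NumPy (an assumption; the thread predates both). It uses an explicit `S4` dtype and `tobytes()` in place of the `a.data[:]` access shown in the transcript, and the helper name `show_trailing_nulls` is made up for illustration.

```python
import numpy as np


def show_trailing_nulls(raw=b"aa\x00\x00"):
    """Return (item, buffer, as_int8) for a one-element fixed-width bytes array."""
    # Explicit 4-byte string dtype, so the element width does not depend on
    # how NumPy infers the length from the input.
    a = np.array([raw], dtype="S4")
    item = bytes(a[0])      # indexing an 'S' array drops trailing NUL bytes
    buffer = a.tobytes()    # the underlying buffer still holds all 4 bytes
    as_int8 = a.view("i1")  # reinterpret the same memory as signed bytes
    return item, buffer, as_int8


item, buffer, as_int8 = show_trailing_nulls()
print(item)
print(buffer)
print(as_int8)
```

The point of the sketch is that the NUL bytes are never lost from memory; only the string-typed view of an element hides them, which is why a byte-typed view keeps them visible.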
> You cannot avoid NumPy, because PyTables uses NumPy behind the scenes as an
> intermediate buffer area. What you are seeing is probably a side effect of
> the 'feature' I mentioned before. Any reason why you don't want to use a
> byte type instead of a string?
>
> --
> Francesc Alted

Hi again,

I'm happy to use bytes instead of strings. The reason I was using strings is
that, as someone new to Python and numpy, I thought strings were the only way
of dealing with individual bytes. Also, because of this problem I'm having
with strings, I tried storing the numpy arrays directly in the HDF file, but
performance was considerably worse and the file considerably larger.

So, going back to my previous example, I guess the only things I need to
change are the Atom object used to construct the CArray and the numpy array
method I call: view() instead of tostring().

>>> import numpy
>>> import tables
>>> fid = tables.openFile("carray_test.hdf","w")
>>> fid.createGroup("/", 'table', 'Binary table')
>>> array_atom = tables.Atom.from_dtype(numpy.dtype((numpy.byte, (4,))))
>>> array_shape = (1,)
>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
>>> a = numpy.array(['aa\x00\x00'])
>>> fid.root.table.bin_table[0] = a.view('b')
>>> fid.root.table.bin_table[0].data[:]
'aa\x00\x00'

Is this right, or is there a more efficient way of doing it?

Thank you very much. Your help is greatly appreciated.

Adriano
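One possible way to move between raw bytes and a byte-typed row without building a string array first, sketched with NumPy only. A plain in-memory `int8` array of shape `(1, 4)` stands in for the CArray here (an assumption, made so the example runs without an HDF5 file), and the helper names `bytes_to_row` and `row_to_bytes` are hypothetical. Whether this is any faster than the `view()` approach has not been measured.

```python
import numpy as np

ROW_SIZE = 4  # bytes per row, matching the 4-byte elements in the thread


def bytes_to_row(raw):
    """Interpret a bytes object of length ROW_SIZE as a row of int8 values."""
    if len(raw) != ROW_SIZE:
        raise ValueError("expected %d bytes, got %d" % (ROW_SIZE, len(raw)))
    # frombuffer reads the bytes as they are, so trailing NULs are preserved.
    return np.frombuffer(raw, dtype=np.int8)


def row_to_bytes(row):
    """Turn a row of int8 values back into a bytes object."""
    return np.asarray(row, dtype=np.int8).tobytes()


# Stand-in for the on-disk array: one row of ROW_SIZE signed bytes.
table = np.zeros((1, ROW_SIZE), dtype=np.int8)
table[0] = bytes_to_row(b"aa\x00\x00")
restored = row_to_bytes(table[0])
print(restored)
```

Because the data never passes through a string dtype, there is no step at which trailing NULs could be stripped.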