Re: [Pytables-users] Problem writing strings to a CArray. Could this be a bug?

Adriano Vilela Barbosa Tue, 29 Mar 2011 12:18:24 -0700

>>From: FrancescAlted <fal...@pytables.org>
>>To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
>>Sent: Mon, March 28, 2011 2:01:25 PM
>>Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this be a bug?
>>
>>
>>2011/3/28 Adriano Vilela Barbosa <adriano.vil...@yahoo.com>
>>
>>> Okay.  The problem was twofold.  First of all, a bug in the way
>>> PyTables deals with the defaults caused the MemoryError (this has been
>>> fixed in trunk).  Secondly, and due to HDF5 limitations, you cannot use
>>> atoms that are larger than 64 KB.  The canonical way to handle this is
>>> to add more dimensions to the datasets in HDF5 and then use the slice
>>> selection capabilities to retrieve the images.  Look at this:
>>
>>Actually, what you did below was the first thing I tried when moving away from
>>strings. However, it resulted in my code running dozens of times slower and my
>>HDF files being considerably bigger. That's why I tried using bigger atoms (one atom
>>per optical flow frame), to see if this would run faster and/or produce 
>>smaller files, and then I ran into the error I reported.
>>
>>However, I later noticed that the shape of your array is
>>
>>array_shape = (n_frames, n_rows, n_cols)
>>
>>whereas I had tried
>>
>>array_shape = (n_rows, n_cols, n_frames)
>>
>>This makes a huge difference. Using a shape (n_frames, n_rows, n_cols) for the
>>CArray results in the code running only about 15% slower and producing a file
>>only about 10% bigger when compared to using strings. This is much better than
>>the results I was getting when using a shape (n_rows, n_cols, n_frames). I guess
>>this has to do with the way the data is laid out on disk?
>>
>
>Yes.  Data on-disk is written in C-order, so you must be sure that the leading
>dimension varies the slowest (i.e. as I have set it).


Ok. This is good to know.
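
For anyone reading this thread later, here is a rough sketch of why the order of the dimensions matters so much. It uses plain Python (no PyTables) to compute element strides for a C-ordered array. The helper `c_order_strides` and the small frame dimensions are made up for illustration, and the sketch ignores HDF5 chunking, which also affects the real on-disk layout.

```python
def c_order_strides(shape):
    """Return the stride (in elements) of each axis for a C-ordered array."""
    strides = [1] * len(shape)
    # Walk from the second-to-last axis back to the first one:
    # each stride is the product of all the dimensions to its right.
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return tuple(strides)


# Hypothetical sizes, chosen only to keep the numbers readable.
n_frames, n_rows, n_cols = 100, 4, 5

# Frames in the leading dimension: neighbouring pixels of one frame
# are 1 element apart, so a whole frame is one contiguous run.
frames_first = c_order_strides((n_frames, n_rows, n_cols))

# Frames in the trailing dimension: neighbouring pixels of one frame
# are n_frames elements apart, so a frame is scattered over the array.
frames_last = c_order_strides((n_rows, n_cols, n_frames))

print(frames_first)  # (20, 5, 1)
print(frames_last)   # (500, 100, 1)
```

With frames first, reading or writing one frame touches n_rows * n_cols adjacent elements. With frames last, the same frame needs n_rows * n_cols separate accesses, each n_frames elements apart, which is consistent with the slowdown described above.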

>>As for the atom size limit (64 kB), I guess that doesn't apply to string atoms?
>>When using strings, I construct the atom in the following way
>>
>>array_atom = tables.StringAtom(len(matrix.tostring()))
>>
>>where len(matrix.tostring()) = 691200 bytes = 675 kB.
>>
>>I mean, the size of the string atom is quite above the 64 kB limit and yet it
>>doesn't produce any errors.
>>
>
>To be exact, the problem is not the atom size, but rather the maximum
>attribute size.  In this case, one attribute is used to keep the defaults
>for the atom, and it cannot be larger than 64 KB.  Perhaps I should avoid
>writing the attribute when the defaults have, well, the default value
>(i.e. zero).  I'm not certain why this problem does not affect the string
>types though.
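
As a quick sanity check on the numbers quoted above, the sketch below reproduces the 691200-byte frame size and compares it with the 64 KB limit. It uses only the standard library. The element count (86400 double-precision values) is an assumption picked solely to match the byte count in the thread, and `ATTR_LIMIT_BYTES` is just a local name for the limit mentioned above, not a PyTables or HDF5 constant.

```python
from array import array

# The 64 KB limit mentioned in the thread, expressed in bytes.
ATTR_LIMIT_BYTES = 64 * 1024

# Hypothetical frame: 86400 doubles (8 bytes each), chosen only so that
# the serialized size matches the 691200 bytes reported above.
frame = array("d", [0.0]) * 86400
payload = frame.tobytes()

size_bytes = len(payload)
size_kib = size_bytes / 1024
exceeds_limit = size_bytes > ATTR_LIMIT_BYTES

print(size_bytes, size_kib, exceeds_limit)  # 691200 675.0 True
```

So a single frame is a bit more than ten times the limit, which is why a per-frame default stored in an attribute would not fit.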

Ok. Look, you've been really helpful with all this. I really appreciate your 
help and all your work in Pytables. Thank you so much.

Adriano

>
>-- 
>>FrancescAlted
>>

