>>From: Francesc Alted <fal...@pytables.org>
>>To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
>>Sent: Mon, March 28, 2011 2:01:25 PM
>>Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this
>>be a bug?
>>
>>2011/3/28 Adriano Vilela Barbosa <adriano.vil...@yahoo.com>
>>
>>> Okay. The problem was twofold. First of all, a bug in the way
>>> PyTables deals with the defaults caused the MemoryError (this has been
>>> fixed in trunk). Secondly, and due to HDF5 limitations, you cannot use
>>> atoms that are larger than 64 KB. The canonical way to handle this is
>>> to add more dimensions to the datasets in HDF5 and then use the slice
>>> selection capabilities to retrieve the images. Look at this:
>>
>>Actually, what you did below was the first thing I tried when moving away from
>>strings. However, it resulted in my code running dozens of times slower and my
>>HDF files being considerably bigger. That's why I tried using bigger atoms (one
>>atom per optical flow frame), to see if this would run faster and/or produce
>>smaller files, and then I ran into the error I reported.
>>
>>However, I later noticed that the shape of your array is
>>
>>array_shape = (n_frames, n_rows, n_cols)
>>
>>whereas I had tried
>>
>>array_shape = (n_rows, n_cols, n_frames)
>>
>>This makes a huge difference. Using a shape (n_frames, n_rows, n_cols) for the
>>CArray results in the code running only about 15% slower and producing a file
>>only about 10% bigger when compared to using strings. This is much better than
>>the results I was getting when using a shape (n_rows, n_cols, n_frames). I
>>guess this has to do with the way the data is laid out on disk?
>>
>
>Yes. Data on-disk is written in C-order, so you must make sure that the leading
>dimension is the one that varies the slowest (i.e. as I have set them).
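
To make the C-order point concrete, here is a minimal sketch in plain Python. The dimensions and item size are made up for illustration, the helper names are hypothetical, and the sketch ignores HDF5 chunking, which also affects the real on-disk layout of a CArray. It only shows how far apart the bytes of a single frame end up under each shape.

```python
def c_order_strides(shape, itemsize):
    """Return the byte strides of a C-ordered (row-major) array."""
    strides = []
    step = itemsize
    # The last axis is contiguous; each earlier axis steps over all later ones.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def frame_byte_span(shape, frame_axis, itemsize):
    """Bytes from the first to the last element of one frame, inclusive."""
    strides = c_order_strides(shape, itemsize)
    span = itemsize
    for axis, (dim, stride) in enumerate(zip(shape, strides)):
        if axis != frame_axis:
            span += (dim - 1) * stride
    return span


# Hypothetical sizes, chosen only to keep the numbers small.
n_frames, n_rows, n_cols, itemsize = 10, 4, 6, 4
frame_bytes = n_rows * n_cols * itemsize

# Frames on the leading axis: one frame is a single contiguous run of bytes.
frames_first = frame_byte_span((n_frames, n_rows, n_cols), 0, itemsize)

# Frames on the trailing axis: one frame is scattered over almost the whole array.
frames_last = frame_byte_span((n_rows, n_cols, n_frames), 2, itemsize)

print(frame_bytes, frames_first, frames_last)
```

With frames on the leading axis the span equals the size of one frame, so reading a frame touches one contiguous region. With frames on the trailing axis the span covers nearly the entire dataset, which fits the slowdown I saw.
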

Ok. This is good to know.

>>As for the atom size limit (64 kB), I guess that doesn't apply to string atoms?
>>When using strings, I construct the atom in the following way
>>
>>array_atom = tables.StringAtom(len(matrix.tostring()))
>>
>>where len(matrix.tostring()) = 691200 bytes = 675 kB.
>>
>>I mean, the size of the string atom is well above the 64 kB limit and yet it
>>doesn't produce any errors.
>>
>
>To be exact, the problem is not the atom size, but rather the maximum attribute
>size. In this case, one attribute is used to keep the defaults for the atom,
>and it cannot be larger than 64 KB. Perhaps I should avoid writing the
>attribute when the defaults have, well, the default value (i.e. zero). I'm not
>certain why this problem does not affect the string types though.

Ok. Look, you've been really helpful with all this. I really appreciate your
help and all your work on PyTables. Thank you so much.

Adriano

>
>--
>Francesc Alted