Elena,

Thanks for your detailed explanation. After reading it, I don't think it is completely true that HDF5 implements the C/Fortran ordering meta-information (even in a more abstract way, as you said), because, as things stand, there is always an ambiguity about how to interpret the ordering of data on disk.
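To make that ambiguity concrete, here is a minimal NumPy sketch. It does not touch HDF5 at all: the "file" is just a byte string plus a dimension tuple, the variable names are mine, and the dimension reversal done by the Fortran wrapper is emulated by hand following your description.

```python
import numpy as np

N, M, K = 2, 3, 4

# Emulate a Fortran program holding A(N,M,K) in column-major order.
a_fortran = np.asfortranarray(
    np.arange(N * M * K, dtype=np.float64).reshape(N, M, K))

# Raw bytes exactly as they sit in memory: the data is NOT transposed.
raw_from_fortran = a_fortran.tobytes(order="F")
# The Fortran wrapper hands the reversed dimensions to the C library.
dims_from_fortran = (K, M, N)

# Emulate a C program whose array is genuinely meant to be B(K,M,N).
b_c = np.ascontiguousarray(a_fortran.T)
raw_from_c = b_c.tobytes(order="C")
dims_from_c = b_c.shape

# Same bytes, same dimensions: nothing records which language wrote them.
same_file = (raw_from_fortran == raw_from_c
             and dims_from_fortran == dims_from_c)

# What a C-based reader gets back from the Fortran-written "file".
read_back = np.frombuffer(raw_from_fortran, dtype=np.float64)
read_back = read_back.reshape(dims_from_fortran)

print(same_file)        # True
print(read_back.shape)  # (4, 3, 2)
```

The reader sees B(K,M,N) with B[k,m,n] equal to A[n,m,k], and has no way to tell whether that is what the writer intended.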
I think the key point in your exposition can be summarized in the following sentence:

"""
Therefore HDF5 Fortran library instructs C library to store K,M,N values in the dataspace object header instead of N,M,K, since N is the size of the fastest changing dimension.
"""

So, what HDF5 actually ensures is the consistency between the order of the dimensions in the dataspace and the *fastest changing dimension* ordering in memory of the user's datasets, but not the absolute *C/Fortran* ordering. This is what leads to the reported ambiguity in the dimension ordering of datasets when you try to read an HDF5 file that was written in Fortran from a C-based program (or vice versa).

At first sight, I'd have preferred that, given that HDF5 has a C ordering convention, HDF5 itself would transpose the *data* to be saved when someone uses the Fortran wrappers (instead of just "transposing" the *dimension ordering*), so that the interpretation of both data and dimension ordering would be completely unambiguous. However, I guess that you have chosen not to do that in order not to penalize the performance of Fortran users (transposing the data is quite a costly operation). In a way, you have sacrificed data portability between C and Fortran users for the sake of performance, and I agree that this is a sensible approach for a library that aims to be as efficient as HDF5.

Having said that, and although HDF5 already does a terrific job in terms of cross-platform data portability by supporting metadata for platform-independent data types (including endianness), failing to support specific metadata about C/Fortran ordering is, IMHO, a serious design fault in terms of portability. That could easily be solved by adding the C/Fortran metadata, so that users can easily identify the original *intended* data ordering and have a chance to interpret that ordering correctly.
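As an illustration of what such metadata would buy a reader, here is a hedged sketch in plain NumPy. The `order_flag` argument stands in for the missing C/Fortran metadata and is purely hypothetical (HDF5 stores nothing like it today; a user-defined attribute, like the one Milos added to his files, would play the same role), and `load_with_order` is a name I made up for the example.

```python
import numpy as np

def load_with_order(raw, dims_on_disk, dtype, order_flag, copy=False):
    """Return the array indexed the way its writer intended.

    order_flag is HYPOTHETICAL metadata ("C" or "F"); HDF5 does not
    store it, so a real reader would have to obtain it elsewhere.
    """
    # HDF5 convention: the last stored dimension changes fastest (C order).
    arr = np.frombuffer(raw, dtype=dtype).reshape(dims_on_disk)
    if order_flag == "F":
        # The writer was column-major: reversing the axes recovers A(N,M,K).
        arr = arr.T  # a view; only shape and strides change
        if copy:
            # Pay for a real transposition into native (C) memory order.
            arr = np.ascontiguousarray(arr)
    return arr

# Emulate a Fortran-written dataset A(2,3,4) stored with dims (4,3,2).
a = np.asfortranarray(np.arange(24.0).reshape(2, 3, 4))
raw = a.tobytes(order="F")

view = load_with_order(raw, (4, 3, 2), np.float64, "F")
copied = load_with_order(raw, (4, 3, 2), np.float64, "F", copy=True)
as_c = load_with_order(raw, (4, 3, 2), np.float64, "C")

print(view.shape, view.flags.f_contiguous)      # (2, 3, 4) True
print(copied.shape, copied.flags.c_contiguous)  # (2, 3, 4) True
print(as_c.shape)                               # (4, 3, 2)
```

Note that the "F" branch without `copy` moves no data at all; only NumPy's shape and strides change.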
With that metadata, users would be able to choose whether to transpose the *data* at loading time in order to deal with it efficiently in memory, or just add some metainfo to their data containers (for example, NumPy supports this) stating that the in-memory ordering is different from the native one for the reading platform. Moreover, providing this C/Fortran ordering metadata is completely backward compatible, so my vote is +1 for HDF5 supporting it in the future.

Thanks,
Francesc

On Sunday 01 June 2008, Elena Pourmal wrote:
> Hi Francesc and All,
>
> If you only knew how many times this question was asked and how many
> attempts were done to explain :-) Here is another one. It is
> little-bit lengthy, please forgive :-)
> But I hope it will shed a light on why HDF5 doesn't support
> meta-information for Fortran/C ordering in datasets (the short answer
> is - it actually does, but in a more abstract way).
>
> HDF5 is a "self-describing" format, which means that HDF5 metadata
> stored in a dataset object header allows the HDF5 C library and any
> other non-C applications built on top of it, to retrieve a raw data
> (i.e. elements of a multidimensional array) in the correct order.
>
> (Let's for a second forget about HDF5, C and Fortran, Python and
> Matlab :-) )
>
> If we have a matrix A(N,M,K), we usually count dimensions from left
> to right saying that the first dimension has size N, the second
> dimension has size M, the third dimension has size K, and so on.
>
> (Now let's talk about HDF5 but without referring to any language.)
>
> When we describe a matrix using HDF5 dataspace object, we use the
> same convention (i.e. specifying dimensions from left to right): the
> first dimension has size N, the second dimension has size M, the
> third dimension has size K. (Aside: Please notice that this
> description is valid for both C and Fortran HDF5 applications, i.e.
> C and Fortran dims array needed by H5Screate_simple
> (h5screate_simple_f) will have the values dims [] = {N,M,K}).
>
> The question is: how does HDF5 know how to interpret a blob of {N x
> M x K x sizeof(datatype)} bytes of dataset raw data stored in the
> file? Was A(N,M,K) stored? Or was it A(K,N,M) stored? Or any other
> permutation of (K,N,M)?
>
> HDF5 file has no clue about matrices and their dimensions, and the
> languages they were written from. This is application's
> responsibility to interpret data correctly and pass the correct
> interpretation to the HDF5 C library to store in a file.
>
> As it was mentioned above, dimensions of the matrix are described
> using HDF5 dataspace object and are stored in the file. d integers
> P1, ..., Pd, where d is a rank of a matrix, are stored in a dataspace
> object header according to the following convention: the last value
> - Pd is the size of the FASTEST changing dimension of the matrix,
> i.e. HDF5 file spec and HDF5 C library follow C storage convention
> (no wonder, it is a C library :-). Therefore there is no ambiguity in
> interpreting {N x M x K x sizeof(datatype)} bytes, and HDF5 file has
> enough information to interpret data correctly by any "row-major" or
> "column-major" application (including bypassing HDF5 C library and
> reading directly from the HDF5 file!)
>
> Here is what is happening when HDF5 Fortran library is used:
>
> Suppose we want to write A(N,M,K) matrix to the HDF5 file. HDF5
> Fortran API describes dataspace with the first dimension being N, the
> second dimension being M, the third dimension being K (as we would do
> it in C and any other language). But HDF5 Fortran API also knows
> that the fastest changing dimension has size N (i.e. we have
> column-major order). Therefore HDF5 Fortran library instructs C
> library to store K,M,N values in the dataspace object header instead
> of N,M,K, since N is the size of the fastest changing dimension.
>
> So, if we read matrix A(N,M,K) (i.e. N x M x K x sizeof(datatype)
> blob) written from Fortran by a C application, we will read it to
> the matrix B(K,M,N) (C API that requests sizes of the first, second
> and third dimensions will return values K,M,N stored in the dataspace
> header)
>
> If we read matrix A(N,M,K) written from Fortran by Fortran
> application, we will read it once again into B(N,M,K) (Fortran API
> that requests sizes of the first, second and third dimension will
> flip an array K,M,N stored in the file and return N,M,K)
>
> In other words: HDF5 library stores information about how to
> interpret data. Interpretation follows C storage convention: the last
> dimension specified for the dataspace object is the fastest changing
> one. It is the responsibility of the application (in this case
> FORTRAN HDF5 library) to interpret correctly the order of dimensions
> and pass to/from the HDF5 C library.
>
> Please notice that there is no need to transpose data itself: one
> only has to pass a correct interpretation of the data to the HDF5 C
> Library and to make sure it is done according to the HDF5 C library
> convention - the first value stored in the dataspace header
> corresponds to the slowest changing dimension, ...., the last value
> stored in the dataspace header corresponds to the fastest changing
> dimension.
>
> Please let me know if my explanation made things worse. Frankly
> speaking I think it did ;-) but I tried.....
>
> Elena
>
> On May 31, 2008, at 4:54 AM, Francesc Alted wrote:
> > Hi,
> >
> > An HDF5/PyTables user asked whether HDF5 supports meta-information
> > for keeping Fortran/C ordering in datasets. By reading the docs,
> > it seems to me that HDF5 doesn't support this yet. Are there plans
> > to support this feature?
> >
> > Thanks,
> >
> > ---------- Forwarded message ----------
> >
> > Subject: Re: [Pytables-users] Reading Fortran arrays with correct
> > array indexing
> > Date: Saturday 31 May 2008
> > From: "Milos Ilak" <[EMAIL PROTECTED]>
> > To: "Francesc Alted" <[EMAIL PROTECTED]>
> >
> > Hi Francesc,
> >
> > thanks a lot! I didn't know MATLAB used Fortran order too. My
> > Python code needs to read in files written in both orders, so I
> > just added an attribute in my Fortran output routine which the
> > Python code looks for and if it is there, it transposes the data
> > after loading.
> >
> > I would have thought that the meta-information about the order
> > would be stored somewhere in the file. Do you know if the future
> > versions of HDF5 will support this? Thanks again,
> >
> > Milos
> >
> >
> > On Fri, May 30, 2008 at 8:22 AM, Francesc Alted
> > <[EMAIL PROTECTED]> wrote:
> >> On Thursday 29 May 2008, Milos Ilak wrote:
> >>> Hi all,
> >>>
> >>> I apologize if this has been discussed, but I could not find any
> >>> information in the archives. I am creating HDF5 files with 3-D
> >>> arrays in Fortran 90, and I need to read them in both Python and
> >>> MATLAB. While MATLAB recognizes the correct dimensions of the
> >>> arrays, PyTables gets them backwards (i.e. (x,y,z) in Fortran
> >>> becomes (z,y,x) when PyTables reads it). I know that this is due
> >>> to the fact that the order in which Fortran stores arrays is
> >>> different than that of Python, C or MATLAB, and I couldn't
> >>> determine how exactly MATLAB 'knows' that Fortran arrays are
> >>> being read.
> >>
> >> Well, it is easy: because MATLAB writes and reads arrays in
> >> *Fortran* order. So, if you write your arrays with Fortran, then
> >> you are not going to have any problem to read them in the correct
> >> order from MATLAB.
> >> However, as PyTables uses a C API to access HDF5 files, and as
> >> C follows a different order for matrices in memory, you will get
> >> inverted dimensions for your Fortran created files (as it is the
> >> case).
> >>
> >>> I have tried using the 'isfortran' command in numpy, but I get
> >>> the following error:
> >>>
> >>> >>> hh5f.root
> >>> / (RootGroup) ''
> >>>   children := ['eta' (Array), 'u' (Array), 'w' (Array), 'v'
> >>> (Array), 'y' (Array), 'x' (Array), 'z' (Array)]
> >>>
> >>> >>> hh5f.root.v
> >>> /v (Array(16L, 33L, 32L)) ''
> >>>   atom := Float64Atom(shape=(), dflt=0.0)
> >>>   maindim := 0
> >>>   flavor := 'numpy'
> >>>   byteorder := 'little'
> >>>   chunkshape := None
> >>>
> >>> >>> numpy.isfortran(hh5f.root.v)
> >>> Traceback (most recent call last):
> >>>   File "<stdin>", line 1, in <module>
> >>>   File "/sw/lib/python2.5/site-packages/numpy/core/numeric.py",
> >>> line 184, in isfortran
> >>>     return a.flags.fnc
> >>> AttributeError: 'Array' object has no attribute 'flags'
> >>>
> >>> It seems like there is perhaps some kind of flag I should add
> >>> when writing in Fortran to indicate that the array is in Fortran
> >>> order, but MATLAB somehow seems to know that anyway. Any advice
> >>> would be greatly appreciated.
> >>
> >> You are applying the numpy isfortran() function to a pytables
> >> Array and not a numpy object. The correct call would be:
> >>
> >> >>> numpy.isfortran(hh5f.root.v[:])
> >>
> >> because the result of reading a pytables Array is a numpy object.
> >>
> >> However, this won't tell you anything about the actual order
> >> (Fortran or C) in which the array was written because this
> >> meta-information is not saved anywhere in the file (apparently
> >> HDF5 does not support this yet).
> >>
> >> So, unless you want to provide this info yourself by using, say,
> >> an HDF5 attribute, your best bet is to *deduce* the ordering by
> >> knowing that the file comes from a Fortran or a C program and
> >> *transpose* manually your arrays after reading them (if you need
> >> to).
> >>
> >> Hope this helps,
> >>
> >> --
> >> Francesc Alted
> >> Freelance developer
> >> Tel +34-964-282-249
> >>
> >> _______________________________________________
> >> Pytables-users mailing list
> >> Pytables-users@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
> >
> > -------------------------------------------------------
> >
> > --
> > Francesc Alted
> > Freelance developer
> > Tel +34-964-282-249
> >
> > ----------------------------------------------------------------------
> > This mailing list is for HDF software users discussion.
> > To subscribe to this list, send a message to
> > [EMAIL PROTECTED] .
> > To unsubscribe, send a message to
> > [EMAIL PROTECTED]

--
Francesc Alted
Freelance developer
Tel +34-964-282-249