On Tue, 3 Jun 2003, Ireneusz SZCZESNIAK wrote:
> Hi,
>
> > I'm glad to see other people developing HDF5 readers for OpenDX.
>
> I am also glad. It's good to exchange some ideas.
>
> > Of course, this module should be general enough to fit peoples'
> > needs, and that highly depends on the HDF5 file layout used by the
> > various applications which generate HDF5 datafiles.
>
> I believe this is the main problem of our modules. Each of the modules
> requires some file layout specific to the software that produced that
> file.
It's for our convenience, of course ;)
> > > For plain unigrid data, I think, there exists a de-facto standard
> > > that each dataset should have at least an "origin" and a "delta"
> > > attribute which will be used to create the positions and
> > > connections of the imported DX field.
> >
> > Which is not necessary, as you can extract the required attributes,
> > create positions and replace them as needed in your dx program.
>
> I agree with Thomas, that unigrid data can be stored in an HDF5 file
> in a straightforward way (as he wrote above). As Richard points out,
> the values of "origin" and "delta" can be extracted and then
> incorporated into the field (though I don't know how). However,
> Richard's way of doing this must be rather complicated.
Not very. It's a matter of using Construct, Replace, Attribute and some
Compute. It's easy to create a macro for a given application. But read
on...
> Why not say that the format of the unigrid data stored in an HDF5
> file should have "origin" and "delta"? In case they are missing, the
> default value of "origin" is {0, ... , 0}, and the default value of
> "delta" is {{1, 0, ..., 0}, {0, 1, 0, ... , 0}, ..., {0, ... , 0, 1}}.
>
> Perhaps in the case where one has thousands of tiny datasets in an HDF5
> file it would not be economical to supply "delta" and "origin" to every
> dataset. But in most cases the overhead of "delta" and "origin" is
> negligible.
We could add extra inputs for the origin/delta attribute names and whether
they are at file or dataset scope. If they are not supplied, we just don't
create positions (and connections?).
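A minimal sketch of the default rule proposed above, in plain Python: positions for a regular grid from optional "origin" and "delta" attributes, falling back to a zero origin and an identity delta. All names here are illustrative, not the module's actual API.

```python
# Sketch of the proposed default: if a dataset carries no "origin"/"delta"
# attributes, assume origin {0, ..., 0} and an identity delta matrix.
import itertools

def grid_positions(shape, origin=None, delta=None):
    """Positions of a regular grid with extents `shape`.

    origin -- n-vector, default {0, ..., 0}
    delta  -- n x n spacing matrix, default identity
    """
    n = len(shape)
    if origin is None:
        origin = [0.0] * n
    if delta is None:
        delta = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    positions = []
    # Walk the grid indices in row-major order; each position is
    # origin + sum_k index[k] * delta[k].
    for idx in itertools.product(*(range(s) for s in shape)):
        positions.append(tuple(
            origin[d] + sum(idx[k] * delta[k][d] for k in range(n))
            for d in range(n)))
    return positions
```

In a DX network the same effect is what Construct/Replace achieve; the point of the default is that the file can omit both attributes entirely.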
> > Like Richard, I find it a bit inconvenient for files with multiple
> > datasets that a user has to specify each dataset to import
> > explicitly by name. One could output the list of dataset names and
> > then let the user pick one (or more) out of this list.
Yes, that's quite handy. Of course this list could be created by another
module - I don't have a strong preference here.
> > Our
> > ImportHDF5 module takes an index of the dataset to read instead,
> > ranging from 0 to the total number of datasets found in the file
> > (this information is output on a separate tab). Our datafiles
> > usually contain a time series of the same variable, with each
> > dataset having a "time" attribute attached to it. If such an
> > attribute was found during the browsing then the import module would also
> > sort the datasets by their time values. Feeding the "max" input tab
> > of a Sequencer module with the "total number of datasets"
> > information, and connecting the sequencer output tab back to
> > ImportHDF5, it is thus trivial to play an animation of all available
> > timesteps. To make the dataset selection via indices more general,
> > an import module could receive a (list of) attribute name(s) to sort
> > the list of datasets?
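The index-plus-"time"-attribute selection described above could look roughly like this; the HDF5 file is faked as a plain `{name: attribute-dict}` here, and the function names are invented for illustration. A real module would browse the file with the HDF5 library instead.

```python
# Rough sketch of index-based dataset selection with optional sorting
# by a "time" attribute, as the ImportHDF5 module described above does.

def browse(datasets, sort_attr="time"):
    """List dataset names, sorted by `sort_attr` when every dataset has it."""
    names = sorted(datasets)              # deterministic fallback: by name
    if names and all(sort_attr in datasets[n] for n in names):
        names.sort(key=lambda n: datasets[n][sort_attr])
    return names

def select(datasets, index, sort_attr="time"):
    """Pick one dataset by its index, 0 .. total-1."""
    return browse(datasets, sort_attr)[index]
```

Feeding the length of that list to a Sequencer's "max" input and the frame counter back into the index input gives the animation loop described above.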
I don't have the time series in one file, but in multiple files, creating
the filenames with the Format module for sequencing.
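The Format-module trick amounts to expanding a printf-style template once per frame; a rough Python equivalent (the template and helper name are invented):

```python
# One filename per timestep from a printf-style template, as the DX
# Format module does with the Sequencer's frame counter.

def frame_filenames(template, count):
    """Expand `template` for frames 0 .. count-1."""
    return [template % frame for frame in range(count)]
```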
> I like the functionality of your module, and I guess these features
> must be very useful for you. However, our datasets are very large and
> numerous. Therefore we cannot load all of them at once into the
> memory to produce an animation. Instead, we have to load one dataset
> at a time, produce an image, and then repeat the process. In this way
> we can create longer and more accurate animations.
Yes, I agree here. Of course, the whole file is not read, only the
structure and the selected datasets.
> > I'd start without these complexities in the hdf5 module, as you can
> > do nearly all of this stuff within your dx program.
>
> I agree with Richard. A module is going to be easier to use.
>
> > I want to add this feature to our ImportHDF5 module - so far one can
> > specify a single (n-1) slice to read from an n-dimensional dataset.
> > ImportHDF5 has a flag to tell it whether floating point datasets
> > should be imported in single or double precision. Maybe that's a
> > feature you want to add to your module ?
>
> Thanks, that sounds good. At the moment our module creates a field
> either of "float" or "double" depending on the data type of the
> dataset.
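The precision flag mentioned above could behave roughly as follows. This is a sketch only: Python's stdlib `array` module stands in for the actual HDF5 read, and the parameter name is an assumption, not ImportHDF5's real interface.

```python
# Sketch of a single/double precision import flag: whatever width the
# values have on disk, hand them to DX in the requested width.
from array import array

def import_values(values, single_precision=False):
    """Return the values as 32-bit ('f') or 64-bit ('d') floats."""
    return array("f" if single_precision else "d", values)
```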
>
> > Also, our ImportHDF5 module can import data from remote HDF5 files
> > using the Stream Virtual File Driver of the HDF5 library. And we are
> > working on another remote file access driver for HDF5 based on
> > GridFtp. In the future we want to do remote visualization of
> > large-scale hierarchical datasets using OpenDX.
>
> That's an excellent feature! That's truly something that is specific
> to HDF5. However, we do not need this at the moment and therefore are
> not interested in supporting it.
>
> This summer we will be working intensively for roughly a month on our
> package. We will let you know of our advances. Please keep us posted
> too. Thank you for sharing your ideas!
Maybe someone could sponsor a meeting and we could create the ultimate
HDF5 module in about a week? At least for the European contributors?
Richard.
--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/