Matt and I talked about this a couple of months ago, but I'd like to mention it here too. It seems to me that data formats like HDF5 are really a pain to use for generic purposes, because you end up trying to map a directed graph of object relations (composition) into a hierarchical data format, and then implement relational queries on top of this hierarchy. (I've done this, to some extent, and I ended up writing cumbersome code to walk the hierarchy just to answer questions that would be one-line SQL queries.)
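To make the contrast with hierarchy-walking concrete, here is a minimal sketch of what "metadata in a relational database" could look like, using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the tables `field`, `subdomain` and `dataset`, the column names, and the toy data are hypothetical, and the raw arrays are assumed to live in a separate binary file (here just a path plus byte offset, e.g. written with MPI-IO). The point is only that "all vector fields at step M" becomes a single SELECT, and "fields B and C from step M to N on subdomains intersecting a bounding box" becomes one join with interval-overlap tests.

```python
import sqlite3

# Hypothetical metadata schema. Only metadata lives in SQLite; each `dataset`
# row points at a block of raw array data (file path + byte offset) that is
# assumed to have been written separately, e.g. with MPI-IO.
SCHEMA = """
CREATE TABLE field (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    kind  TEXT NOT NULL,              -- 'scalar' or 'vector'
    units TEXT
);
CREATE TABLE subdomain (              -- axis-aligned bounding box per subdomain
    id   INTEGER PRIMARY KEY,
    xmin REAL, xmax REAL, ymin REAL, ymax REAL, zmin REAL, zmax REAL
);
CREATE TABLE dataset (
    field_id     INTEGER REFERENCES field(id),
    subdomain_id INTEGER REFERENCES subdomain(id),
    step         INTEGER NOT NULL,
    sim_time     REAL,
    path         TEXT NOT NULL,       -- file holding the raw array
    byte_offset  INTEGER NOT NULL     -- where this block starts in that file
);
"""


def build_example():
    """Toy database: 3 fields, 2 subdomains, 4 steps, made-up 1024-byte blocks."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO field VALUES (?, ?, ?, ?)", [
        (1, "velocity", "vector", "m/s"),
        (2, "B", "vector", "T"),
        (3, "C", "scalar", "mol/m^3"),
    ])
    db.executemany("INSERT INTO subdomain VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0),
        (2, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0),
    ])
    rows, offset = [], 0
    for step in range(4):
        for fid in (1, 2, 3):
            for sid in (1, 2):
                rows.append((fid, sid, step, 0.1 * step, "fields.bin", offset))
                offset += 1024  # placeholder block size
    db.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?, ?)", rows)
    return db


def vector_fields_at_step(db, m):
    """'All vector fields at step M' as one SELECT."""
    return [r[0] for r in db.execute(
        "SELECT DISTINCT f.name FROM dataset d JOIN field f ON f.id = d.field_id "
        "WHERE f.kind = 'vector' AND d.step = ? ORDER BY f.name", (m,))]


def blocks_in_box(db, names, m, n, box):
    """Blocks of the named fields for steps m..n whose subdomain box overlaps `box`."""
    xmin, xmax, ymin, ymax, zmin, zmax = box
    placeholders = ",".join("?" * len(names))
    sql = f"""
        SELECT f.name, d.step, d.subdomain_id, d.path, d.byte_offset
        FROM dataset d
        JOIN field f ON f.id = d.field_id
        JOIN subdomain s ON s.id = d.subdomain_id
        WHERE f.name IN ({placeholders})
          AND d.step BETWEEN ? AND ?
          AND s.xmin <= ? AND s.xmax >= ?   -- intervals overlap in x
          AND s.ymin <= ? AND s.ymax >= ?   -- ... in y
          AND s.zmin <= ? AND s.zmax >= ?   -- ... in z
        ORDER BY d.step, f.name, d.subdomain_id
    """
    params = [*names, m, n, xmax, xmin, ymax, ymin, zmax, zmin]
    return db.execute(sql, params).fetchall()
```

Each returned row carries the path and byte offset needed to read the actual array block, so the bulk I/O stays outside the database. If the overlap test ever became a bottleneck, SQLite's R*Tree module is designed for exactly this kind of bounding-box query; the sketch just uses a plain WHERE clause.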
To elaborate slightly on the problem: the goal would be to write vectors living on a DMComposite, with extra semantics like time step and units, in a way that could be used for visualization as well as for checkpoints of forward and adjoint models. PETSc's unadorned binary IO is fine if the same code is going to read it back in, because everything will be wired up correctly and we're just loading into a Vec (although it's already somewhat tricky when the layout changes in the unstructured case). But there just isn't enough metadata to operate on in any sort of generic way, and I hate writing custom code to describe meshes and the relations between them.

Current scientific data formats (at least those I have seen) are a hassle to use because they have poor support for expressing relations. HDF5 has the equivalent of file-system symlinks, but after normalization all the relations end up encoded as a bunch of symlinks, which is a relatively low-level view and isn't particularly convenient to traverse when answering a query.

So I'm curious whether anyone has put such metadata into a relational database instead of trying to contort it into one of these "scientific" data formats. My thought would be to put only the metadata into something like SQLite, and write the arrays themselves using MPI-IO (or HDF5/NetCDF/whatever, but these don't provide much when we aren't using them for metadata). This would allow efficient support of queries like "all vector fields at step M" and "fields B and C from step M to N on subdomains intersecting bounding box XYZ". This isn't completely different from what XDMF tries to do, but experimentation with that left a sour taste.

Is SQL a stupid idea for this purpose, or would I be better off writing code to support the queries I want on HDF5/XDMF/something else?

Jed
