On Wed, 24 Feb 2010 08:31:32 -0600, Dmitry Karpeev <karpeev at mcs.anl.gov> wrote:
> Yes, I think SQL or some such approach would be a good solution.
> I don't even think the actual file format matters too much: we can just
> create collections of files that share keys. The database is needed only
> to manage file names. It could also store other data, of course, but
> that's just gravy.

I think the database needs to hold a nontrivial amount of semantic
information. For example, suppose we have a DMComposite covering
multiple domains, with some domains having more than one DM on the same
mesh (as in mixed FEM). These DMs will share coordinate DMs and the
associated position vectors (which may be time-dependent). Other
metadata, such as precision, endianness, units, scaling factors, time,
and projections, would (in my opinion) also go in the database so that
everything can be wired up without opening these files, and they can be
slurped in with a single collective read.

> Yes, labels are cumbersome, since they have to be created manually, etc.
> However, when we decide where on the filesystem to place a file, we are
> essentially selecting its labels: the directories on the path. At least those
> are *some* of the labels we'd like to attach to the file and the filesystem
> only allows "labels" encoded as directories. I agree that it would be nice to
> allow more general queries, but based on what (permissions, timestamp? those
> sound like natural candidates)?

I wasn't thinking of filesystem metadata at all; it's the user-visible
attributes and relationships among objects in the simulation that are
significant. We have to drop the files somewhere and give them a name,
but I'd be happy if they were just named by SHA1. The name has no
significance since you can't do anything with it without the semantic
information in the database.

Jed
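
To make the idea of a database holding the semantic information concrete,
here is a minimal sketch using SQLite through Python's sqlite3 module.
Everything in it is an assumption made for illustration: the table layout,
the column names, and the helpers open_catalog, add_object, and fields_on
are hypothetical and are not part of PETSc or of any design settled in this
thread. Each object records its user-visible attributes (label, units, time,
precision, endianness), a link to the coordinate object it shares, and the
SHA1 of its payload, which doubles as the file name.

```python
import hashlib
import sqlite3

# Hypothetical catalog layout: one row per stored object.
SCHEMA = """
CREATE TABLE objects (
    id         INTEGER PRIMARY KEY,
    kind       TEXT NOT NULL,   -- e.g. 'coordinates' or 'field'
    label      TEXT NOT NULL,   -- user-visible name
    units      TEXT,
    time       REAL,
    precision  TEXT NOT NULL,
    endianness TEXT NOT NULL,
    coords_id  INTEGER REFERENCES objects(id),  -- shared coordinate object
    sha1       TEXT NOT NULL    -- content hash, used as the file name
);
"""


def open_catalog(path=":memory:"):
    """Open (or create) a catalog database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_object(conn, kind, label, payload, units=None, time=None,
               precision="float64", endianness="little", coords_id=None):
    """Record an object's metadata; the payload is only hashed here."""
    sha1 = hashlib.sha1(payload).hexdigest()
    cur = conn.execute(
        "INSERT INTO objects (kind, label, units, time, precision,"
        " endianness, coords_id, sha1) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (kind, label, units, time, precision, endianness, coords_id, sha1),
    )
    return cur.lastrowid


def fields_on(conn, coords_id):
    """List (label, units, sha1) for every object sharing these coordinates."""
    rows = conn.execute(
        "SELECT label, units, sha1 FROM objects"
        " WHERE coords_id = ? ORDER BY label",
        (coords_id,),
    )
    return rows.fetchall()
```

With a layout along these lines, a reader could call fields_on(conn, mesh_id)
to learn which SHA1-named files belong to one mesh, together with their
units, before opening any of them.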
