Yes, but what about using Spotlight programmatically (e.g., from PETSc) to store rich state, checkpointing, etc.? For example, I want to store a Vec. How do I label it? There may be various user contexts that share it, so I'd like to label it with all of them.
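To make this concrete, here is the kind of thing I have in mind, as a rough Python/petsc4py sketch -- the file name, the labels, and the little SQLite 'labels' table are all invented for illustration; the only point is that one checkpointed Vec gets tagged with every context that shares it and can be pulled back later by any of those tags:

from petsc4py import PETSc
import sqlite3

# write the Vec with the ordinary PETSc binary viewer
v = PETSc.Vec().createSeq(8)
v.set(1.0)
viewer = PETSc.Viewer().createBinary('state0001.petscbin', 'w')
v.view(viewer)
viewer.destroy()

# record one row per (file, label); several user contexts can share the file
db = sqlite3.connect('labels.db')
db.execute('CREATE TABLE IF NOT EXISTS labels (file TEXT, label TEXT)')
for label in ('PETSc', 'adjoint-run', 'checkpoint'):
    db.execute('INSERT INTO labels VALUES (?, ?)', ('state0001.petscbin', label))
db.commit()

# later: every file carrying a given label, regardless of where it lives
for (f,) in db.execute("SELECT file FROM labels WHERE label = 'checkpoint'"):
    print(f)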
In a way, I don't want to have to look at my home directory (or any directory) at all. I just want to extract files based on a given (set of) label(s).

Dmitry.

On Tue, Feb 23, 2010 at 1:40 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>   With google (and Spotlight on the Mac) is there any need to organize anything anymore? Just burp down the data any way you please anywhere you want it and then have smart search tools find it for you and format it the way you need it at the time you need it? This does mean you need decent tools to parse random stuff for the search to understand it.
>
>   Ironically, in the past few years with Spotlight on my Mac I actually do a better job of organizing my home directory structure than I ever have before.
>
>   Barry
>
> On Feb 23, 2010, at 1:31 PM, Dmitry Karpeev wrote:
>
>> This takes the discussion in a somewhat tangential direction, but consider this:
>>
>> We use hierarchical file systems, which are also a pain.
>> Say I'm working on project PETSc and I'm writing a DOE proposal for it.
>> Should I put it in ~/PETSc/Proposals/DOE/proposal or
>> ~/Proposals/DOE/PETSc/proposal or
>> ~/Proposals/PETSc/DOE?
>> Later (3 months from now) I might want to come back and retrieve a file from that proposal tree. Where do I look for it?
>> Maybe I should have all of these paths, all but one being soft links to the master path? I've tried that. It's a pain.
>>
>> Basically, any hierarchical storage format, such as a file system, will impose a tree structure on what is fundamentally a (hyper)graph.
>> GMail solves a similar problem by allowing multiple labels on a piece of email. Then I can search on any or several of the labels: Proposals, DOE, PETSc, irrespective of the order. A file system imposes an artificial order.
>> You can think of the labels as being the hyperedges in the hypergraph.
>>
>> It would be nice to have a file system that functioned a bit like GMail, I think.
>> In fact, I've thought about writing a Python replacement for 'ls' that would list files with a given label or labels. I'm too lazy and incompetent, however.
>> In the simplest case the metadata could go right into the filename, but maybe that's not a good thing to do in general.
>>
>> Dmitry.
>>
>> On Tue, Feb 23, 2010 at 10:24 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>  I've thought about this but never done anything; I think it is worth investigating.
>>>
>>>  BTW: My long-term goal is also that all PETSc source code lives in an appropriate database, with appropriate relationships and meta-data stored there.
>>>
>>>  The fact that we (meaning HPC and OpenSource in general) use flat files so much shows a failure of something.
>>>
>>>  Barry
>>>
>>> On Feb 23, 2010, at 9:31 AM, Jed Brown wrote:
>>>
>>>> Matt and I talked about this a couple of months ago, but I'd like to also mention it here. It seems to me that data formats like HDF5 are really a pain to use for generic purposes, because you end up trying to map a directed graph of object relations (composition) into a hierarchical data format, and then implement relational queries on top of this hierarchy. (I've done this, to some extent, and I ended up writing cumbersome code to walk this hierarchy to answer queries that would be one-line SQL queries.)
>>>>
>>>> To elaborate slightly on the problem, the goal would be to write vectors living on a DMComposite, with extra semantics like time step and units, in a way that could be used for visualization as well as checkpoints for forward and adjoint models. PETSc's unadorned binary IO is fine if the same code is going to read it back in, because everything will be wired up correctly and we're just loading into a Vec (although it's already somewhat tricky when the layout changes in the unstructured case). But there just isn't enough metadata to operate on in any sort of generic way, and I hate writing custom code to describe meshes and relations between them.
>>>>
>>>> Current scientific data formats (at least those I have seen) are a hassle to use since they have poor support for expressing relations. HDF5 has the equivalent of file-system symlinks, but after normalization, all the relations end up being encoded as a bunch of symlinks, which is a relatively low-level view and isn't a particularly convenient thing to traverse when answering a query.
>>>>
>>>> So I'm curious whether anyone has put such metadata into a relational database instead of trying to contort it into one of these "scientific" data formats. My thought would be to drop only the metadata into something like SQLite, and write the arrays themselves using MPI-IO (or HDF5/NetCDF/whatever, but these don't provide much when we aren't using them for metadata). This would allow efficient support of queries like "all vector fields at step M" and "fields B and C from step M to N on subdomains intersecting bounding box XYZ". This isn't completely different from what XDMF tries to do, but experimentation with that left a sour taste. Is SQL a stupid idea for this purpose, and would I be better off writing code to support the queries I want on HDF5/XDMF/something else?
>>>>
>>>> Jed
>>>
>>>
>
>
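P.S. For what it's worth, the query side of what Jed describes really does collapse to one-liners. A rough sketch in Python with sqlite3 -- the schema and column names are invented here, and the arrays themselves are assumed to live elsewhere, written with MPI-IO or similar:

import sqlite3

db = sqlite3.connect('metadata.db')
db.executescript('''
CREATE TABLE IF NOT EXISTS field (
    name   TEXT,      -- e.g. 'velocity'
    kind   TEXT,      -- 'vector', 'scalar', ...
    units  TEXT,
    step   INTEGER,
    time   REAL,
    path   TEXT,      -- file holding the raw array
    offset INTEGER    -- byte offset within that file
);
''')

# "all vector fields at step M" is a single query
M = 42
rows = db.execute(
    "SELECT name, path, offset FROM field WHERE kind = 'vector' AND step = ?",
    (M,)).fetchall()

The bounding-box query would need a little extra geometry metadata per subdomain, but it stays declarative in the same way.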
