On Wed, Feb 24, 2010 at 12:59 AM, Jed Brown <jed at 59a2.org> wrote:
> On Tue, 23 Feb 2010 13:44:55 -0600, Dmitry Karpeev <karpeev at mcs.anl.gov> wrote:
>> Yes, but what about using Spotlight programmatically (e.g., from
>> PETSc) to store rich state, checkpointing, etc? For example, I want
>> to store a Vec. How do I label it? There may be various user contexts
>> that share it, so I'd like to label it with all of them.
>
> Right, so I think SQL is one way to formalize this. Ad-hoc indexing is
> great for interactive use, but this system needs to be deterministic and
> have somewhat more precise semantics. This is not to say that a generic
> indexer could not be used, but I think it would end up being difficult
> to maintain certain invariants since the schema would end up being
> encoded in conventions. Other database paradigms may also be fine, but
> the point of NoSQL is typically to weaken the supported queries and
> guarantees about concurrent modification in exchange for improved
> throughput/scalability.

Yes, I think SQL or some such approach would be a good solution.
I don't even think the actual file format matters too much: we can just
create collections of files that share keys. The database is needed only
to manage file names. It could also store other data, of course, but
that's just gravy.

>
>> In a way, I don't want to have to look at my home directory (or any
>> directory) at all. I just want to extract files based on a given (set
>> of) label(s).
>
> "Labels", in the gmail sense are difficult to maintain, and I mostly use
> them as aliases for more sophisticated searches (by writing filters).
> Keeping labels distinct from the filters that define them is really just
> rubbing a particular caching scheme in everyone's face (manually applied
> labels are useful for workflow). But I get your point, the hierarchy is
> just selecting one organizational scheme as special, and hard/symlinks
> are band-aids to permit sharing and one-way relations existing outside
> the hierarchy. Good tools recognize this (web search, distributed SCMs,
> code navigation, gmail/notmuch, filesystem indexers).

Yes, labels are cumbersome, since they have to be created manually, etc.
However, when we decide where on the filesystem to place a file, we are
essentially selecting its labels: the directories on the path. At least
those are *some* of the labels we'd like to attach to the file, and the
filesystem only allows "labels" encoded as directories.
I agree that it would be nice to allow more general queries, but based on
what (permissions, timestamp? those sound like natural candidates)?

Dmitry.

>
> Jed
>
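
The "database only manages file names" idea above can be sketched
roughly as follows. This is a minimal illustration, assuming SQLite
through Python's standard sqlite3 module; the table layout, the file
names, and the helpers open_catalog, add_file and find_files are all
hypothetical and are not part of PETSc or of anything proposed in this
thread. Each file may carry any number of labels, so a Vec shared by
several user contexts can be tagged with all of them.

```python
import sqlite3

# Hypothetical schema: files and labels in a many-to-many relation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files  (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS labels (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS file_labels (
    file_id  INTEGER NOT NULL REFERENCES files(id),
    label_id INTEGER NOT NULL REFERENCES labels(id),
    PRIMARY KEY (file_id, label_id)
);
"""

def open_catalog(db_path=":memory:"):
    """Open (or create) the catalog; the data files themselves live elsewhere."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn

def add_file(conn, path, labels):
    """Register a file name and attach every given label to it."""
    with conn:  # one transaction per file
        conn.execute("INSERT OR IGNORE INTO files(path) VALUES (?)", (path,))
        file_id = conn.execute(
            "SELECT id FROM files WHERE path = ?", (path,)).fetchone()[0]
        for name in set(labels):
            conn.execute("INSERT OR IGNORE INTO labels(name) VALUES (?)", (name,))
            label_id = conn.execute(
                "SELECT id FROM labels WHERE name = ?", (name,)).fetchone()[0]
            conn.execute(
                "INSERT OR IGNORE INTO file_labels VALUES (?, ?)",
                (file_id, label_id))

def find_files(conn, labels):
    """Return the paths of files that carry *all* of the given labels."""
    wanted = sorted(set(labels))
    if not wanted:
        return []
    marks = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"""SELECT f.path FROM files f
            JOIN file_labels fl ON fl.file_id = f.id
            JOIN labels l ON l.id = fl.label_id
            WHERE l.name IN ({marks})
            GROUP BY f.id
            HAVING COUNT(DISTINCT l.id) = ?
            ORDER BY f.path""",
        (*wanted, len(wanted)))
    return [row[0] for row in rows]

# Example: one Vec shared by two contexts, another known to only one.
catalog = open_catalog()
add_file(catalog, "vec_0001.bin", ["solver_ctx", "checkpoint_42"])
add_file(catalog, "vec_0002.bin", ["solver_ctx"])
print(find_files(catalog, ["solver_ctx", "checkpoint_42"]))
```

Unlike a directory path, the set of labels here is unordered and a file
can appear under any combination of them. Permissions or timestamps
could be added as extra columns on the files table and queried the same
way.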
