Hi Kevin,

> This is an interesting question. I'm in computing for high energy physics,
> where there is custom software used for tracking the data files, their
> locations, etc. I hadn't considered the application of Perkeep in this area
> (I've only toyed with it as a personal project). For tracking where the data
> is the HEP community has been moving to Rucio https://rucio.cern.ch/. The
> metadata, however, is still a bit of wildcard - Rucio doesn't currently have a
> native metadata store. I'm interested in seeing if Perkeep can be offered as
> a solution here, but if not I may be able to provide an alternative.

Thanks for the pointer to Rucio, it looks interesting and has some great features. Our issue is that we really don't have anything like Rucio on offer (I will ask, but nothing ever happens quickly, and I'm not holding my breath). So I'd like to have at least some way of knowing where a file has been, and what relationship it has to its ancestors/siblings/parents.

Nothing like a concrete example: I support ocean modellers (and some coupled climate modelling). The models have many input files. For example, we have an input file that specifies sea surface salinity, which is derived from observations. At some point an error may be found in that file, or there may just be an update. So we start using the updated file. We have ways of tracking the hashes of the inputs to our experiments, so we can uniquely identify the file we used and the last location from which it was accessed. But when comparing experiments before and after, with different sea surface salinity files, we don't know the relationship between the two files. Clearly we should have noted this in the metadata of the updated file, but we all know any system that relies on people to do the right thing will inevitably break down.

Ideally, if we needed to retrieve the original input file we could find it in the location from which it was last used, but if it isn't there, where has it gone? I'm in the process of writing a little something to dump some of this information to an SQLite database when we archive files to tape storage, so that it can be queried for matches to the hash of interest.

But this is a number of disparate systems, all with a little bit of metadata, and some way of tying them all together. It seemed like Perkeep might be a good way of coalescing most of it into a single, flexible system.

Cheers

Aidan
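
P.S. In case it makes the SQLite idea more concrete, here is roughly the shape of what I have in mind. Treat it as a sketch only: the table names, columns and function names are all invented for illustration, and none of it is Perkeep or Rucio API. One table records every place a given hash has been seen, and a second records "this file replaces that one", so the ancestry of an updated input can be walked back.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the tracking database. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sightings (
            sha256   TEXT NOT NULL,
            location TEXT NOT NULL,
            seen_at  TEXT NOT NULL DEFAULT (datetime('now'))
        );
        CREATE TABLE IF NOT EXISTS supersedes (
            new_sha256 TEXT NOT NULL,
            old_sha256 TEXT NOT NULL,
            note       TEXT,
            PRIMARY KEY (new_sha256, old_sha256)
        );
        """
    )
    return conn


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model inputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def record_sighting(conn, sha256, location):
    """Note that a file with this hash was seen at this location."""
    conn.execute(
        "INSERT INTO sightings (sha256, location) VALUES (?, ?)",
        (sha256, location),
    )
    conn.commit()


def record_supersedes(conn, new_sha256, old_sha256, note=None):
    """Note that the file new_sha256 replaces the file old_sha256."""
    conn.execute(
        "INSERT OR IGNORE INTO supersedes VALUES (?, ?, ?)",
        (new_sha256, old_sha256, note),
    )
    conn.commit()


def locations(conn, sha256):
    """Every location recorded for this hash, oldest entry first."""
    rows = conn.execute(
        "SELECT location FROM sightings WHERE sha256 = ? ORDER BY rowid",
        (sha256,),
    )
    return [row[0] for row in rows]


def ancestors(conn, sha256):
    """All hashes this file directly or indirectly replaces."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(h) AS (
            SELECT old_sha256 FROM supersedes WHERE new_sha256 = ?
            UNION
            SELECT s.old_sha256
            FROM supersedes AS s JOIN chain AS c ON s.new_sha256 = c.h
        )
        SELECT h FROM chain
        """,
        (sha256,),
    )
    return [row[0] for row in rows]
```

The archive-to-tape step would call sha256_of() and record_sighting() for each file, and "where has it gone?" becomes a locations() lookup. It still depends on somebody calling record_supersedes() when a file is updated, which is exactly the weak point I described above.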
