Hi Aidan,

It seems to me that you don't want to store the actual data (files), but the Merkle tree of each version of them. That way you could easily identify the differing parts of any two versions of a file. Perkeep does this, but you want the hashes for the leaves, not the data...

I don't know a solution, but Perkeep does almost everything for you (file metadata, hash tree, immutability and so on), except the "store the hashes, not the data" part.

Maybe a pk-put wrapper which pushes the file's metadata with the Adler32 hashed block hashes instead of the real data?

Tamás Gulácsi

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On 2018. December 8., Saturday 11:15, Aidan Heerdegen <[email protected]> wrote:

> Hi Kevin,
>
> > This is an interesting question. I'm in computing for high energy physics,
> > where there is custom software used for tracking the data files, their
> > locations, etc. I hadn't considered the application of Perkeep in this area
> > (I've only toyed with it as a personal project). For tracking where the
> > data is, the HEP community has been moving to Rucio https://rucio.cern.ch/.
> > The metadata, however, is still a bit of a wildcard: Rucio doesn't currently
> > have a native metadata store. I'm interested in seeing if Perkeep can be
> > offered as a solution here, but if not I may be able to provide an
> > alternative.
>
> Thanks for the pointer to Rucio, it looks interesting with some great features.
>
> Our issue is that we really don't have anything like Rucio on offer (though I
> will ask, nothing ever happens quickly, and I'm not holding my breath).
>
> So I'd like to be able to have at least some way of knowing where a file has
> been, and what relationship it has to ancestors/siblings/parents.
>
> Nothing like a concrete example:
>
> I support ocean modellers (and some coupled climate modelling). The models
> have many input files. For example, we have an input file that specifies sea
> surface salinity which is derived from observations. Now at some point it may
> be that there is found to be an error in that file, or perhaps just an
> update. So we start using the updated file, and we have ways of tracking the
> hash of inputs to our experiment, so we can uniquely identify the file we
> used, and the last location it was accessed. But in comparing experiments
> before and after, with different sea surface salinity files, we don't know
> the relationship between them. Clearly we should have noted this in the
> metadata of the updated file, but we all know any system that relies on
> people to do the right thing will inevitably break down.
>
> Ideally if we needed to retrieve the original input file we could find it in
> the location from which it was last used, but if it isn't there, where has it
> gone? Now I'm in the process of writing a little something to dump some of
> this information to a sqlite database when we archive files to tape storage,
> so this could be queried for matches to the hash of interest. But this is a
> number of disparate systems all with a little bit of metadata and some way of
> tying them all together. It seemed like Perkeep might be a good way of
> coalescing most of it into a single, flexible system.
>
> Cheers
>
> Aidan
>
> --
> You received this message because you are subscribed to the Google Groups
> "Perkeep" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
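
To make the "store the hashes, not the data" idea from the reply concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it cuts files into fixed-size blocks and uses SHA-256 for the leaves, which is neither Perkeep's own chunking nor the Adler32 scheme mentioned above, and the function names (`block_hashes`, `merkle_root`, `changed_blocks`) are hypothetical. It is not how pk-put behaves.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 1 << 16) -> list[str]:
    """Return the SHA-256 hex digest of each fixed-size block (the tree's leaves)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def merkle_root(leaves: list[str]) -> str:
    """Fold the leaf digests pairwise until a single root digest remains."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]  # the last node of an odd level is hashed alone
            parents.append(hashlib.sha256("".join(pair).encode()).hexdigest())
        level = parents
    return level[0]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indices of blocks that differ, counting blocks present in only one version."""
    longest = max(len(old), len(new))
    return [
        i for i in range(longest)
        if i >= len(old) or i >= len(new) or old[i] != new[i]
    ]
```

Only the leaf hashes and the root would need to be stored per version; equal roots mean identical versions, and `changed_blocks` narrows a difference down to block indices. With fixed-size blocks an insertion near the start of a file shifts every later block, so a chunker that picks boundaries with a rolling checksum would localise changes better; that part is not shown here.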
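
The SQLite catalogue described in the quoted message could look roughly like the sketch below. Everything in it is an assumption made for illustration: the two tables (`location`, `supersedes`), their columns, and the helper names are hypothetical and are not Aidan's actual schema. The point is only that recording "this hash replaces that hash" at archive time lets the ancestor question be answered by a query instead of by memory.

```python
import sqlite3


def open_catalogue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalogue; both tables are hypothetical."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS location (
            hash TEXT NOT NULL,   -- content hash of the file
            path TEXT NOT NULL,   -- where it was seen
            seen TEXT NOT NULL,   -- ISO 8601 timestamp, sortable as text
            PRIMARY KEY (hash, path)
        );
        CREATE TABLE IF NOT EXISTS supersedes (
            new_hash TEXT NOT NULL,
            old_hash TEXT NOT NULL,
            PRIMARY KEY (new_hash, old_hash)
        );
    """)
    return conn


def record_location(conn, file_hash, path, seen):
    """Note that a file with this hash was seen at path at the given time."""
    conn.execute("INSERT OR REPLACE INTO location VALUES (?, ?, ?)",
                 (file_hash, path, seen))


def record_update(conn, new_hash, old_hash):
    """Note that new_hash is an updated version of old_hash."""
    conn.execute("INSERT OR IGNORE INTO supersedes VALUES (?, ?)",
                 (new_hash, old_hash))


def last_location(conn, file_hash):
    """Most recent known path for a hash, or None if it was never recorded."""
    row = conn.execute(
        "SELECT path FROM location WHERE hash = ? ORDER BY seen DESC LIMIT 1",
        (file_hash,),
    ).fetchone()
    return row[0] if row else None


def ancestors(conn, file_hash):
    """All earlier versions reachable through the supersedes links."""
    rows = conn.execute("""
        WITH RECURSIVE chain(h) AS (
            SELECT old_hash FROM supersedes WHERE new_hash = ?
            UNION
            SELECT s.old_hash FROM supersedes s JOIN chain c ON s.new_hash = c.h
        )
        SELECT h FROM chain
    """, (file_hash,)).fetchall()
    return [r[0] for r in rows]
```

For the salinity example, archiving the corrected file would call `record_update(conn, new_hash, old_hash)` once; `ancestors(conn, new_hash)` then returns the earlier versions, and `last_location` says where each was last seen.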
