-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Aidan,

It seems to me that you don't want to store the actual data (files),
but rather the Merkle tree of each version of them.
That way you could easily identify the parts that differ between any two
versions of a file.
Perkeep does this, but you want the hashes for the leaves, not the data...
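
To make that concrete, here is a rough Python sketch of the idea, not how Perkeep does it internally. It assumes fixed-size blocks and SHA-256, and the function names and the tiny block size are made up for illustration. It compares the leaf lists directly; a real tree walk would skip whole subtrees whose hashes match.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def leaf_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block; these hashes are the Merkle leaves."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def merkle_root(leaves: list[str]) -> str:
    """Fold the leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        return hashlib.sha256(b"").hexdigest()
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            # An unpaired last node is combined with itself.
            right = level[i + 1] if i + 1 < len(level) else level[i]
            nxt.append(hashlib.sha256((level[i] + right).encode()).hexdigest())
        level = nxt
    return level[0]


def differing_blocks(a: list[str], b: list[str]) -> list[int]:
    """Indices of blocks whose hashes differ or exist in only one version."""
    n = max(len(a), len(b))
    return [i for i in range(n)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]


old = leaf_hashes(b"salinity-v1-data")
new = leaf_hashes(b"salinity-v2-data")
print(merkle_root(old) == merkle_root(new))  # roots differ, so the files differ
print(differing_blocks(old, new))            # which blocks changed
```

Only the hashes are kept here, so you can tell that two versions differ and where, but you cannot recover the data itself.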

I don't know a solution, but Perkeep does almost everything for you, except the 
"store the hashes, not the data" part: file metadata, hash tree, immutability 
and so on.

Maybe a pk-put wrapper that pushes the file's metadata together with
Adler32-based hashes of its blocks instead of the real data?
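
Something like the following could be the core of such a wrapper. It is only a sketch under my own assumptions: the field names are invented and are not Perkeep's schema, the block size is arbitrary, and the step that actually hands the result to pk-put is left out. Keep in mind that Adler-32 is a weak checksum, so it is fine for spotting changed blocks but not for uniquely identifying content.

```python
import json
import zlib


def hash_manifest(name: str, data: bytes, block_size: int = 64 * 1024) -> dict:
    """Describe a file by its per-block Adler-32 checksums, without the data."""
    blocks = [format(zlib.adler32(data[i:i + block_size]), "08x")
              for i in range(0, len(data), block_size)]
    return {
        "fileName": name,         # hypothetical field names, not Perkeep's schema
        "size": len(data),
        "blockSize": block_size,
        "adler32": blocks,
    }


# Hypothetical file name and contents, with a small block size for readability.
manifest = hash_manifest("sea_surface_salinity.nc", b"abc" * 10, block_size=8)
print(json.dumps(manifest, indent=2))
```

The resulting JSON is small enough to store for every version of every input file, even when the files themselves live only on tape.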

Tamás Gulácsi




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On 2018. December 8., Saturday 11:15, Aidan Heerdegen 
<[email protected]> wrote:

> Hi Kevin,
>
> > This is an interesting question. I'm in computing for high energy physics, 
> > where there is custom software used for tracking the data files, their 
> > locations, etc. I hadn't considered the application of Perkeep in this area 
> > (I've only toyed with it as a personal project). For tracking where the 
> > data is the HEP community has been moving to Rucio https://rucio.cern.ch/. 
> > The metadata, however, is still a bit of a wildcard: Rucio doesn't currently 
> > have a native metadata store. I'm interested in seeing if Perkeep can be 
> > offered as a solution here, but if not I may be able to provide an 
> > alternative.
>
> Thanks for the pointer to Rucio. It looks interesting and has some great
> features.
>
> Our issue is that we really don't have anything like Rucio on offer (I will
> ask, but nothing ever happens quickly and I'm not holding my breath).
>
> So I'd like to be able to have at least some way of knowing where a file has 
> been, and what relationship it has to ancestors/siblings/parents.
>
> Nothing like a concrete example:
>
> I support Ocean modellers (and some coupled climate modelling). The models 
> have many input files. For example, we have an input file that specifies sea 
> surface salinity which is derived from observations. Now at some point it may 
> be that there is found to be an error in that file, or perhaps just an 
> update. So we start using the updated file, and we have ways of tracking the 
> hash of inputs to our experiment, so we can uniquely identify the file we 
> used, and the last location it was accessed. But in comparing experiments 
> before and after, with different sea surface salinity files, we don't know 
> the relationship between them. Clearly we should have noted this in the 
> metadata of the updated file, but we all know any system that relies on 
> people to do the right thing will inevitably break down.
>
> Ideally if we needed to retrieve the original input file we could find it in 
> the location from which it was last used, but if it isn't there, where has it 
> gone? Now I'm in the process of writing a little something to dump some of 
> this information to a sqlite database when we archive files to tape storage, 
> so this could be queried for matches to the hash of interest. But this is a 
> number of disparate systems all with a little bit of metadata and some way of 
> tying them all together. It seemed like Perkeep might be a good way of 
> coalescing most of it into a single, flexible system.
>
> Cheers
>
> Aidan
>
> --
>
> You received this message because you are subscribed to the Google Groups 
> "Perkeep" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.


-----BEGIN PGP SIGNATURE-----
Version: ProtonMail
Comment: https://protonmail.com

wsBcBAEBCAAQBQJcDCVHCRC4+9AbKq9fbQAAOOUH/R+CxLSKTlF57+IuK0XD
6g2UoRDSzyeZ1ojC2XocgFMohwHE7YoFL21YWXb0pMcS/IOPIsHrZu6OeEKD
4LAUuzWrCvf7SMbkAaXLUZyOhEGZbS7IeatKGrm7sLoOfPRYyVRcy3qA3a07
FeMf1/grN6GzXcyX0vpXUDaiqvTIIxKEGLftuad+0Y5X+EB7FbnyuzW6s5QW
oKIYLdLzSXoS21rL4CktYwZPcdw/Qsx9O6upCknWYdYnglcfqP7DI4nkA1o3
Mar42b4Vx+4acL1dnXDVAwQZbtWoz8zzT6rnKZO/vccZJ5MW+l++rbn8vrJd
R6eCwGgHVDO/mrSN5fvHbjA=
=CQPc
-----END PGP SIGNATURE-----
