Hi,

I have a use case where I specifically do not want to store the data files 
themselves.

They are scientific datasets, usually model output, which might total 
hundreds of TB or more. 

What I would like to do is store the metadata from the datasets, their 
location(s), and some transactional information (e.g. when they are moved, 
deleted, etc.).

It is essential that the original data can be deleted: in some cases it is 
no longer required, and in others it might be backed up to a slow-to-access, 
tape-based data silo.

Each user would have their own Perkeep store, but the ability to 
coalesce/share the information in those stores would be almost essential.

Does this sound like a use case for Perkeep? I like the idea of keeping ALL 
my metadata (it is pretty small) and using it to find files with particular 
characteristics, retrieve them from a backup storage location, that sort of 
thing.

If Perkeep is a good match, can anyone suggest what "mappings" I might need 
to think about in Perkeep terms? e.g.

Would a permanode be required for each unique file instance? I would store 
an identifying hash of some sort to ensure it was unique.

If a file is modified such that the hash changes, I would need a new 
permanode, but would like to keep a relationship between the two files.

Most data files are netCDF, so I would like to dump the metadata as a JSON 
blob and associate it with a permanode. I guess Perkeep has existing 
methods for dealing with JSON, and for indexing/searching it?

Thanks very much for any help,

Cheers

Aidan

-- 
You received this message because you are subscribed to the Google Groups 
"Perkeep" group.
