Hello,

I have a container on OVH Cloud Storage fronted by S3QL. It's all 
working very nicely, thank you.

Within the S3QL filesystem I have three or so top-level directories, each 
storing mostly different types of files. Assuming there is not much shared 
data between these three top-level directories, is there any way to 
estimate how much backend storage each one accounts for?

The reason I would like to find out is that one of these top-level 
directories holds really old archive copies of data that we could afford to 
lose but would currently prefer to keep. Unfortunately, at the moment we 
have no way of determining the cost of keeping them available, so it's 
impossible to weigh their value against their cost.

If it helps, here are examples of our top-level directories:

- "archive", holds a variety of "really old" items, which we'd like to 
measure
- "databases", holds about 10 copies each of around 20 databases in a 
highly compressed backup format. (I'm assuming these don't de-dupe well.)
- "rsnapshot", holds about 20 subdirectories, each containing six or so 
further subdirectories, with the data highly duplicated across them


One approach I've considered is to measure the front-facing (apparent) 
amount of data per directory tree, counting a duplicated file within 
"rsnapshot" only once, and then split the backend usage proportionately. 
The difficulty is that identical files in the "rsnapshot" directory are not 
necessarily hard-linked as they would have been on my local filesystem, so 
identifying duplicates could get messy. Furthermore, I am charged for 
downloading data, so running a checksum over the file contents is best 
avoided. This leaves file metadata as the only way to identify probable 
duplicates.
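A minimal sketch of that metadata-only approach, assuming that two files in the same top-level directory with the same base name, size and modification time are copies of each other (a heuristic, not a guarantee). `FileRecord`, `scan`, `unique_bytes_per_dir` and `split_backend` are hypothetical helper names, and `backend_bytes` stands for whatever total backend usage the provider reports. Only `os.lstat` is called, so no file contents are read, although the whole tree is still walked through FUSE.

```python
import os
import stat
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple


class FileRecord(NamedTuple):
    top_dir: str   # e.g. "archive", "databases", "rsnapshot"
    name: str      # base name of the file
    size: int      # st_size in bytes
    mtime: float   # st_mtime


def scan(mountpoint: str, top_dir: str) -> Iterator[FileRecord]:
    """Yield metadata for every regular file under mountpoint/top_dir (no reads)."""
    for dirpath, _dirnames, filenames in os.walk(os.path.join(mountpoint, top_dir)):
        for fname in filenames:
            st = os.lstat(os.path.join(dirpath, fname))
            if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, etc.
                yield FileRecord(top_dir, fname, st.st_size, st.st_mtime)


def unique_bytes_per_dir(records: Iterable[FileRecord]) -> Dict[str, int]:
    """Sum sizes per top-level dir, counting probable duplicates only once.

    Heuristic: within one top-level dir, files with the same (name, size, mtime)
    are assumed to be copies of each other. Nothing is checksummed.
    """
    seen: Dict[str, set] = defaultdict(set)
    totals: Dict[str, int] = defaultdict(int)
    for r in records:
        key = (r.name, r.size, r.mtime)
        if key not in seen[r.top_dir]:
            seen[r.top_dir].add(key)
            totals[r.top_dir] += r.size
    return dict(totals)


def split_backend(unique: Dict[str, int], backend_bytes: int) -> Dict[str, float]:
    """Apportion the total backend usage in proportion to each dir's unique bytes."""
    total = sum(unique.values())
    if total == 0:
        return {d: 0.0 for d in unique}
    return {d: backend_bytes * n / total for d, n in unique.items()}
```

Usage would be along the lines of `split_backend(unique_bytes_per_dir(r for d in ("archive", "databases", "rsnapshot") for r in scan("/mnt/s3ql", d)), backend_bytes)`, where `/mnt/s3ql` is a placeholder mount point. A straight proportional split ignores compression: already-compressed database dumps barely shrink on the backend, while text-heavy directories shrink a lot, so the result is only a rough guide.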

The reason I'm asking here is that I thought it might be possible to query 
the S3QL metadata database for this information, which could be 
considerably faster than trawling through every file's metadata via the 
FUSE interface.
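If the metadata can be queried directly, something along these lines could attribute stored objects to top-level directories without going through FUSE or downloading anything. It is a sketch against a simplified, hypothetical schema loosely modelled on S3QL's metadata tables (`names`, `contents`, `inode_blocks`, `blocks`, `objects`), and it assumes the root directory is inode 1. The real table and column names differ between S3QL versions, so they would need checking (e.g. with `.schema` in the `sqlite3` shell) against a copy of the metadata database rather than the live one. Because S3QL deduplicates at block level, counting which top-level directories reference each stored object measures actual sharing instead of guessing from file names.

```python
import sqlite3
from collections import defaultdict
from typing import Dict

# Hypothetical, simplified schema loosely modelled on S3QL's metadata.
# Verify the real table/column names against your own copy of the database.
SCHEMA = """
CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contents (name_id INT NOT NULL, inode INT NOT NULL, parent_inode INT NOT NULL);
CREATE TABLE inode_blocks (inode INT NOT NULL, blkno INT NOT NULL, block_id INT NOT NULL);
CREATE TABLE blocks (id INTEGER PRIMARY KEY, obj_id INT NOT NULL);
CREATE TABLE objects (id INTEGER PRIMARY KEY, size INT NOT NULL);
"""

ROOT_INODE = 1  # assumption: the filesystem root has inode 1

# Walk the directory tree from the root, remembering which top-level entry
# each inode sits under, then list the stored objects each top-level dir uses.
QUERY = """
WITH RECURSIVE tree(top, inode) AS (
    SELECT n.name, c.inode
      FROM contents c JOIN names n ON n.id = c.name_id
     WHERE c.parent_inode = :root
    UNION
    SELECT t.top, c.inode
      FROM contents c JOIN tree t ON c.parent_inode = t.inode
)
SELECT DISTINCT t.top, b.obj_id, o.size
  FROM tree t
  JOIN inode_blocks ib ON ib.inode = t.inode
  JOIN blocks b ON b.id = ib.block_id
  JOIN objects o ON o.id = b.obj_id
"""


def backend_bytes_per_top_dir(conn: sqlite3.Connection) -> Dict[str, float]:
    """Attribute each stored object's size to the top-level dirs that use it.

    An object referenced from only one top-level dir is charged to it in full;
    an object shared by k top-level dirs is split equally between them.
    """
    users = defaultdict(set)
    sizes = {}
    for top, obj_id, size in conn.execute(QUERY, {"root": ROOT_INODE}):
        users[obj_id].add(top)
        sizes[obj_id] = size
    result: Dict[str, float] = defaultdict(float)
    for obj_id, tops in users.items():
        share = sizes[obj_id] / len(tops)
        for top in tops:
            result[top] += share
    return dict(result)
```

Against a real copy, `backend_bytes_per_top_dir(sqlite3.connect("metadata-copy.db"))` (hypothetical file name) would replace toy data. Splitting shared objects equally is an arbitrary choice; counting only the objects referenced solely from "archive" would instead approximate what deleting it would actually free.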

Thank you for any insights.
Chris

-- 
You received this message because you are subscribed to the Google Groups 
"s3ql" group.