Hello, I have a container running on OVH Cloud Storage fronted by S3QL. All working very nicely, thank you.
Within the S3QL filesystem I have three or so top level directories storing mostly different types of files. On the assumption that there is not a lot of commonality between these three top level directories is there any way to estimate the corresponding amount of backend usage?

The reason I would like to find out is that one of these top level directories holds really old archive copies of data that we could afford to lose but would currently prefer to keep. Unfortunately at the moment we have no way of determining the cost of keeping them available so it's impossible to measure the value vs cost.

If it helps here are examples of our top-level directories

- "archive", holds a variety of "really old" items, which we'd like to measure
- "databases", holds about 10 copies each of around 20 databases in a highly compressed backup format. (I'm assuming these don't de-dupe well.)
- "rsnapshot", holds about 20 subdirectories containing highly duplicated data across another six or so further subdirectories each

One approach I've considered is to measure the front-facing quantity of data per directory tree (i.e. measure a duplicated file within "rsnapshot" just once), and then split the backend usage proportionately. The difficulty with this is that identical files in the "rsnapshot" directory are not necessarily hard-linked as they would have been in my local filesystem, so it could get messy identifying duplicates. Furthermore, I get charged for downloading data so running a checksum across the data contents is to be avoided if possible. This leaves me with file metadata as the sole method for identifying probable duplicates.

The reason I'm asking here is that I thought it might be possible to query the S3QL database for this information, and that this could be considerably faster than trawling through all the source files' metadata via the FUSE interface.

Thank you for any insights.
Chris
