On Jun 03 2016, Chris Davies <[email protected]> wrote:
> Within the S3QL filesystem I have three or so top level directories storing
> mostly different types of files. On the assumption that there is not a lot
> of commonality between these three top level directories is there any way
> to estimate the corresponding amount of backend usage?
>
[...]
>
> One approach I've considered is to measure the front-facing quantity of
> data per directory tree (i.e. measure a duplicated file within "rsnapshot"
> just once), and then split the backend usage proportionately. The
> difficulty with this is that identical files in the "rsnapshot" directory
> are not necessarily hard-linked as they would have been in my local
> filesystem, so it could get messy identifying duplicates. Furthermore, I
> get charged for downloading data so running a checksum across the data
> contents is to be avoided if possible. This leaves me with file metadata as
> the sole method for identifying probable duplicates.
>
> The reason I'm asking here is that I thought it might be possible to query
> the S3QL database for this information, and that this could be considerably
> faster than trawling through all the source files' metadata via the
> FUSE interface.

Yes, this information is easily extracted from the SQLite
database. Start by mapping the names of your top-level directories to
"name ids", then look them up in the inode table (the parent inode
should be zero). You can then recursively determine all inodes in each
subtree.
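
As a rough sketch of that first step, the lookup and the recursive walk can
be done in one query against a local copy of the metadata database. The
table and column names used here (names, contents, name_id, parent_inode)
are assumptions for illustration, not a confirmed schema, so check them
with ".schema" first. The toy data uses a parent inode of zero for
top-level entries, as described above.

```python
import sqlite3

def subtree_inodes(conn, top_name, root_parent):
    """Return the set of inode ids at or below the top-level entry `top_name`."""
    # Assumed (hypothetical) schema: names(id, name) and
    # contents(name_id, inode, parent_inode). A real database may store
    # names as bytes, in which case pass bytes for `top_name`.
    sql = """
    WITH RECURSIVE tree(inode) AS (
        SELECT c.inode FROM contents c JOIN names n ON n.id = c.name_id
        WHERE n.name = ? AND c.parent_inode = ?
        UNION  -- UNION (not UNION ALL) so hard-linked inodes appear once
        SELECT c.inode FROM contents c JOIN tree t ON c.parent_inode = t.inode
    )
    SELECT inode FROM tree
    """
    return {row[0] for row in conn.execute(sql, (top_name, root_parent))}

# Toy database standing in for the real metadata file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contents(name_id INT, inode INT, parent_inode INT);
INSERT INTO names VALUES (1,'rsnapshot'),(2,'photos'),(3,'a.txt'),(4,'daily.0');
INSERT INTO contents VALUES (1,10,0),(2,20,0),(4,11,10),(3,12,11),(3,21,20);
""")
print(sorted(subtree_inodes(conn, "rsnapshot", 0)))  # [10, 11, 12]
```

The root parent inode is passed in as a parameter so that it is easy to
adjust if your database uses a different value.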

The inode_blocks table tells you what storage blocks are associated with
each inode. At this point you can identify de-duplicated blocks (they
will be referenced by multiple inodes). At the moment, each block is
associated with exactly one storage object. To determine the compressed
size, you can thus just look at the corresponding entry in the "objects"
table.
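
A minimal sketch of this accounting, assuming you already have the set of
inodes for each top-level directory. The inode_blocks and objects tables
are named above; the intermediate blocks table and all column names
(block_id, obj_id, size) are assumptions, so adapt them to the schema you
actually find. Objects referenced from more than one tree are reported
separately under the key None rather than split between trees.

```python
import sqlite3
from collections import defaultdict

def backend_usage(conn, trees):
    """Map each tree name to compressed bytes used only by that tree.

    Bytes in objects referenced from several trees are summed under None.
    """
    size = {}                    # object id -> compressed size
    owners = defaultdict(set)    # object id -> names of trees referencing it
    # Assumed (hypothetical) schema: inode_blocks(inode, blockno, block_id),
    # blocks(id, obj_id), objects(id, size).
    sql = """SELECT DISTINCT o.id, o.size FROM inode_blocks ib
             JOIN blocks b ON b.id = ib.block_id
             JOIN objects o ON o.id = b.obj_id
             WHERE ib.inode = ?"""
    for name, inodes in trees.items():
        for inode in inodes:
            for obj_id, obj_size in conn.execute(sql, (inode,)):
                size[obj_id] = obj_size
                owners[obj_id].add(name)
    usage = {name: 0 for name in trees}
    usage[None] = 0
    for obj_id, names in owners.items():
        # Each object is counted once, however many inodes point at it.
        key = next(iter(names)) if len(names) == 1 else None
        usage[key] += size[obj_id]
    return usage

# Toy database standing in for the real metadata file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inode_blocks(inode INT, blockno INT, block_id INT);
CREATE TABLE blocks(id INTEGER PRIMARY KEY, obj_id INT);
CREATE TABLE objects(id INTEGER PRIMARY KEY, size INT);
INSERT INTO objects VALUES (1,100),(2,250),(3,40);
INSERT INTO blocks VALUES (1,1),(2,2),(3,3);
INSERT INTO inode_blocks VALUES (12,0,1),(12,1,2),(13,0,1),(21,0,2),(21,1,3);
""")
usage = backend_usage(conn, {"rsnapshot": {12, 13}, "photos": {21}})
print(usage)  # {'rsnapshot': 100, 'photos': 40, None: 250}
```

In the toy data, object 1 is shared by two inodes inside the same tree and
is therefore counted once for "rsnapshot", while object 2 is referenced
from both trees and ends up in the shared total.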

Best,
Nikolaus
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«