[
https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davide Giannella updated OAK-6254:
----------------------------------
Fix Version/s: 1.14.0
> DataStore: API to retrieve approximate storage size
> ---------------------------------------------------
>
> Key: OAK-6254
> URL: https://issues.apache.org/jira/browse/OAK-6254
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: blob
> Reporter: Thomas Mueller
> Priority: Major
> Fix For: 1.12.0, 1.14.0
>
>
> The estimated size of the datastore (on disk) is needed to:
> * monitor growth over time, or growth of certain operations
> * monitor if garbage collection is effective
> * avoid running out of disk space
> * estimate backup size
> * gather statistics (for example, if there are many repositories, to group
> them by size)
> Datastore size: we could use the following heuristic: read the file sizes
> in ./datastore/00/00 (if it exists) and multiply their sum by 65536; or read
> those in ./datastore/00 and multiply their sum by 256. That would give a
> rough estimate (within about 20% for repositories with datastore size > 50 GB).
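
The sampling heuristic above could be sketched roughly as follows. This is a minimal illustration, not Oak code: the class DataStoreSizeSampler and its methods are hypothetical names, and returning -1 for "unknown" is an assumption. It also assumes that files may sit in further subdirectories below the sampled directory, so it walks the sample recursively.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Oak): extrapolates the size from one sampled directory. */
final class DataStoreSizeSampler {

    /** Sums the sizes of all regular files below dir; returns 0 if dir is not a directory. */
    static long sumFileSizes(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0L;
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** Returns the estimated size in bytes, or -1 (assumed "unknown" marker) if no sample exists. */
    static long estimate(Path dataStoreRoot) throws IOException {
        // Preferred sample: <root>/00/00, scaled by 65536 as proposed in the issue.
        Path twoLevels = dataStoreRoot.resolve("00").resolve("00");
        if (Files.isDirectory(twoLevels)) {
            return sumFileSizes(twoLevels) * 65536L;
        }
        // Fallback sample: <root>/00, scaled by 256.
        Path oneLevel = dataStoreRoot.resolve("00");
        if (Files.isDirectory(oneLevel)) {
            return sumFileSizes(oneLevel) * 256L;
        }
        return -1L;
    }
}
```

A caller would pass the datastore directory, for example DataStoreSizeSampler.estimate(Paths.get("./datastore")), and treat a negative result as "unknown".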
> I think this is mainly important for the FileDataStore. For the S3 datastore,
> if there is a simple and fast S3 API to read the size, using it would be good
> as well; if there is none, returning "unknown" is fine for me.
> As for the API, I would use something like this:
> {{long getEstimatedStorageSize(int accuracyLevel)}}, with accuracyLevel 1 for
> inaccurate (fastest), 2 for more accurate (slower), ..., 9 for precise
> (possibly very slow). Similar to
> [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)].
> I would expect it to take up to 1 second for accuracyLevel 1, up to 5 seconds
> for accuracyLevel 2, and possibly hours for level 9. With level 1, I would
> read the files in 00/00; with levels 2 to 8, the files in 00; and with
> level 9, all the files. For level 1, I wouldn't stop; for level
> 2, if it takes more than 5 seconds, I would stop and return the current best
> estimate.
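
One possible shape for the proposed getEstimatedStorageSize(int accuracyLevel) is sketched below. Only the method signature and the level-to-directory mapping come from the description; the class name EstimatedSizeSketch, the constructor, the configurable time budget, and the -1 "unknown" value are assumptions made for illustration. The description does not define "current best estimate", so this sketch assumes it means falling back to the 00/00 sample when the timed scan of 00 is cut off.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.stream.Stream;

/** Hypothetical sketch (not Oak code) of an accuracy-level based size estimate. */
final class EstimatedSizeSketch {
    static final long UNKNOWN = -1L;                       // assumed "unknown" marker
    private static final long NO_LIMIT = Long.MAX_VALUE;   // scan without a time budget

    private final Path root;
    private final long budgetNanos;

    /** budgetMillis is the time allowed for the levels 2..8 scan (5000 in the proposal). */
    EstimatedSizeSketch(Path root, long budgetMillis) {
        this.root = root;
        this.budgetNanos = budgetMillis * 1_000_000L;
    }

    long getEstimatedStorageSize(int accuracyLevel) throws IOException {
        if (accuracyLevel < 1 || accuracyLevel > 9) {
            throw new IllegalArgumentException("accuracyLevel must be 1..9");
        }
        if (!Files.isDirectory(root)) {
            return UNKNOWN;
        }
        if (accuracyLevel == 9) {
            return sumFileSizes(root, NO_LIMIT);           // precise: read all files
        }
        // Level 1: sample 00/00 and scale by 65536; never interrupted.
        long coarse = sumFileSizes(root.resolve("00").resolve("00"), NO_LIMIT) * 65536L;
        if (accuracyLevel == 1) {
            return coarse;
        }
        // Levels 2..8: sample 00 and scale by 256, within the time budget.
        long partial = sumFileSizes(root.resolve("00"), budgetNanos);
        return partial < 0 ? coarse : partial * 256L;      // on timeout keep the coarse value
    }

    /** Sums regular file sizes below dir; returns -1 if limitNanos elapsed before finishing. */
    private static long sumFileSizes(Path dir, long limitNanos) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0L;
        }
        long start = System.nanoTime();
        long total = 0L;
        try (Stream<Path> paths = Files.walk(dir)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                if (System.nanoTime() - start >= limitNanos) {
                    return -1L;
                }
                Path p = it.next();
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }
}
```

With this shape, new EstimatedSizeSketch(Paths.get("./datastore"), 5000).getEstimatedStorageSize(2) would scan 00 for at most about five seconds. Treating levels 2 to 8 identically follows the description; a real implementation might use the intermediate levels to sample more directories.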
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)