[
https://issues.apache.org/jira/browse/HDDS-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124522#comment-17124522
]
Sammi Chen commented on HDDS-3658:
----------------------------------
Hi [~elek], the example is copied from the "ozone sh key info" command output.
So far, that output does not actually show the pipeline information for each
key location. I will add more description to clarify the goal, which you
understood 100% correctly.
> Remove container location information when persist key info into OM DB to
> reduce meta data db size
> --------------------------------------------------------------------------------------------------
>
> Key: HDDS-3658
> URL: https://issues.apache.org/jira/browse/HDDS-3658
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
> Investigation results of serialized key size, RATIS with three replicas:
> 1. empty key, serialized size 113 bytes
> hadoop/bucket/user/root/terasort/10G-input-7/_SUCCESS
> {
> "volumeName" : "hadoop",
> "bucketName" : "bucket",
> "name" : "user/root/terasort/10G-input-7/_SUCCESS",
> "dataSize" : 0,
> "creationTime" : "2019-11-21T13:53:11.330Z",
> "modificationTime" : "2019-11-21T13:53:11.361Z",
> "replicationType" : "RATIS",
> "replicationFactor" : 3,
> "ozoneKeyLocations" : [ ],
> "metadata" : { },
> "fileEncryptionInfo" : null
> }
> 2. key with one chunk of data, serialized size 661 bytes
> hadoop/bucket/user/root/terasort/10G-input-6/part-m-00037
> {
> "volumeName" : "hadoop",
> "bucketName" : "bucket",
> "name" : "user/root/terasort/10G-input-6/part-m-00037",
> "dataSize" : 223696200,
> "creationTime" : "2019-11-18T07:47:58.254Z",
> "modificationTime" : "2019-11-18T07:53:52.066Z",
> "replicationType" : "RATIS",
> "replicationFactor" : 3,
> "ozoneKeyLocations" : [ {
> "containerID" : 7,
> "localID" : 103157811003588713,
> "length" : 223696200,
> "offset" : 0
> } ],
> "metadata" : { },
> "fileEncryptionInfo" : null
> }
> 3. key with two chunks of data, serialized size 1205 bytes
> ozone sh key info hadoop/bucket/user/root/terasort/10G-input-7/part-m-00027
> {
> "volumeName" : "hadoop",
> "bucketName" : "bucket",
> "name" : "user/root/terasort/10G-input-7/part-m-00027",
> "dataSize" : 223696200,
> "creationTime" : "2019-11-21T13:47:07.653Z",
> "modificationTime" : "2019-11-21T13:53:07.964Z",
> "replicationType" : "RATIS",
> "replicationFactor" : 3,
> "ozoneKeyLocations" : [ {
> "containerID" : 221,
> "localID" : 103176210196201501,
> "length" : 134217728,
> "offset" : 0
> }, {
> "containerID" : 222,
> "localID" : 103176231767375926,
> "length" : 89478472,
> "offset" : 0
> } ],
> "metadata" : { },
> "fileEncryptionInfo" : null
> }
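A quick back-of-envelope check on the three sizes above, as a Python sketch. Only the byte counts 113, 661 and 1205 come from the investigation; the helper names are illustrative. Subtracting consecutive sizes gives roughly 550 bytes per key location, which is far more than the four numeric fields printed in the JSON. That is presumably the per-location data the "ozone sh key info" output does not show. The differences also absorb a few bytes from the differing key names and data sizes, so treat them as approximate.

```python
# Serialized key sizes from the investigation above (RATIS, three replicas),
# keyed by the number of entries in ozoneKeyLocations.
SIZES = {0: 113, 1: 661, 2: 1205}


def per_location_overhead(sizes):
    """Bytes added by each extra location: size(n) - size(n - 1)."""
    counts = sorted(sizes)
    return [sizes[b] - sizes[a] for a, b in zip(counts, counts[1:])]


def location_share(sizes, n):
    """Approximate fraction of a key's size taken by its n locations,
    using the empty key as the location-free baseline."""
    return (sizes[n] - sizes[0]) / sizes[n]


print(per_location_overhead(SIZES))        # [548, 544]
print(f"{location_share(SIZES, 2):.0%}")   # 91%
```

By this rough measure, about nine tenths of the two-location key's serialized size is location data.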
> When a client reads a key, the "refreshPipeline" option controls whether
> to get the up-to-date container location info from SCM.
> Currently, this option is always set to true, which makes the container
> location info saved in the OM DB useless.
> Another motivation: when using Nanda's tool for the OM performance test
> with 1000 million (1 billion) keys, each key with 1 replica and metadata for
> 2 chunks, the total RocksDB directory size is 65.5 GB. One of our customers'
> clusters needs to store 10 billion objects. In that case, the DB size is
> approximately (65.5 GB * 10 / 2 * 3) ~ 1 TB.
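The size estimate in the paragraph above can be written out as a small Python sketch. The reading of the factors is an assumption: x10 for 10 billion instead of 1 billion keys, /2 for one chunk of location metadata per key instead of two, and x3 for three replicas instead of one. Scaling the whole DB size linearly by each factor is also an assumption; the sketch only reproduces the arithmetic of the issue's back-of-envelope formula.

```python
MEASURED_DB_GB = 65.5  # ~1 billion keys, 1 replica, 2 chunks per key (from the issue)


def scale_db_size(measured_gb, key_factor, chunk_factor, replica_factor):
    """Scale a measured DB size linearly by each factor.
    Assumption: DB size grows linearly in keys, chunks per key and replicas."""
    return measured_gb * key_factor * chunk_factor * replica_factor


# Assumed reading of "65.5 GB * 10 / 2 * 3":
# 10x keys (10 billion), half the chunks per key, 3x replicas.
estimate = scale_db_size(MEASURED_DB_GB, key_factor=10, chunk_factor=1 / 2, replica_factor=3)
print(f"{estimate} GB")  # 982.5 GB, i.e. roughly 1 TB
```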
> The goal of this task is to discard the container location info when
> persisting keys to the OM DB, to save DB space.
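A minimal Python sketch of the intended write/read path. It assumes "container location info" means the per-location pipeline (the datanodes serving the container), and that container ID, local ID, length and offset are still persisted because they identify the block. KeyInfo, KeyLocation, strip_pipelines, refresh_pipelines and the dicts standing in for the OM DB and SCM are hypothetical, not Ozone APIs. The container and local IDs are taken from example 3 above; the datanode names are made up.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class KeyLocation:
    container_id: int
    local_id: int
    length: int
    offset: int
    pipeline: Optional[List[str]] = None  # hypothetical: datanodes serving the container


@dataclass(frozen=True)
class KeyInfo:
    name: str
    data_size: int
    locations: List[KeyLocation] = field(default_factory=list)


def strip_pipelines(key: KeyInfo) -> KeyInfo:
    """Copy of the key for persisting: block IDs kept, pipelines dropped."""
    return replace(key, locations=[replace(loc, pipeline=None) for loc in key.locations])


def refresh_pipelines(key: KeyInfo, scm: Dict[int, List[str]]) -> KeyInfo:
    """On read, fill each location's pipeline from SCM's container map."""
    return replace(key, locations=[
        replace(loc, pipeline=scm[loc.container_id]) for loc in key.locations
    ])


# Hypothetical stand-ins: SCM maps containerID -> datanodes; OM DB maps name -> KeyInfo.
scm = {221: ["dn1", "dn2", "dn3"], 222: ["dn4", "dn5", "dn6"]}
om_db: Dict[str, KeyInfo] = {}

written = KeyInfo("user/root/terasort/10G-input-7/part-m-00027", 223696200, [
    KeyLocation(221, 103176210196201501, 134217728, 0, scm[221]),
    KeyLocation(222, 103176231767375926, 89478472, 0, scm[222]),
])
om_db[written.name] = strip_pipelines(written)      # what gets persisted
read = refresh_pipelines(om_db[written.name], scm)  # what a reader sees
print(read == written)  # True
```

Because reads already always refresh the pipeline from SCM, the reader gets the same key info back even though the DB no longer stores pipelines. The DB keeps what only the OM knows (which blocks make up the key) and leaves what SCM already tracks (where each container lives) to SCM.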
--
This message was sent by Atlassian Jira
(v8.3.4#803005)