[
https://issues.apache.org/jira/browse/HDDS-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117478#comment-17117478
]
Marton Elek commented on HDDS-3658:
-----------------------------------
Thanks for opening this issue [~Sammi]. Based on my understanding, the plan is to
remove the pipeline information (datanode id / ip / host). But that is not clear
from the description, as there is no pipeline information in the example.
Can you please clarify what the "container location info" to be removed is?
(If it is the datanode / pipeline information, I am very happy to remove it, as the
refresh is required for correctness, IMHO.)
> Remove container location information when persist key info into OM DB to
> reduce meta data db size
> --------------------------------------------------------------------------------------------------
>
> Key: HDDS-3658
> URL: https://issues.apache.org/jira/browse/HDDS-3658
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
>
> Investigation results for serialized key size, RATIS with three replicas.
> 1. empty key, serialized size 113 bytes
> hadoop/bucket/user/root/terasort/10G-input-7/_SUCCESS
> {
> "volumeName" : "hadoop",
> "bucketName" : "bucket",
> "name" : "user/root/terasort/10G-input-7/_SUCCESS",
> "dataSize" : 0,
> "creationTime" : "2019-11-21T13:53:11.330Z",
> "modificationTime" : "2019-11-21T13:53:11.361Z",
> "replicationType" : "RATIS",
> "replicationFactor" : 3,
> "ozoneKeyLocations" : [ ],
> "metadata" : { },
> "fileEncryptionInfo" : null
> }
> 2. key with one chunk of data, serialized size 661 bytes
> hadoop/bucket/user/root/terasort/10G-input-6/part-m-00037
> {
> "volumeName" : "hadoop",
> "bucketName" : "bucket",
> "name" : "user/root/terasort/10G-input-6/part-m-00037",
> "dataSize" : 223696200,
> "creationTime" : "2019-11-18T07:47:58.254Z",
> "modificationTime" : "2019-11-18T07:53:52.066Z",
> "replicationType" : "RATIS",
> "replicationFactor" : 3,
> "ozoneKeyLocations" : [ {
> "containerID" : 7,
> "localID" : 103157811003588713,
> "length" : 223696200,
> "offset" : 0
> } ],
> "metadata" : { },
> "fileEncryptionInfo" : null
> }
> 3. key with two chunks of data, serialized size 1205 bytes
> ozone sh key info hadoop/bucket/user/root/terasort/10G-input-7/part-m-00027
> {
> "volumeName" : "hadoop",
> "bucketName" : "bucket",
> "name" : "user/root/terasort/10G-input-7/part-m-00027",
> "dataSize" : 223696200,
> "creationTime" : "2019-11-21T13:47:07.653Z",
> "modificationTime" : "2019-11-21T13:53:07.964Z",
> "replicationType" : "RATIS",
> "replicationFactor" : 3,
> "ozoneKeyLocations" : [ {
> "containerID" : 221,
> "localID" : 103176210196201501,
> "length" : 134217728,
> "offset" : 0
> }, {
> "containerID" : 222,
> "localID" : 103176231767375926,
> "length" : 89478472,
> "offset" : 0
> } ],
> "metadata" : { },
> "fileEncryptionInfo" : null
> }
> When a client reads a key, there is a "refreshPipeline" option that controls
> whether to fetch the up-to-date container location info from SCM.
> Currently, this option is always set to true, which makes the container
> location info saved in OM DB useless.
> Another motivation: when using Nanda's tool for the OM performance test
> with 1000 million (1 billion) keys, each key with 1 replica and 2 chunks of
> metadata, the total RocksDB directory size is 65.5GB. One of our customer
> clusters is required to store 10 billion objects. In this case, the DB
> size is approximately (65.5GB * 10 / 2 * 3) ~ 1TB.
> The goal of this task is to discard the container location info when
> persisting a key to OM DB, to save DB space.
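
For reference, the arithmetic in the description can be reproduced with a short sketch. It only uses the numbers quoted above; the function names are made up for illustration and are not part of Ozone. Note that the first size delta (113 to 661 bytes) also includes a slightly longer key name and a non-zero dataSize, so the per-location figure is an approximation, and the factors 10, 2 and 3 are copied from the estimate in the description as written.

```python
# Serialized key sizes in bytes, as reported in the issue description:
# empty key, key with one location entry, key with two location entries.
REPORTED_SIZES = [113, 661, 1205]


def per_location_overhead(sizes):
    """Return the size increase between consecutive measurements.

    Hypothetical helper: treats each delta as the approximate cost of one
    more entry in "ozoneKeyLocations".
    """
    return [after - before for before, after in zip(sizes, sizes[1:])]


def projected_db_size_gb(measured_gb, key_scale, chunk_divisor, replica_factor):
    """Scale a measured DB size the same way the description does.

    Hypothetical helper: mirrors the expression 65.5GB * 10 / 2 * 3.
    """
    return measured_gb * key_scale / chunk_divisor * replica_factor


overheads = per_location_overhead(REPORTED_SIZES)
print(overheads)                             # [548, 544]
print(round(overheads[0] / 661, 2))          # 0.83
print(projected_db_size_gb(65.5, 10, 2, 3))  # 982.5
```

Under these assumptions, roughly 83% of a one-location key record would be location-related data, and the projected DB size of 982.5GB matches the "~ 1TB" estimate.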
--
This message was sent by Atlassian Jira
(v8.3.4#803005)