[jira] [Reopened] (HDDS-4378) Ozone shell support truncate API
[ https://issues.apache.org/jira/browse/HDDS-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang reopened HDDS-4378: -- > Ozone shell support truncate API > > > Key: HDDS-4378 > URL: https://issues.apache.org/jira/browse/HDDS-4378 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-4378) Ozone shell support truncate API
[ https://issues.apache.org/jira/browse/HDDS-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang resolved HDDS-4378. -- Assignee: runzhiwang Resolution: Fixed > Ozone shell support truncate API > > > Key: HDDS-4378 > URL: https://issues.apache.org/jira/browse/HDDS-4378 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-4382) SCM send truncate block to datanode
[ https://issues.apache.org/jira/browse/HDDS-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4382: - Description: SCM starts a background thread to send TruncateBlocksCommandProto to the datanode. > SCM send truncate block to datanode > --- > > Key: HDDS-4382 > URL: https://issues.apache.org/jira/browse/HDDS-4382 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-4379) OM marks truncated blocks
[ https://issues.apache.org/jira/browse/HDDS-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4379: - Description: Append means the Ozone client can append content to the tail of a key. So if Client1 truncates the file before the datanode has truncated blockN -> Client2 appends content to the tail of blockN -> the datanode truncates blockN, an error occurs. To avoid this, OmKeyLocationInfo adds a new flag, toBeTruncated, to mark blocks that need to be truncated in the future. When Client2 appends and finds blockN carrying the toBeTruncated flag, it allocates a new block and appends the content to that block instead. > OM marks truncated blocks > - > > Key: HDDS-4379 > URL: https://issues.apache.org/jira/browse/HDDS-4379 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
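The append-side check described above can be sketched as follows. This is a minimal illustration, not the actual Ozone code: the names BlockInfo and pick_append_target are hypothetical stand-ins for OmKeyLocationInfo and the OM allocation path.

```python
from dataclasses import dataclass

@dataclass
class BlockInfo:
    block_id: int
    length: int
    to_be_truncated: bool = False  # set by OM while a truncate is pending on the datanode

def pick_append_target(blocks, next_block_id):
    """Return the block an appending client should write to.

    If the tail block is marked toBeTruncated, allocate a fresh block
    instead of appending to data the datanode is about to cut off.
    """
    tail = blocks[-1] if blocks else None
    if tail is None or tail.to_be_truncated:
        new_block = BlockInfo(block_id=next_block_id, length=0)
        blocks.append(new_block)
        return new_block
    return tail
```

With this check, Client2's append lands in a new block whenever the tail block is pending truncation, so the datanode's later truncate cannot destroy appended data.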
[jira] [Updated] (HDDS-4383) Datanode truncates blocks on the disk
[ https://issues.apache.org/jira/browse/HDDS-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4383: - Description: Datanode starts a background thread to process <block, newLength>. If the layout is FilePerBlock, we use FileChannel.truncate to truncate the file to newLength directly. If it is FilePerChunk, we delete the files of the fully truncated chunks, and use FileChannel.truncate to shorten the partially truncated file. Then, in RocksDB, the Datanode deletes <block, newLength> and puts the updated block metadata. > Datanode truncates blocks on the disk > - > > Key: HDDS-4383 > URL: https://issues.apache.org/jira/browse/HDDS-4383 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
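The on-disk part of the two layouts can be sketched as below. This is a Python stand-in for the Java FileChannel.truncate logic described above; the helper names and the tiny CHUNK layout are illustrative assumptions, not Ozone's actual chunk manager API.

```python
import os

def truncate_file_per_block(path, new_length):
    # FilePerBlock layout: one file holds the whole block,
    # so a single truncate to newLength suffices.
    os.truncate(path, new_length)

def truncate_file_per_chunk(chunk_paths, new_length):
    # FilePerChunk layout: walk the ordered chunk files, delete the
    # ones that lie entirely past newLength, and shorten the one that
    # straddles the boundary.
    offset = 0
    for path in chunk_paths:
        size = os.path.getsize(path)
        if offset >= new_length:
            os.remove(path)                           # fully truncated chunk
        elif offset + size > new_length:
            os.truncate(path, new_length - offset)    # partially truncated chunk
        offset += size
```

Chunks entirely before newLength are untouched, which is what makes the per-chunk layout cheap: only one file is ever rewritten.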
[jira] [Updated] (HDDS-4379) OM marks truncated blocks
[ https://issues.apache.org/jira/browse/HDDS-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4379: - Summary: OM marks truncated blocks (was: Mark truncated blocks in OM) > OM marks truncated blocks > - > > Key: HDDS-4379 > URL: https://issues.apache.org/jira/browse/HDDS-4379 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
[jira] [Created] (HDDS-4383) Datanode truncates blocks on the disk
runzhiwang created HDDS-4383: Summary: Datanode truncates blocks on the disk Key: HDDS-4383 URL: https://issues.apache.org/jira/browse/HDDS-4383 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang
[jira] [Updated] (HDDS-4377) Datanode truncates blocks on the disk
[ https://issues.apache.org/jira/browse/HDDS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4377: - Description: When Datanode receives TruncateBlocksCommand, in RocksDB, the Datanode deletes the chunks which are fully truncated, and updates the length and checksum of the chunk which is partially truncated. Datanode puts <block, newLength> in RocksDB, and returns success to SCM. > Datanode truncates blocks on the disk > - > > Key: HDDS-4377 > URL: https://issues.apache.org/jira/browse/HDDS-4377 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-4377) Datanode changes the block length in rocksdb
[ https://issues.apache.org/jira/browse/HDDS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4377: - Summary: Datanode changes the block length in rocksdb (was: Datanode truncates blocks on the disk) > Datanode changes the block length in rocksdb > > > Key: HDDS-4377 > URL: https://issues.apache.org/jira/browse/HDDS-4377 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
[jira] [Created] (HDDS-4382) SCM send truncate block to datanode
runzhiwang created HDDS-4382: Summary: SCM send truncate block to datanode Key: HDDS-4382 URL: https://issues.apache.org/jira/browse/HDDS-4382 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang
[jira] [Updated] (HDDS-4376) SCM create transaction for truncated blocks
[ https://issues.apache.org/jira/browse/HDDS-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4376: - Description: When SCM receives TruncateScmKeyRequestProto, for the deleteBlocks, SCM has already implemented the code. For the partialTruncateBlocks, we process them as a transaction like deleteBlocks, store it in truncatedBlocksTable, and return success to OM. We abstract the code related to the delete-block transaction, so that truncate and delete blocks can share the abstracted code. > SCM create transaction for truncated blocks > --- > > Key: HDDS-4376 > URL: https://issues.apache.org/jira/browse/HDDS-4376 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-4381) OM send truncate key to SCM
[ https://issues.apache.org/jira/browse/HDDS-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4381: - Description: OM starts a background thread to select a certain number of entries from truncateTable, and sends these entries to SCM via TruncateScmKeyRequestProto. > OM send truncate key to SCM > --- > > Key: HDDS-4381 > URL: https://issues.apache.org/jira/browse/HDDS-4381 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
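The batching done by that background thread can be sketched as below. This is a simplified illustration under assumed names (next_truncate_batch is hypothetical; the real OM wraps each batch in a TruncateScmKeyRequestProto and sends it over RPC, which is omitted here).

```python
from collections import deque

def next_truncate_batch(truncate_table, batch_limit):
    """Drain up to batch_limit pending entries from the truncate table.

    The OM background thread would call this periodically and send the
    resulting batch to SCM in one TruncateScmKeyRequestProto.
    """
    batch = []
    while truncate_table and len(batch) < batch_limit:
        batch.append(truncate_table.popleft())
    return batch
```

Capping the batch size keeps each SCM request bounded, so a burst of truncates is spread over several background iterations instead of one oversized RPC.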
[jira] [Created] (HDDS-4381) OM send truncate key to SCM
runzhiwang created HDDS-4381: Summary: OM send truncate key to SCM Key: HDDS-4381 URL: https://issues.apache.org/jira/browse/HDDS-4381 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang
[jira] [Created] (HDDS-4380) OM stores truncate key in truncateTable
runzhiwang created HDDS-4380: Summary: OM stores truncate key in truncateTable Key: HDDS-4380 URL: https://issues.apache.org/jira/browse/HDDS-4380 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang
[jira] [Updated] (HDDS-4380) OM stores truncate key in truncateTable
[ https://issues.apache.org/jira/browse/HDDS-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4380: - Description: OM stores <key, RepeatedTruncateOmKeyInfo> in truncateTable. Each TruncateKey in RepeatedTruncateOmKeyInfo represents one truncate-key operation, so the list of TruncateKey allows us to store a list of truncate operations related to one key. > OM stores truncate key in truncateTable > --- > > Key: HDDS-4380 > URL: https://issues.apache.org/jira/browse/HDDS-4380 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-4375) OM changes the block length when receives truncate request
[ https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4375: - Description: When OM receives truncate(key, newLength), in the keyTable, OM deletes the blocks which are fully truncated, updates the length of the block which is partially truncated, and then returns success to the client. > OM changes the block length when receives truncate request > -- > > Key: HDDS-4375 > URL: https://issues.apache.org/jira/browse/HDDS-4375 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Labels: pull-request-available >
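The keyTable update described above amounts to a cumulative-offset walk over the key's block list. A minimal sketch, assuming a hypothetical helper name and plain integer block lengths in place of OmKeyLocationInfo:

```python
def apply_truncate(block_lengths, new_length):
    """Given the ordered block lengths of a key, return the lengths after
    truncate(key, newLength): blocks entirely past newLength are dropped,
    and the block straddling the boundary is shortened.
    """
    kept, offset = [], 0
    for length in block_lengths:
        if offset >= new_length:
            break  # this block and all later ones are fully truncated
        kept.append(min(length, new_length - offset))
        offset += length
    return kept
```

For example, a key made of three 4-byte blocks truncated to 6 bytes keeps the first block whole and shortens the second to 2 bytes; the third block is dropped and later handed to SCM for deletion.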
[jira] [Updated] (HDDS-4375) OM changes the block length when receives truncate request
[ https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4375: - Summary: OM changes the block length when receives truncate request (was: OM changes the block length when receive truncate request) > OM changes the block length when receives truncate request > -- > > Key: HDDS-4375 > URL: https://issues.apache.org/jira/browse/HDDS-4375 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-4375) OM changes the block length when receive truncate request
[ https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4375: - Summary: OM changes the block length when receive truncate request (was: Client asks OM to change the block length when truncate) > OM changes the block length when receive truncate request > - > > Key: HDDS-4375 > URL: https://issues.apache.org/jira/browse/HDDS-4375 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Created] (HDDS-4379) Mark truncated blocks in OM
runzhiwang created HDDS-4379: Summary: Mark truncated blocks in OM Key: HDDS-4379 URL: https://issues.apache.org/jira/browse/HDDS-4379 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang
[jira] [Created] (HDDS-4378) Ozone shell support truncate API
runzhiwang created HDDS-4378: Summary: Ozone shell support truncate API Key: HDDS-4378 URL: https://issues.apache.org/jira/browse/HDDS-4378 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang
[jira] [Created] (HDDS-4376) SCM create transaction for truncated blocks
runzhiwang created HDDS-4376: Summary: SCM create transaction for truncated blocks Key: HDDS-4376 URL: https://issues.apache.org/jira/browse/HDDS-4376 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang Assignee: runzhiwang
[jira] [Created] (HDDS-4377) Datanode truncates blocks on the disk
runzhiwang created HDDS-4377: Summary: Datanode truncates blocks on the disk Key: HDDS-4377 URL: https://issues.apache.org/jira/browse/HDDS-4377 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang
[jira] [Updated] (HDDS-4375) Client asks OM to change the block length when truncate
[ https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4375: - Parent: HDDS-4239 Issue Type: Sub-task (was: New Feature) > Client asks OM to change the block length when truncate > --- > > Key: HDDS-4375 > URL: https://issues.apache.org/jira/browse/HDDS-4375 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Created] (HDDS-4375) Client asks OM to change the block length when truncate
runzhiwang created HDDS-4375: Summary: Client asks OM to change the block length when truncate Key: HDDS-4375 URL: https://issues.apache.org/jira/browse/HDDS-4375 Project: Hadoop Distributed Data Store Issue Type: New Feature Reporter: runzhiwang Assignee: runzhiwang
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Parent: (was: HDDS-3714) Issue Type: New Feature (was: Sub-task) > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: New Feature >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Attachment: (was: Ozone Truncate Design-v3.pdf) > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Attachment: Ozone Truncate Design.pdf > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Attachment: (was: Ozone Truncate Design-v2.pdf) > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Attachment: (was: Ozone Truncate Design-v1.pdf) > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design-v2.pdf, Ozone Truncate > Design-v3.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Attachment: Ozone Truncate Design-v3.pdf > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design-v1.pdf, Ozone Truncate > Design-v2.pdf, Ozone Truncate Design-v3.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Attachment: Ozone Truncate Design-v2.pdf > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design-v1.pdf, Ozone Truncate > Design-v2.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Created] (HDDS-4240) Ozone support append operation
runzhiwang created HDDS-4240: Summary: Ozone support append operation Key: HDDS-4240 URL: https://issues.apache.org/jira/browse/HDDS-4240 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang Assignee: runzhiwang
[jira] [Updated] (HDDS-4239) Ozone support truncate operation
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Summary: Ozone support truncate operation (was: Ozone support truncate) > Ozone support truncate operation > > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design-v1.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Attachment: Ozone Truncate Design-v1.pdf > Ozone support truncate > -- > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Ozone Truncate Design-v1.pdf > > > Design: > https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#
[jira] [Updated] (HDDS-4239) Ozone support truncate
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Description: Design: https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit# > Ozone support truncate > -- > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-4239) Ozone support truncate
[ https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4239: - Parent: HDDS-3714 Issue Type: Sub-task (was: New Feature) > Ozone support truncate > -- > > Key: HDDS-4239 > URL: https://issues.apache.org/jira/browse/HDDS-4239 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Created] (HDDS-4239) Ozone support truncate
runzhiwang created HDDS-4239: Summary: Ozone support truncate Key: HDDS-4239 URL: https://issues.apache.org/jira/browse/HDDS-4239 Project: Hadoop Distributed Data Store Issue Type: New Feature Reporter: runzhiwang Assignee: runzhiwang
[jira] [Assigned] (HDDS-3714) Ozone support append truncate operation
[ https://issues.apache.org/jira/browse/HDDS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang reassigned HDDS-3714: Assignee: runzhiwang > Ozone support append truncate operation > --- > > Key: HDDS-3714 > URL: https://issues.apache.org/jira/browse/HDDS-3714 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: Ozone Manager >Reporter: maobaolong >Assignee: runzhiwang >Priority: Major >
[jira] [Resolved] (HDDS-4214) Fix failed UT: TestContainerStateMachineFailures#testApplyTransactionFailure
[ https://issues.apache.org/jira/browse/HDDS-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang resolved HDDS-4214. -- Resolution: Duplicate > Fix failed UT: TestContainerStateMachineFailures#testApplyTransactionFailure > > > Key: HDDS-4214 > URL: https://issues.apache.org/jira/browse/HDDS-4214 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Created] (HDDS-4214) Fix failed UT: TestContainerStateMachineFailures#testApplyTransactionFailure
runzhiwang created HDDS-4214: Summary: Fix failed UT: TestContainerStateMachineFailures#testApplyTransactionFailure Key: HDDS-4214 URL: https://issues.apache.org/jira/browse/HDDS-4214 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: runzhiwang Assignee: runzhiwang
[jira] [Created] (HDDS-4202) Upgrade ratis to 1.1.0-ea949f1-SNAPSHOT
runzhiwang created HDDS-4202: Summary: Upgrade ratis to 1.1.0-ea949f1-SNAPSHOT Key: HDDS-4202 URL: https://issues.apache.org/jira/browse/HDDS-4202 Project: Hadoop Distributed Data Store Issue Type: New Feature Reporter: runzhiwang Assignee: runzhiwang
[jira] [Created] (HDDS-4201) Improve the performance of OmKeyLocationInfoGroup
runzhiwang created HDDS-4201: Summary: Improve the performance of OmKeyLocationInfoGroup Key: HDDS-4201 URL: https://issues.apache.org/jira/browse/HDDS-4201 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache
[ https://issues.apache.org/jira/browse/HDDS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4199: - Attachment: screenshot-1.png > Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache > > > Key: HDDS-4199 > URL: https://issues.apache.org/jira/browse/HDDS-4199 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache
[ https://issues.apache.org/jira/browse/HDDS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4199: - Description: !screenshot-1.png! > Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache > > > Key: HDDS-4199 > URL: https://issues.apache.org/jira/browse/HDDS-4199 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache
[ https://issues.apache.org/jira/browse/HDDS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang reassigned HDDS-4199: Assignee: runzhiwang > Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache > > > Key: HDDS-4199 > URL: https://issues.apache.org/jira/browse/HDDS-4199 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache
runzhiwang created HDDS-4199: Summary: Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache Key: HDDS-4199 URL: https://issues.apache.org/jira/browse/HDDS-4199 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4187) Fix recon OOM
[ https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4187: - Summary: Fix recon OOM (was: Fix memory leak of recon)
> Fix recon OOM
> -
>
> Key: HDDS-4187
> URL: https://issues.apache.org/jira/browse/HDDS-4187
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
> On 40 datanodes with 400,000 containers, Recon was started with -Xmx10G. After several hours Recon's memory grew to 12G and it hit OOM. The leak is on the heap: Recon is slow to process ContainerReports, so the report handler's task queue grows without bound until OOM.
> !screenshot-1.png!
> !screenshot-2.png!
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
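The failure mode described in the report above — a slow consumer behind an unbounded executor queue — is usually fixed by bounding the queue and applying backpressure. The sketch below is illustrative only (the class and sizes are hypothetical, not Recon's actual code): with a bounded `ArrayBlockingQueue` and `CallerRunsPolicy`, a producer that outpaces the ContainerReport handler slows down instead of filling the heap.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded report-processing executor. When the queue
// is full, CallerRunsPolicy makes the submitting thread run the task itself,
// throttling report ingestion instead of growing the heap until OOM.
public class BoundedReportExecutor {
    public static ThreadPoolExecutor newExecutor(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),        // bounded, unlike an unbounded LinkedBlockingQueue
                new ThreadPoolExecutor.CallerRunsPolicy());     // backpressure when full
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = newExecutor(2, 100);
        for (int i = 0; i < 1_000; i++) {
            executor.execute(() -> { /* process one ContainerReport */ });
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With this shape the worst case is slower report ingestion, never more than `queueCapacity` pending reports on the heap.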
[jira] [Updated] (HDDS-4187) Fix memory leak of recon
[ https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4187: - Description: 40 datanodes with 400, 000 containers, start recon with xmx:10G. After several hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and the reason is recon is slow to process ContainerReport, so the queue of thread OOM. !screenshot-1.png! !screenshot-2.png! was: 40 datanodes with 400, 000 containers, start recon with xmx:10G. After several hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and the reason is recon is slow to process ContainerReplicaReport, so the queue of thread OOM. !screenshot-1.png! !screenshot-2.png! > Fix memory leak of recon > > > Key: HDDS-4187 > URL: https://issues.apache.org/jira/browse/HDDS-4187 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > 40 datanodes with 400, 000 containers, start recon with xmx:10G. After > several hours, recon's memory increase to 12G and OOM. Memory leak happens on > heap, and the reason is recon is slow to process ContainerReport, so the > queue of thread OOM. > !screenshot-1.png! > !screenshot-2.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4187) Fix memory leak of recon
[ https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4187: - Description: 40 datanodes with 400, 000 containers, start recon with xmx:10G. After several hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and the reason is recon is slow to process ContainerReplicaReport, so the queue of thread OOM. !screenshot-1.png! !screenshot-2.png! was: 40 datanodes with 400, 000 containers, start recon with xmx:10G. After several hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and the reason is recon is slow to process ContainerReplicaReport, so the queue of thread OOM. > Fix memory leak of recon > > > Key: HDDS-4187 > URL: https://issues.apache.org/jira/browse/HDDS-4187 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > 40 datanodes with 400, 000 containers, start recon with xmx:10G. After > several hours, recon's memory increase to 12G and OOM. Memory leak happens on > heap, and the reason is recon is slow to process ContainerReplicaReport, so > the queue of thread OOM. > !screenshot-1.png! > !screenshot-2.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4187) Fix memory leak of recon
[ https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4187: - Attachment: screenshot-2.png > Fix memory leak of recon > > > Key: HDDS-4187 > URL: https://issues.apache.org/jira/browse/HDDS-4187 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > 40 datanodes with 400, 000 containers, start recon with xmx:10G. After > several hours, recon's memory increase to 12G and OOM. Memory leak happens on > heap, and the reason is recon is slow to process ContainerReplicaReport, so > the queue of thread OOM. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4187) Fix memory leak of recon
[ https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4187: - Attachment: screenshot-1.png > Fix memory leak of recon > > > Key: HDDS-4187 > URL: https://issues.apache.org/jira/browse/HDDS-4187 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > 40 datanodes with 400, 000 containers, start recon with xmx:10G. After > several hours, recon's memory increase to 12G and OOM. Memory leak happens on > heap, and the reason is recon is slow to process ContainerReplicaReport, so > the queue of thread OOM. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4187) Fix memory leak of recon
[ https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4187: - Description: 40 datanodes with 400, 000 containers, start recon with xmx:10G. After several hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and the reason is recon is slow to process ContainerReplicaReport, so the queue of thread OOM. > Fix memory leak of recon > > > Key: HDDS-4187 > URL: https://issues.apache.org/jira/browse/HDDS-4187 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > > 40 datanodes with 400, 000 containers, start recon with xmx:10G. After > several hours, recon's memory increase to 12G and OOM. Memory leak happens on > heap, and the reason is recon is slow to process ContainerReplicaReport, so > the queue of thread OOM. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4187) Fix memory leak of recon
runzhiwang created HDDS-4187: Summary: Fix memory leak of recon Key: HDDS-4187 URL: https://issues.apache.org/jira/browse/HDDS-4187 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2922) Balance ratis leader distribution in datanodes
[ https://issues.apache.org/jira/browse/HDDS-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-2922: - Summary: Balance ratis leader distribution in datanodes (was: Recommend leader host to Ratis via pipeline creation) > Balance ratis leader distribution in datanodes > -- > > Key: HDDS-2922 > URL: https://issues.apache.org/jira/browse/HDDS-2922 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Li Cheng >Assignee: runzhiwang >Priority: Major > > Ozone should be able to recommend leader host to Ratis via pipeline creation. > The leader host can be recommended based on rack awareness and load balance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4176) Fix failed UT: test2WayCommitForTimeoutException
[ https://issues.apache.org/jira/browse/HDDS-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4176: - Description:
org.apache.ratis.protocol.GroupMismatchException: 6f2b1ee5-bc2b-491c-bff4-ab0f4ce64709: group-2D066F5AFBD0 not found.
  at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:127)
  at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:274)
  at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:283)
  at org.apache.hadoop.ozone.container.ContainerTestHelper.getRaftServerImpl(ContainerTestHelper.java:593)
  at org.apache.hadoop.ozone.container.ContainerTestHelper.isRatisFollower(ContainerTestHelper.java:608)
  at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.test2WayCommitForTimeoutException(TestWatchForCommit.java:302)
> Fix failed UT: test2WayCommitForTimeoutException
>
> Key: HDDS-4176
> URL: https://issues.apache.org/jira/browse/HDDS-4176
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
> org.apache.ratis.protocol.GroupMismatchException: 6f2b1ee5-bc2b-491c-bff4-ab0f4ce64709: group-2D066F5AFBD0 not found.
> at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:127)
> at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:274)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:283)
> at org.apache.hadoop.ozone.container.ContainerTestHelper.getRaftServerImpl(ContainerTestHelper.java:593)
> at org.apache.hadoop.ozone.container.ContainerTestHelper.isRatisFollower(ContainerTestHelper.java:608)
> at org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.test2WayCommitForTimeoutException(TestWatchForCommit.java:302)
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4176) Fix failed UT: test2WayCommitForTimeoutException
runzhiwang created HDDS-4176: Summary: Fix failed UT: test2WayCommitForTimeoutException Key: HDDS-4176 URL: https://issues.apache.org/jira/browse/HDDS-4176 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-3630) Merge rocksdb in datanode
[ https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908 ] runzhiwang edited comment on HDDS-3630 at 8/31/20, 12:54 AM:
In my test the capacity of a container is 5GB, ozone.container.cache.size is 1500, and each container holds about 25,000 blocks, so there are 1500 RocksDB instances in memory once one datanode has written 7.5TB of data. The basic memory settings: -Xmx3500m, -XX:MaxDirectMemorySize=1000m. After one datanode writes 7.5TB of data, resident memory is 13.6GB and virtual memory is 53.5GB, so the off-heap memory of RocksDB is about 9.1GB; the smaller the block size, the bigger the off-heap memory, and it cannot be reclaimed while the RocksDB instances are alive. There are 4835 threads in the datanode, of which 3000 are RocksDB's: 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads. So the off-heap memory of RocksDB consists of: 1. the RocksDB block caches; 2. the 1500 rocksdb:dump_st and 1500 rocksdb:pst_st threads.
Besides, opening a RocksDB instance costs about 300ms, and the 1500 cached containers only cover 7.5TB. If there is 750TB of data on one datanode, cache misses happen frequently, and each miss opens a new RocksDB instance, so too many RocksDB instances also degrade performance.
In conclusion, too many RocksDB instances cause: 1. 9.1GB of off-heap memory after writing 7.5TB of data; 2. 3000 threads after writing 7.5TB of data; 3. degraded performance when writing 750TB of data and cache misses are frequent.
Besides merging RocksDB instances, there are two other options, but both have drawbacks, so I prefer merging:
1. Flush the RocksDB memory of closed containers. Cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files. Cons: a. Removing RocksDB is a big piece of work. b. If we store all the checksums in files, then to query a checksum quickly we must load the file into memory; since there is one file per container, the number of files grows with the number of closed containers, and we cannot load the files of all closed containers into memory without OOM, so we must maintain an eviction strategy such as LRU for the checksum files in memory. That amounts to redoing work RocksDB already does for us. c. Open containers also have data that must be stored in files and updated frequently; we cannot force a sync to disk on every update, because some of this data (such as the block count per container) is not critical. So we would need a background thread that batch-syncs these kinds of data, which looks complicated, and every new piece of such data would require the same duplicated work, making the code hard to maintain. d. The memory reduction can be achieved by merging RocksDB instances, which looks easier than removing RocksDB, so we may not need to spend so much time on removal.
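The "eviction strategy such as LRU" the comment says would be needed for per-container handles can be sketched with the JDK alone. The class name and the idea of caching "container DB handles" are illustrative assumptions, not Ozone's actual cache implementation; `LinkedHashMap` in access order with `removeEldestEntry` gives exactly the LRU behavior described.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU sketch for a fixed-size cache of per-container handles
// (hypothetical names). In a real cache the evicted entry's RocksDB or
// file handle would also be closed before it is dropped.
public class ContainerDbCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public ContainerDbCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder=true: iteration order is least- to most-recently used
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Returning true evicts the least-recently-used entry once full.
        return size() > capacity;
    }
}
```

With a capacity like the 1500 of `ozone.container.cache.size` in the test above, a get() refreshes an entry and a put() beyond capacity silently drops the coldest handle.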
[jira] [Updated] (HDDS-4138) Improve crc efficiency
[ https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4138: - Description: Hadoop has implemented several methods to calculate CRC: https://issues.apache.org/jira/browse/HADOOP-15033 We should choose the most efficient one. This flame graph is from [~elek] !screenshot-1.png! was: Hadoop has implemented several methods to calculate CRC: https://issues.apache.org/jira/browse/HADOOP-15033 We should choose the most efficient one.
> Improve crc efficiency
> --
>
> Key: HDDS-4138
> URL: https://issues.apache.org/jira/browse/HDDS-4138
> Project: Hadoop Distributed Data Store
> Issue Type: Task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
> Hadoop has implemented several methods to calculate CRC: https://issues.apache.org/jira/browse/HADOOP-15033 We should choose the most efficient one. This flame graph is from [~elek] !screenshot-1.png!
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
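For context on the issue above: two of the checksum variants compared in HADOOP-15033 are available directly in the JDK, `java.util.zip.CRC32` and `java.util.zip.CRC32C` (the latter since Java 9). `CRC32C` can be intrinsified to hardware instructions (e.g. SSE4.2 on x86), which is one reason the choice of implementation matters for throughput. A small sketch of computing both over a data chunk:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Both CRC variants share the java.util.zip.Checksum interface, so the
// implementation can be swapped without changing the calling code.
public class CrcDemo {
    static long checksum(Checksum crc, byte[] data) {
        crc.reset();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] chunk = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), chunk));
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), chunk));
    }
}
```

Because both implement `Checksum`, benchmarking the candidates (as HADOOP-15033 did) is a matter of passing a different instance to the same loop.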
[jira] [Updated] (HDDS-4138) Improve crc efficiency
[ https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4138: - Attachment: screenshot-1.png > Improve crc efficiency > -- > > Key: HDDS-4138 > URL: https://issues.apache.org/jira/browse/HDDS-4138 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > HADOOP has implemented several method to calculate crc: > https://issues.apache.org/jira/browse/HADOOP-15033 > We should choose the method with high efficiency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4138) Improve crc efficiency
[ https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4138: - Description: HADOOP has implemented several method to calculate crc: https://issues.apache.org/jira/browse/HADOOP-15033 We should choose the method with high efficiency. was: HADOOP has implemented several method to calculate crc: https://issues.apache.org/jira/browse/HADOOP-15033 We should choose the method with high efficiency. > Improve crc efficiency > -- > > Key: HDDS-4138 > URL: https://issues.apache.org/jira/browse/HDDS-4138 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > > HADOOP has implemented several method to calculate crc: > https://issues.apache.org/jira/browse/HADOOP-15033 > We should choose the method with high efficiency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4138) Improve crc efficiency
[ https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-4138: - Description: HADOOP has implemented several method to calculate crc: https://issues.apache.org/jira/browse/HADOOP-15033 We should choose the method with high efficiency. > Improve crc efficiency > -- > > Key: HDDS-4138 > URL: https://issues.apache.org/jira/browse/HDDS-4138 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > > HADOOP has implemented several method to calculate crc: > https://issues.apache.org/jira/browse/HADOOP-15033 > We should choose the method with high efficiency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4138) Improve crc efficiency
runzhiwang created HDDS-4138: Summary: Improve crc efficiency Key: HDDS-4138 URL: https://issues.apache.org/jira/browse/HDDS-4138 Project: Hadoop Distributed Data Store Issue Type: Task Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-3630) Merge rocksdb in datanode
[ https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908 ] runzhiwang edited comment on HDDS-3630 at 8/24/20, 3:00 AM: In my test, the capacity of each container is 5GB, ozone.container.cache.size is 1500, and there are about 25,000 blocks in each container. So there are 1500 RocksDB instances in memory when one datanode writes 7.5TB of data. The basic memory settings are -Xmx3500m and -XX:MaxDirectMemorySize=1000m. After one datanode writes 7.5TB of data, resident memory is 13.6GB and virtual memory is 53.5GB, so the off-heap memory of RocksDB is about 9.1GB. The smaller the block size, the bigger the off-heap memory. The off-heap memory cannot be reclaimed while a RocksDB instance is alive. There are 4835 threads in the datanode, 3000 of which are RocksDB threads: 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads. So the off-heap memory of RocksDB consists of: 1. the RocksDB block cache; 2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads. Besides merging RocksDB instances, there are 2 other options, but both have cons, so I prefer merging. 1. Flush the RocksDB memory of closed containers. Con: thousands of RocksDB threads still exist. 2. Remove RocksDB from the datanode and store the data in files. Cons: a. Removing RocksDB is a big piece of work. b. If we store all the checksums in files, then to query a checksum quickly we must load the file into memory. Since there is one file per container, the number of files grows as the number of closed containers grows, and we cannot load the files of all closed containers into memory, otherwise OOM will happen; we would have to maintain an eviction strategy, such as LRU, for the in-memory checksum files. That duplicates work RocksDB already does for us. c. For open containers, some data also needs to be stored in files and updated frequently. We cannot force a sync to disk on every update, because some of the data is not important, such as the block count of each container, so we would need a background thread to batch-sync this kind of data, which looks complicated; every time we add this kind of data we must repeat the same work, and the code may become hard to maintain. d. The memory reduction can be achieved by merging RocksDB instances, which looks much easier than removing RocksDB, so maybe we need not spend so much time removing it. > Merge rocksdb in datanode > - > > Key: HDDS-3630 > URL: https://issues.apache.org/jira/browse/HDDS-3630 > Project: Hadoop Distributed Data Store > Issue Type:
[jira] [Commented] (HDDS-3630) Merge rocksdb in datanode
[ https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908 ] runzhiwang commented on HDDS-3630: -- In my test, the capacity of each container is 5GB, ozone.container.cache.size is 1500, and there are about 25,000 blocks in each container. So there are 1500 RocksDB instances in memory when one datanode writes 7.5TB of data. The basic memory settings are -Xmx3500m and -XX:MaxDirectMemorySize=1000m. After one datanode writes 7.5TB of data, resident memory is 13.6GB and virtual memory is 53.5GB, so the off-heap memory of RocksDB is about 9.1GB. The smaller the block size, the bigger the off-heap memory. There are 4835 threads in the datanode, 3000 of which are RocksDB threads: 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads. So the off-heap memory of RocksDB consists of: 1. the RocksDB block cache; 2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads. Besides merging RocksDB instances, there are 2 other options, but both have cons. 1. Flush the RocksDB memory of closed containers. Con: thousands of RocksDB threads still exist. 2. Remove RocksDB from the datanode and store the data in files. Cons: a. Removing RocksDB is a big piece of work. b. If we store all the checksums in files, then to query a checksum quickly we must load the file into memory. Since there is one file per container, the number of files grows as the number of closed containers grows, and we cannot load the files of all closed containers into memory, otherwise OOM will happen; we would have to maintain an eviction strategy, such as LRU, for the in-memory checksum files. That duplicates work RocksDB already does for us. c. For open containers, some data also needs to be stored in files and updated frequently. We cannot force a sync to disk on every update, because some of the data is not important, such as the block count of each container, so we would need a background thread to batch-sync this kind of data, which looks complicated; every time we add this kind of data we must repeat the same work, and the code may become hard to maintain. d. The memory reduction can be achieved by merging RocksDB instances, which looks much easier than removing RocksDB, so maybe we need not spend so much time removing it. > Merge rocksdb in datanode > - > > Key: HDDS-3630 > URL: https://issues.apache.org/jira/browse/HDDS-3630 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: Merge RocksDB in Datanode-v1.pdf, Merge RocksDB in > Datanode-v2.pdf > > > Currently there is one RocksDB instance per container. One container has 5GB capacity, so 10TB of data needs more than 2000 RocksDB instances on one datanode. It's difficult to > limit the memory of 2000 RocksDB instances, so maybe we should use a limited number of RocksDB instances per disk. > The design of the improvement is in the following link, but it is still a draft. > TODO: > 1. compatibility with the current logic, i.e. one RocksDB per container > 2. measure the memory usage before and after the improvement > 3. effect on the efficiency of read and write. > https://docs.google.com/document/d/18Ybg-NjyU602c-MYXaJHP6yrg-dVMZKGyoK5C_pp1mM/edit# -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
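The per-disk merge discussed above hinges on many containers sharing one store, which a container-prefixed key layout makes possible: each container's entries then form one contiguous key range that can be scanned or dropped together. Below is a minimal, hypothetical sketch of that layout — a `TreeMap` stands in for the single per-disk RocksDB instance, and all class and method names are illustrative, not the actual Ozone code.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a merged per-disk store: keys are prefixed with the container ID,
// so one store serves many containers. A TreeMap stands in for RocksDB.
public class MergedStoreSketch {
    private final NavigableMap<String, byte[]> store = new TreeMap<>();

    private static String key(long containerId, long blockId) {
        // Fixed-width, zero-padded components keep lexicographic order
        // consistent with numeric order.
        return String.format("%020d/%020d", containerId, blockId);
    }

    public void putBlock(long containerId, long blockId, byte[] meta) {
        store.put(key(containerId, blockId), meta);
    }

    // All blocks of one container form one contiguous key range.
    public NavigableMap<String, byte[]> containerRange(long containerId) {
        return store.subMap(key(containerId, 0), true,
                            key(containerId, Long.MAX_VALUE), true);
    }
}
```

With this layout, deleting or iterating a closed container is a single range operation on the shared store, instead of opening a dedicated RocksDB instance (with its own threads and block cache) per container.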
[jira] [Created] (HDDS-4024) Avoid while loop too soon when exception happen
runzhiwang created HDDS-4024: Summary: Avoid while loop too soon when exception happen Key: HDDS-4024 URL: https://issues.apache.org/jira/browse/HDDS-4024 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock
[ https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3978: - Description: We met a deadlock related to log4j; the jstack information has been attached. For the following two reasons, I want to switch log4j in Ozone and Ratis to log4j2. 1. There are a lot of deadlock reports about log4j: https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/ 2. log4j2 performs better than log4j: https://stackoverflow.com/questions/30019585/log4j2-why-would-you-use-it-over-log4j Besides, both log4j and log4j2 already exist in Ozone: the audit log uses log4j2 and the other logs use log4j, so maybe it's time to unify them. > Switch log4j to log4j2 to avoid deadlock > > > Key: HDDS-3978 > URL: https://issues.apache.org/jira/browse/HDDS-3978 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: jstack-deadlock-log.txt > > > We met a deadlock related to log4j; the jstack information has been attached. > For the following two reasons, I want to switch log4j in Ozone and Ratis to > log4j2. > 1. There are a lot of deadlock reports about log4j: > https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/ > 2. log4j2 performs better than log4j: > https://stackoverflow.com/questions/30019585/log4j2-why-would-you-use-it-over-log4j > Besides, both log4j and log4j2 already exist in Ozone: the audit log uses log4j2 and the other > logs use log4j, so maybe it's time to unify them. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
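A switch of the kind proposed above would come with a log4j2 configuration file. The fragment below is purely illustrative (file layout, logger names, and sizes are assumptions, not the actual Ozone configuration); it shows the `AsyncLogger` element, which hands events to a background thread through a ring buffer and so avoids blocking callers on the appender lock involved in the classic log4j deadlock reports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative log4j2.xml sketch, not the real Ozone config. -->
<Configuration status="WARN">
  <Appenders>
    <RollingFile name="rolling" fileName="logs/ozone.log"
                 filePattern="logs/ozone-%i.log.gz">
      <PatternLayout pattern="%d{ISO8601} [%t] %-5level %c{1} - %msg%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="256 MB"/>
      </Policies>
    </RollingFile>
  </Appenders>
  <Loggers>
    <!-- AsyncLogger queues events to a background thread (needs the
         LMAX disruptor on the classpath). -->
    <AsyncLogger name="org.apache.hadoop.ozone" level="info">
      <AppenderRef ref="rolling"/>
    </AsyncLogger>
    <Root level="info">
      <AppenderRef ref="rolling"/>
    </Root>
  </Loggers>
</Configuration>
```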
[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock
[ https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3978: - Description: We met a deadlock related to log4j; the jstack information has been attached. > Switch log4j to log4j2 to avoid deadlock > > > Key: HDDS-3978 > URL: https://issues.apache.org/jira/browse/HDDS-3978 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: jstack-deadlock-log.txt > > > We met a deadlock related to log4j; the jstack information has been attached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock
[ https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3978: - Attachment: jstack-deadlock-log.txt > Switch log4j to log4j2 to avoid deadlock > > > Key: HDDS-3978 > URL: https://issues.apache.org/jira/browse/HDDS-3978 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: jstack-deadlock-log.txt > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock
runzhiwang created HDDS-3978: Summary: Switch log4j to log4j2 to avoid deadlock Key: HDDS-3978 URL: https://issues.apache.org/jira/browse/HDDS-3978 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3952) Merge small container
[ https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3952: - Summary: Merge small container (was: Make container could be reopened) > Merge small container > - > > Key: HDDS-3952 > URL: https://issues.apache.org/jira/browse/HDDS-3952 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode, SCM >Affects Versions: 0.7.0 >Reporter: maobaolong >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-3952) Make container could be reopened
[ https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159636#comment-17159636 ] runzhiwang edited comment on HDDS-3952 at 7/17/20, 3:02 AM: [~msingh] Hi, thanks for the review. The background of this jira is that our production cluster has about 7000 small containers which are not full but closed, as the image shows, because the ratis pipeline is not stable. So we want to reopen and write to the closed-but-not-full containers. The basic idea: 1. SCM builds a map with entries of the form <3 datanodes, set of closed-but-not-full containers on those 3 datanodes>. 2. SCM checks, from the map.entrySet, whether any open pipeline is located on the 3 datanodes; if such an open pipeline exists, we get the closed-but-not-full containers with map.get(pipeline.datanodes()), put them on the pipeline, and reopen them. 3. When SCM creates a new pipeline, we first select from the map the 3 datanodes that have the most closed-but-not-full containers and create the pipeline on those 3 datanodes. Then we put the containers of map.get(pipeline.datanodes()) on the pipeline and reopen them. [~msingh] What do you think ? !screenshot-1.png! > Make container could be reopened > > > Key: HDDS-3952 > URL: https://issues.apache.org/jira/browse/HDDS-3952 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode, SCM >Affects Versions: 0.7.0 >Reporter: maobaolong >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
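The three SCM-side steps described in the comment above can be sketched in a few lines. This is a hypothetical illustration of the bookkeeping only (class and method names are invented, datanodes are represented as strings, containers as longs), not the actual SCM code:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Sketch of SCM bookkeeping: map each replica set (3 datanodes) to the
// closed-but-not-full containers living on exactly those datanodes.
public class ReopenPlanner {
    private final Map<Set<String>, Set<Long>> closedNotFull = new HashMap<>();

    // Step 1: record a closed-but-not-full container under its replica set.
    public void track(Set<String> datanodes, long containerId) {
        closedNotFull.computeIfAbsent(datanodes, k -> new HashSet<>())
                     .add(containerId);
    }

    // Step 2: given an open pipeline's datanodes, find containers to reopen.
    public Set<Long> reopenable(Set<String> pipelineDatanodes) {
        return closedNotFull.getOrDefault(pipelineDatanodes,
                                          Collections.emptySet());
    }

    // Step 3: when creating a new pipeline, prefer the replica set holding
    // the most closed-but-not-full containers.
    public Optional<Set<String>> bestPlacement() {
        return closedNotFull.entrySet().stream()
                .max(Comparator.comparingInt(
                        (Map.Entry<Set<String>, Set<Long>> e) ->
                                e.getValue().size()))
                .map(Map.Entry::getKey);
    }
}
```

The key design point is that the map is keyed by the *whole* replica set, so a container is only eligible for reopening on a pipeline that spans exactly the datanodes already holding its replicas.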
[jira] [Commented] (HDDS-3952) Make container could be reopened
[ https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159636#comment-17159636 ] runzhiwang commented on HDDS-3952: -- [~msingh] Hi, thanks for the review. The background of this jira is that our production cluster has about 7000 small containers which are not full but closed, as the image shows, because the ratis pipeline is not stable. So we want to reopen and write to the closed-but-not-full containers. The basic idea: 1. SCM builds a map with entries of the form <3 datanodes, set of closed-but-not-full containers on those 3 datanodes>. 2. SCM checks, from the map.entrySet, whether any open pipeline is located on the 3 datanodes; if such an open pipeline exists, we get the closed-but-not-full containers with map.get(pipeline.datanodes()), put them on the pipeline, and reopen them. 3. When SCM creates a new pipeline, we first select from the map the 3 datanodes that have the most closed-but-not-full containers, and create the pipeline on those 3 datanodes. [~msingh] What do you think ? !screenshot-1.png! > Make container could be reopened > > > Key: HDDS-3952 > URL: https://issues.apache.org/jira/browse/HDDS-3952 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode, SCM >Affects Versions: 0.7.0 >Reporter: maobaolong >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3952) Make container could be reopened
[ https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3952: - Attachment: screenshot-1.png > Make container could be reopened > > > Key: HDDS-3952 > URL: https://issues.apache.org/jira/browse/HDDS-3952 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode, SCM >Affects Versions: 0.7.0 >Reporter: maobaolong >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3941) Enable core dump when crash in C++
[ https://issues.apache.org/jira/browse/HDDS-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang resolved HDDS-3941. -- Fix Version/s: 0.6.0 Resolution: Fixed > Enable core dump when crash in C++ > -- > > Key: HDDS-3941 > URL: https://issues.apache.org/jira/browse/HDDS-3941 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3957) Fix mixed use of Longs.toByteArray and Ints.fromByteArray
[ https://issues.apache.org/jira/browse/HDDS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3957: - Summary: Fix mixed use of Longs.toByteArray and Ints.fromByteArray (was: Fix error use Longs.toByteArray and Ints.fromByteArray of DB_PENDING_DELETE_BLOCK_COUNT_KEY) > Fix mixed use of Longs.toByteArray and Ints.fromByteArray > - > > Key: HDDS-3957 > URL: https://issues.apache.org/jira/browse/HDDS-3957 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3957) Fix error use Longs.toByteArray and Ints.fromByteArray of DB_PENDING_DELETE_BLOCK_COUNT_KEY
runzhiwang created HDDS-3957: Summary: Fix error use Longs.toByteArray and Ints.fromByteArray of DB_PENDING_DELETE_BLOCK_COUNT_KEY Key: HDDS-3957 URL: https://issues.apache.org/jira/browse/HDDS-3957 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: runzhiwang Assignee: runzhiwang -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3933) Fix memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Summary: Fix memory leak because of too many Datanode State Machine Thread (was: memory leak because of too many Datanode State Machine Thread) > Fix memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: jstack.txt, screenshot-1.png, screenshot-2.png, > screenshot-3.png > > > When creating the 22345th Datanode State Machine Thread, OOM happened. > !screenshot-1.png! > !screenshot-2.png! > !screenshot-3.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
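The usual remedy for this class of leak — tens of thousands of short-lived "Datanode State Machine Thread"s created one per command — is to run commands on a fixed-size pool with a bounded queue, so thread count stays constant under load. The sketch below is a generic illustration under assumed sizes, not the actual Ozone fix:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a bounded executor caps thread count instead of spawning a new
// thread per command. Pool and queue sizes are illustrative.
public class BoundedCommandExecutor {
    private final ExecutorService pool = new ThreadPoolExecutor(
            4, 4,                       // fixed number of worker threads
            0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1024),           // bounded backlog
            new ThreadPoolExecutor.CallerRunsPolicy() // back-pressure when full
    );

    public Future<?> submit(Runnable command) {
        return pool.submit(command);
    }

    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

`CallerRunsPolicy` makes the submitting thread execute the command when the queue is full, which throttles producers instead of either dropping commands or growing without bound.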
[jira] [Updated] (HDDS-3223) Improve s3g read 1GB object efficiency by 100 times
[ https://issues.apache.org/jira/browse/HDDS-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3223: - Summary: Improve s3g read 1GB object efficiency by 100 times (was: Improve s3g read 1GB object efficiency by 10 times ) > Improve s3g read 1GB object efficiency by 100 times > > > Key: HDDS-3223 > URL: https://issues.apache.org/jira/browse/HDDS-3223 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Critical > Labels: pull-request-available > Fix For: 0.6.0 > > Attachments: screenshot-1.png > > > *What's the problem ?* > Reading a 1000M object costs about 470 seconds, i.e. 2.2M/s, which is too slow. > *What's the reason ?* > When reading a 1000M file, there are 50 GET requests, and each GET request reads 20M. Each GET goes through the stack: > [IOUtils::copyLarge|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L262] > -> > [IOUtils::skipFully|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1190] > -> > [IOUtils::skip|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L2064] > -> > [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957]. > This means the 50th GET request, which should read only 980M-1000M, must first skip 0-980M, and the skip is implemented by > [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957]-ing > 0-980M. So the 1st GET request reads 0-20M, the 2nd reads 0-40M, the 3rd reads 0-60M, ..., and the 50th reads 0-1000M; the GET requests from the 1st to the 50th become slower and slower. > You can also refer to [IO-203|https://issues.apache.org/jira/browse/IO-203] for why IOUtils implements skip by read rather than a real skip, e.g. seek. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3941) Enable core dump when crash in C++
[ https://issues.apache.org/jira/browse/HDDS-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3941: - Summary: Enable core dump when crash in C++ (was: Set core file size to debug when crash in C++) > Enable core dump when crash in C++ > -- > > Key: HDDS-3941 > URL: https://issues.apache.org/jira/browse/HDDS-3941 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Updated] (HDDS-3933) memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Summary: memory leak because of too many Datanode State Machine Thread (was: Memory leak because of too many Datanode State Machine Thread) > memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: jstack.txt, screenshot-1.png, screenshot-2.png, > screenshot-3.png > > > When creating the 22345th Datanode State Machine Thread, an OOM occurred. > !screenshot-1.png! > !screenshot-2.png! > !screenshot-3.png!
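The OOM pattern described in this issue (a fresh "Datanode State Machine Thread" per event until the 22345th allocation fails) is the classic unbounded-thread leak, and the usual remedy is a bounded, reusable pool. The sketch below is a hypothetical illustration of that remedy, not the actual Ozone fix; the class and method names are invented:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedStateMachinePool {
    // Run nTasks short tasks on a pool of poolSize reusable threads,
    // instead of spawning one new Thread per task (the leaky pattern).
    static int runTasks(int nTasks, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(nTasks);
        for (int i = 0; i < nTasks; i++) {
            pool.submit(done::countDown);
        }
        done.await();      // block until every task has run
        pool.shutdown();
        return nTasks;
    }

    public static void main(String[] args) throws InterruptedException {
        // 22345 tasks -- the thread count at which the reported OOM hit --
        // complete here on just 4 OS threads.
        System.out.println(runTasks(22345, 4) + " tasks on 4 threads");
    }
}
```

With a fixed pool, memory use is bounded by the pool size and queue depth rather than by the arrival rate of state-machine events.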
[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Attachment: jstack.txt > Memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: jstack.txt, screenshot-1.png, screenshot-2.png, > screenshot-3.png > > > When creating the 22345th Datanode State Machine Thread, an OOM occurred. > !screenshot-1.png! > !screenshot-2.png! > !screenshot-3.png!
[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Description: When creating the 22345th Datanode State Machine Thread, an OOM occurred. !screenshot-1.png! !screenshot-2.png! !screenshot-3.png! was: When creating the 22345th Datanode State Machine Thread, an OOM occurred. !screenshot-1.png! !screenshot-2.png! > Memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > When creating the 22345th Datanode State Machine Thread, an OOM occurred. > !screenshot-1.png! > !screenshot-2.png! > !screenshot-3.png!
[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Attachment: screenshot-3.png > Memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > When creating the 22345th Datanode State Machine Thread, an OOM occurred. > !screenshot-1.png! > !screenshot-2.png!
[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory do exist
[ https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang reassigned HDDS-3853: Assignee: runzhiwang (was: Shashikant Banerjee) > Container marked as missing on datanode while container directory do exist > -- > > Key: HDDS-3853 > URL: https://issues.apache.org/jira/browse/HDDS-3853 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: runzhiwang >Priority: Major > > {code} > INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: > PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , > Message: ContainerID 1744 has been lost and and cannot be recreated on this > DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred. > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > ContainerID 1744 has been lost and and cannot be recreated on this DataNode > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > ERROR > 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine: > gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex > 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on > this DataNode Container Result: CONTAINER_MISSING > ERROR > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: > pipeline Action CLOSE on pipeline > PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction > failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER > .Triggering pipeline close action > {code}
[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Description: When creating the 22345th Datanode State Machine Thread, an OOM occurred. !screenshot-1.png! !screenshot-2.png! was: When creating the 22345th Datanode State Machine Thread, an OOM occurred. !screenshot-1.png! > Memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > When creating the 22345th Datanode State Machine Thread, an OOM occurred. > !screenshot-1.png! > !screenshot-2.png!
[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Attachment: screenshot-2.png > Memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png, screenshot-2.png > > > When creating the 22345th Datanode State Machine Thread, an OOM occurred. > !screenshot-1.png!
[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Description: When creating the 22345th Datanode State Machine Thread, an OOM occurred. !screenshot-1.png! > Memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > When creating the 22345th Datanode State Machine Thread, an OOM occurred. > !screenshot-1.png!
[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
[ https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3933: - Attachment: screenshot-1.png > Memory leak because of too many Datanode State Machine Thread > - > > Key: HDDS-3933 > URL: https://issues.apache.org/jira/browse/HDDS-3933 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > >
[jira] [Created] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread
runzhiwang created HDDS-3933: Summary: Memory leak because of too many Datanode State Machine Thread Key: HDDS-3933 URL: https://issues.apache.org/jira/browse/HDDS-3933 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: runzhiwang Assignee: runzhiwang
[jira] [Assigned] (HDDS-2922) Recommend leader host to Ratis via pipeline creation
[ https://issues.apache.org/jira/browse/HDDS-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang reassigned HDDS-2922: Assignee: runzhiwang (was: Li Cheng) > Recommend leader host to Ratis via pipeline creation > > > Key: HDDS-2922 > URL: https://issues.apache.org/jira/browse/HDDS-2922 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Li Cheng >Assignee: runzhiwang >Priority: Major > > Ozone should be able to recommend a leader host to Ratis via pipeline creation. > The leader host can be recommended based on rack awareness and load balancing.
[jira] [Commented] (HDDS-2922) Recommend leader host to Ratis via pipeline creation
[ https://issues.apache.org/jira/browse/HDDS-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150886#comment-17150886 ] runzhiwang commented on HDDS-2922: -- I'm working on it. > Recommend leader host to Ratis via pipeline creation > > > Key: HDDS-2922 > URL: https://issues.apache.org/jira/browse/HDDS-2922 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Li Cheng >Assignee: runzhiwang >Priority: Major > > Ozone should be able to recommend a leader host to Ratis via pipeline creation. > The leader host can be recommended based on rack awareness and load balancing.
[jira] [Resolved] (HDDS-3899) Avoid change state from closing to exception in LogAppender
[ https://issues.apache.org/jira/browse/HDDS-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang resolved HDDS-3899. -- Resolution: Invalid > Avoid change state from closing to exception in LogAppender > --- > > Key: HDDS-3899 > URL: https://issues.apache.org/jira/browse/HDDS-3899 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major >
[jira] [Created] (HDDS-3899) Avoid change state from closing to exception in LogAppender
runzhiwang created HDDS-3899: Summary: Avoid change state from closing to exception in LogAppender Key: HDDS-3899 URL: https://issues.apache.org/jira/browse/HDDS-3899 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: runzhiwang Assignee: runzhiwang
[jira] [Updated] (HDDS-3861) Fix handlePipelineFailure throw exception if role is follower
[ https://issues.apache.org/jira/browse/HDDS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] runzhiwang updated HDDS-3861: - Description: !screenshot-1.png! > Fix handlePipelineFailure throw exception if role is follower > - > > Key: HDDS-3861 > URL: https://issues.apache.org/jira/browse/HDDS-3861 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png!