[jira] [Reopened] (HDDS-4378) Ozone shell support truncate API

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang reopened HDDS-4378:
--

> Ozone shell support truncate API
> 
>
> Key: HDDS-4378
> URL: https://issues.apache.org/jira/browse/HDDS-4378
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-4378) Ozone shell support truncate API

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang resolved HDDS-4378.
--
  Assignee: runzhiwang
Resolution: Fixed

> Ozone shell support truncate API
> 
>
> Key: HDDS-4378
> URL: https://issues.apache.org/jira/browse/HDDS-4378
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Updated] (HDDS-4382) SCM send truncate block to datanode

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4382:
-
Description: SCM starts a background thread to send 
TruncateBlocksCommandProto to the datanode.

> SCM send truncate block to datanode
> ---
>
> Key: HDDS-4382
> URL: https://issues.apache.org/jira/browse/HDDS-4382
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>
> SCM starts a background thread to send TruncateBlocksCommandProto to the 
> datanode.






[jira] [Updated] (HDDS-4379) OM marks truncated blocks

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4379:
-
Description: Append means the Ozone client can append content to the tail 
of a key. So if Client1 truncates the file but the datanode has not yet 
truncated blockN -> Client2 appends content to the tail of blockN -> the 
datanode truncates blockN, then an error occurs. To avoid this, 
OmKeyLocationInfo adds a new flag, i.e. toBeTruncated, to mark blocks that 
need to be truncated in the future. When Client2 appends and finds blockN 
with the toBeTruncated flag, Client2 allocates a new block, and appends the 
content to the new block.

> OM marks truncated blocks
> -
>
> Key: HDDS-4379
> URL: https://issues.apache.org/jira/browse/HDDS-4379
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>
> Append means the Ozone client can append content to the tail of a key. So 
> if Client1 truncates the file but the datanode has not yet truncated blockN 
> -> Client2 appends content to the tail of blockN -> the datanode truncates 
> blockN, then an error occurs. To avoid this, OmKeyLocationInfo adds a new 
> flag, i.e. toBeTruncated, to mark blocks that need to be truncated in the 
> future. When Client2 appends and finds blockN with the toBeTruncated flag, 
> Client2 allocates a new block, and appends the content to the new block.
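The append-time check described above can be sketched as follows. This is a hypothetical illustration, not Ozone's actual API: the BlockInfo type, its fields, and chooseAppendTarget are invented names standing in for OmKeyLocationInfo and the OM append path.

```java
import java.util.ArrayList;
import java.util.List;

class BlockInfo {
    final long blockId;
    final boolean toBeTruncated;
    BlockInfo(long blockId, boolean toBeTruncated) {
        this.blockId = blockId;
        this.toBeTruncated = toBeTruncated;
    }
}

public class AppendSketch {
    // If the tail block is marked toBeTruncated, allocate a fresh block
    // instead of appending to it, so a later datanode truncate cannot
    // discard the newly appended data.
    static BlockInfo chooseAppendTarget(List<BlockInfo> blocks, long nextId) {
        BlockInfo tail = blocks.get(blocks.size() - 1);
        if (tail.toBeTruncated) {
            BlockInfo fresh = new BlockInfo(nextId, false);
            blocks.add(fresh);
            return fresh;
        }
        return tail;
    }

    public static void main(String[] args) {
        List<BlockInfo> blocks = new ArrayList<>();
        blocks.add(new BlockInfo(1, true)); // blockN, pending truncate
        BlockInfo target = chooseAppendTarget(blocks, 2);
        System.out.println(target.blockId); // the new block, not blockN
    }
}
```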






[jira] [Updated] (HDDS-4383) Datanode truncates blocks on the disk

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4383:
-
Description: Datanode starts a background thread to process 
. If it is FilePerBlock, we use 
FileChannel.truncate to truncate the file to newLength directly. If it is 
FilePerChunk, we delete the files of the fully truncated chunks, and use 
FileChannel.truncate to process the partially truncated file. Then, in RocksDB, 
Datanode deletes , and puts .

> Datanode truncates blocks on the disk
> -
>
> Key: HDDS-4383
> URL: https://issues.apache.org/jira/browse/HDDS-4383
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>
> Datanode starts a background thread to process  newLength>. If it is FilePerBlock, we use FileChannel.truncate to truncate 
> the file to newLength directly. If it is FilePerChunk, we delete the files of 
> the fully truncated chunks, and use FileChannel.truncate to process the 
> partially truncated file. Then, in RocksDB, Datanode deletes  newLength>, and puts .
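A minimal demonstration of java.nio.channels.FileChannel.truncate, the call the description above relies on for the partially truncated file. The temporary file here stands in for a container chunk/block file; it is not Ozone's actual on-disk layout.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateSketch {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("chunk", ".data");
        Files.write(p, new byte[1024]); // pretend this is a 1 KiB chunk file

        // Truncate the file in place: bytes beyond the new length are dropped.
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(100); // keep only the first 100 bytes
        }

        System.out.println(Files.size(p)); // 100
        Files.delete(p);
    }
}
```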






[jira] [Updated] (HDDS-4379) OM marks truncated blocks

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4379:
-
Summary: OM marks truncated blocks  (was: Mark truncated blocks in OM)

> OM marks truncated blocks
> -
>
> Key: HDDS-4379
> URL: https://issues.apache.org/jira/browse/HDDS-4379
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>







[jira] [Created] (HDDS-4383) Datanode truncates blocks on the disk

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4383:


 Summary: Datanode truncates blocks on the disk
 Key: HDDS-4383
 URL: https://issues.apache.org/jira/browse/HDDS-4383
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang









[jira] [Updated] (HDDS-4377) Datanode truncates blocks on the disk

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4377:
-
Description: When Datanode receives TruncateBlocksCommand, in RocksDB, 
Datanode deletes the chunks which are fully truncated, and updates the length 
and checksum of the chunk which is partially truncated. Datanode puts 
 in RocksDB, and returns success to SCM.

> Datanode truncates blocks on the disk
> -
>
> Key: HDDS-4377
> URL: https://issues.apache.org/jira/browse/HDDS-4377
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>
> When Datanode receives TruncateBlocksCommand, in RocksDB, Datanode deletes 
> the chunks which are fully truncated, and updates the length and checksum of 
> the chunk which is partially truncated. Datanode puts  newLength> in RocksDB, and returns success to SCM.






[jira] [Updated] (HDDS-4377) Datanode changes the block length in rocksdb

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4377:
-
Summary: Datanode changes the block length in rocksdb  (was: Datanode 
truncates blocks on the disk)

> Datanode changes the block length in rocksdb
> 
>
> Key: HDDS-4377
> URL: https://issues.apache.org/jira/browse/HDDS-4377
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>
> When Datanode receives TruncateBlocksCommand, in the RocksDB, Datanode 
> deletes the chunks which are fully truncated, and updates the chunk length 
> and checksum which is partially truncated. Datanode puts  newLength> in RocksDB, and returns succ to SCM.






[jira] [Created] (HDDS-4382) SCM send truncate block to datanode

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4382:


 Summary: SCM send truncate block to datanode
 Key: HDDS-4382
 URL: https://issues.apache.org/jira/browse/HDDS-4382
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang









[jira] [Updated] (HDDS-4376) SCM create transaction for truncated blocks

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4376:
-
Description: 
When SCM receives TruncateScmKeyRequestProto, the code for the deleteBlocks 
part is already implemented in SCM.

For the partialTruncateBlocks, we process it as a transaction like 
deleteBlocks, store  in 
truncatedBlocksTable, and return success to OM.

We abstract the code related to the delete-block transaction, so that truncate 
and delete blocks can share the abstract code.

> SCM create transaction for truncated blocks
> ---
>
> Key: HDDS-4376
> URL: https://issues.apache.org/jira/browse/HDDS-4376
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
> When SCM receives TruncateScmKeyRequestProto, the code for the deleteBlocks 
> part is already implemented in SCM.
> For the partialTruncateBlocks, we process it as a transaction like 
> deleteBlocks, store  in 
> truncatedBlocksTable, and return success to OM.
> We abstract the code related to the delete-block transaction, so that 
> truncate and delete blocks can share the abstract code.






[jira] [Updated] (HDDS-4381) OM send truncate key to SCM

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4381:
-
Description: OM starts a background thread to select a certain number of 
entries from truncateTable, and sends these entries to SCM via 
TruncateScmKeyRequestProto.

> OM send truncate key to SCM
> ---
>
> Key: HDDS-4381
> URL: https://issues.apache.org/jira/browse/HDDS-4381
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>
> OM starts a background thread to select a certain number of entries from 
> truncateTable, and sends these entries to SCM via 
> TruncateScmKeyRequestProto.
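The batching loop of the background thread described above could look roughly like this. The pending queue, batch limit, and selectBatch helper are all illustrative stand-ins; the real OM reads entries from truncateTable and packs them into a TruncateScmKeyRequestProto.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TruncateSenderSketch {
    // Drain up to `limit` pending entries; one batch would become the
    // payload of a single request to SCM.
    static List<String> selectBatch(Deque<String> pending, int limit) {
        List<String> batch = new ArrayList<>();
        while (!pending.isEmpty() && batch.size() < limit) {
            batch.add(pending.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>(List.of("k1", "k2", "k3"));
        System.out.println(selectBatch(pending, 2)); // first batch of 2
        System.out.println(pending);                 // what remains queued
    }
}
```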






[jira] [Created] (HDDS-4381) OM send truncate key to SCM

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4381:


 Summary: OM send truncate key to SCM
 Key: HDDS-4381
 URL: https://issues.apache.org/jira/browse/HDDS-4381
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang









[jira] [Created] (HDDS-4380) OM stores truncate key in truncateTable

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4380:


 Summary: OM stores truncate key in truncateTable
 Key: HDDS-4380
 URL: https://issues.apache.org/jira/browse/HDDS-4380
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang









[jira] [Updated] (HDDS-4380) OM stores truncate key in truncateTable

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4380:
-
Description: OM stores  in truncateTable. 
The TruncateKey in RepeatedTruncateOmKeyInfo represents one truncate key 
operation, so the list of TruncateKey allows us to store a list of truncate 
operations related to one key.

> OM stores truncate key in truncateTable
> ---
>
> Key: HDDS-4380
> URL: https://issues.apache.org/jira/browse/HDDS-4380
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Priority: Major
>
> OM stores  in truncateTable. The TruncateKey 
> in RepeatedTruncateOmKeyInfo represents one truncate key operation, so the 
> list of TruncateKey allows us to store a list of truncate operations related 
> to one key.
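The per-key list layout described above can be modeled with an in-memory map, each key accumulating its truncate operations in order. This is only a sketch: the real truncateTable is a RocksDB table and the value type is RepeatedTruncateOmKeyInfo, not a plain list of lengths.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TruncateTableSketch {
    public static void main(String[] args) {
        // key name -> list of truncate operations (newLength per operation)
        Map<String, List<Long>> truncateTable = new HashMap<>();

        // Two successive truncates of the same key each append an entry,
        // so neither is lost before the background thread ships them to SCM.
        truncateTable.computeIfAbsent("/vol/bucket/key1", k -> new ArrayList<>()).add(500L);
        truncateTable.computeIfAbsent("/vol/bucket/key1", k -> new ArrayList<>()).add(100L);

        System.out.println(truncateTable.get("/vol/bucket/key1")); // [500, 100]
    }
}
```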






[jira] [Updated] (HDDS-4375) OM changes the block length when receives truncate request

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4375:
-
Description: When OM receives truncate(key, newLength), in the keyTable, OM 
deletes the blocks which are fully truncated, and updates the length of the 
block which is partially truncated, then returns success to the client.

> OM changes the block length when receives truncate request
> --
>
> Key: HDDS-4375
> URL: https://issues.apache.org/jira/browse/HDDS-4375
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Labels: pull-request-available
>
> When OM receives truncate(key, newLength), in the keyTable, OM deletes the 
> blocks which are fully truncated, and updates the length of the block which 
> is partially truncated, then returns success to the client.
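How truncate(key, newLength) maps onto a key's block list can be illustrated with a small calculation: blocks wholly beyond newLength are dropped, and the block containing newLength keeps only a prefix. The block sizes and the truncateBlocks helper are made up for illustration; the real OM operates on OmKeyLocationInfo entries in the keyTable.

```java
import java.util.ArrayList;
import java.util.List;

public class OmTruncateSketch {
    // Returns the surviving block lengths after truncating the key to newLength.
    static List<Long> truncateBlocks(List<Long> blockLengths, long newLength) {
        List<Long> result = new ArrayList<>();
        long offset = 0;
        for (long len : blockLengths) {
            if (offset >= newLength) {
                break; // fully truncated block: would be deleted from keyTable
            }
            // Partially truncated block keeps only the prefix up to newLength.
            result.add(Math.min(len, newLength - offset));
            offset += len;
        }
        return result;
    }

    public static void main(String[] args) {
        // Three 256-byte blocks, truncated to a total length of 300 bytes:
        // block 1 survives whole, block 2 is cut to 44 bytes, block 3 is dropped.
        System.out.println(truncateBlocks(List.of(256L, 256L, 256L), 300L));
    }
}
```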






[jira] [Updated] (HDDS-4375) OM changes the block length when receives truncate request

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4375:
-
Summary: OM changes the block length when receives truncate request  (was: 
OM changes the block length when receive truncate request)

> OM changes the block length when receives truncate request
> --
>
> Key: HDDS-4375
> URL: https://issues.apache.org/jira/browse/HDDS-4375
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Updated] (HDDS-4375) OM changes the block length when receive truncate request

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4375:
-
Summary: OM changes the block length when receive truncate request  (was: 
Client asks OM to change the block length when truncate)

> OM changes the block length when receive truncate request
> -
>
> Key: HDDS-4375
> URL: https://issues.apache.org/jira/browse/HDDS-4375
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Created] (HDDS-4379) Mark truncated blocks in OM

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4379:


 Summary: Mark truncated blocks in OM
 Key: HDDS-4379
 URL: https://issues.apache.org/jira/browse/HDDS-4379
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang









[jira] [Created] (HDDS-4378) Ozone shell support truncate API

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4378:


 Summary: Ozone shell support truncate API
 Key: HDDS-4378
 URL: https://issues.apache.org/jira/browse/HDDS-4378
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang









[jira] [Created] (HDDS-4376) SCM create transaction for truncated blocks

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4376:


 Summary: SCM create transaction for truncated blocks
 Key: HDDS-4376
 URL: https://issues.apache.org/jira/browse/HDDS-4376
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Created] (HDDS-4377) Datanode truncates blocks on the disk

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4377:


 Summary: Datanode truncates blocks on the disk
 Key: HDDS-4377
 URL: https://issues.apache.org/jira/browse/HDDS-4377
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang









[jira] [Updated] (HDDS-4375) Client asks OM to change the block length when truncate

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4375:
-
Parent: HDDS-4239
Issue Type: Sub-task  (was: New Feature)

> Client asks OM to change the block length when truncate
> ---
>
> Key: HDDS-4375
> URL: https://issues.apache.org/jira/browse/HDDS-4375
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Created] (HDDS-4375) Client asks OM to change the block length when truncate

2020-10-22 Thread runzhiwang (Jira)
runzhiwang created HDDS-4375:


 Summary: Client asks OM to change the block length when truncate
 Key: HDDS-4375
 URL: https://issues.apache.org/jira/browse/HDDS-4375
 Project: Hadoop Distributed Data Store
  Issue Type: New Feature
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-10-22 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Parent: (was: HDDS-3714)
Issue Type: New Feature  (was: Sub-task)

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-09-21 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Attachment: (was: Ozone Truncate Design-v3.pdf)

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-09-21 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Attachment: Ozone Truncate Design.pdf

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-09-21 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Attachment: (was: Ozone Truncate Design-v2.pdf)

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-09-21 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Attachment: (was: Ozone Truncate Design-v1.pdf)

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design-v2.pdf, Ozone Truncate 
> Design-v3.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-09-18 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Attachment: Ozone Truncate Design-v3.pdf

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design-v1.pdf, Ozone Truncate 
> Design-v2.pdf, Ozone Truncate Design-v3.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-09-17 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Attachment: Ozone Truncate Design-v2.pdf

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design-v1.pdf, Ozone Truncate 
> Design-v2.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Created] (HDDS-4240) Ozone support append operation

2020-09-13 Thread runzhiwang (Jira)
runzhiwang created HDDS-4240:


 Summary: Ozone support append operation
 Key: HDDS-4240
 URL: https://issues.apache.org/jira/browse/HDDS-4240
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-4239) Ozone support truncate operation

2020-09-13 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Summary: Ozone support truncate operation  (was: Ozone support truncate)

> Ozone support truncate operation
> 
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design-v1.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate

2020-09-13 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Attachment: Ozone Truncate Design-v1.pdf

> Ozone support truncate
> --
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Ozone Truncate Design-v1.pdf
>
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate

2020-09-13 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Description: Design: 
https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#

> Ozone support truncate
> --
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
> Design: 
> https://docs.google.com/document/d/1Ju9WeuFuf_D8gElRCJH1-as0OyC6TOtHPHErycL43XQ/edit#






[jira] [Updated] (HDDS-4239) Ozone support truncate

2020-09-13 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4239:
-
Parent: HDDS-3714
Issue Type: Sub-task  (was: New Feature)

> Ozone support truncate
> --
>
> Key: HDDS-4239
> URL: https://issues.apache.org/jira/browse/HDDS-4239
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Created] (HDDS-4239) Ozone support truncate

2020-09-13 Thread runzhiwang (Jira)
runzhiwang created HDDS-4239:


 Summary: Ozone support truncate
 Key: HDDS-4239
 URL: https://issues.apache.org/jira/browse/HDDS-4239
 Project: Hadoop Distributed Data Store
  Issue Type: New Feature
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Assigned] (HDDS-3714) Ozone support append truncate operation

2020-09-08 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang reassigned HDDS-3714:


Assignee: runzhiwang

> Ozone support append truncate operation
> ---
>
> Key: HDDS-3714
> URL: https://issues.apache.org/jira/browse/HDDS-3714
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Ozone Manager
>Reporter: maobaolong
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Resolved] (HDDS-4214) Fix failed UT: TestContainerStateMachineFailures#testApplyTransactionFailure

2020-09-06 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang resolved HDDS-4214.
--
Resolution: Duplicate

> Fix failed UT: TestContainerStateMachineFailures#testApplyTransactionFailure
> 
>
> Key: HDDS-4214
> URL: https://issues.apache.org/jira/browse/HDDS-4214
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Created] (HDDS-4214) Fix failed UT: TestContainerStateMachineFailures#testApplyTransactionFailure

2020-09-06 Thread runzhiwang (Jira)
runzhiwang created HDDS-4214:


 Summary: Fix failed UT: 
TestContainerStateMachineFailures#testApplyTransactionFailure
 Key: HDDS-4214
 URL: https://issues.apache.org/jira/browse/HDDS-4214
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Created] (HDDS-4202) Upgrade ratis to 1.1.0-ea949f1-SNAPSHOT

2020-09-02 Thread runzhiwang (Jira)
runzhiwang created HDDS-4202:


 Summary: Upgrade ratis to 1.1.0-ea949f1-SNAPSHOT
 Key: HDDS-4202
 URL: https://issues.apache.org/jira/browse/HDDS-4202
 Project: Hadoop Distributed Data Store
  Issue Type: New Feature
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Created] (HDDS-4201) Improve the performance of OmKeyLocationInfoGroup

2020-09-02 Thread runzhiwang (Jira)
runzhiwang created HDDS-4201:


 Summary: Improve the performance of OmKeyLocationInfoGroup
 Key: HDDS-4201
 URL: https://issues.apache.org/jira/browse/HDDS-4201
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache

2020-09-02 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4199:
-
Attachment: screenshot-1.png

> Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache
> 
>
> Key: HDDS-4199
> URL: https://issues.apache.org/jira/browse/HDDS-4199
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Updated] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache

2020-09-02 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4199:
-
Description:  !screenshot-1.png! 

> Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache
> 
>
> Key: HDDS-4199
> URL: https://issues.apache.org/jira/browse/HDDS-4199
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 






[jira] [Assigned] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache

2020-09-02 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang reassigned HDDS-4199:


Assignee: runzhiwang

> Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache
> 
>
> Key: HDDS-4199
> URL: https://issues.apache.org/jira/browse/HDDS-4199
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Created] (HDDS-4199) Fix failed UT: TestOMAllocateBlockRequest#testValidateAndUpdateCache

2020-09-02 Thread runzhiwang (Jira)
runzhiwang created HDDS-4199:


 Summary: Fix failed UT: 
TestOMAllocateBlockRequest#testValidateAndUpdateCache
 Key: HDDS-4199
 URL: https://issues.apache.org/jira/browse/HDDS-4199
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: runzhiwang









[jira] [Updated] (HDDS-4187) Fix recon OOM

2020-09-02 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4187:
-
Summary: Fix recon OOM  (was: Fix memory leak of recon)

> Fix recon OOM
> -
>
> Key: HDDS-4187
> URL: https://issues.apache.org/jira/browse/HDDS-4187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> On a cluster of 40 datanodes with 400,000 containers, recon was started with 
> -Xmx 10G. After several hours, recon's memory grew to 12G and it hit OOM. The 
> leak is on the heap: recon processes ContainerReports too slowly, so its 
> report queue grows without bound until the process runs out of memory.
>   !screenshot-1.png! 
>  !screenshot-2.png! 
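The unbounded report queue described above can be avoided with a bounded executor that applies backpressure. A minimal sketch, assuming hypothetical class and method names (this is not the actual recon fix):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedReportExecutor {
    private final ThreadPoolExecutor executor;

    public BoundedReportExecutor(int threads, int queueCapacity) {
        // A bounded queue plus CallerRunsPolicy applies backpressure: when
        // the queue is full, the submitting thread runs the task itself
        // instead of enqueueing without limit and exhausting the heap.
        this.executor = new ThreadPoolExecutor(
            threads, threads, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public void submit(Runnable reportTask) {
        executor.execute(reportTask);
    }

    public void shutdownAndWait() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedReportExecutor pool = new BoundedReportExecutor(2, 100);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < 10_000; i++) {
            pool.submit(processed::incrementAndGet);
        }
        pool.shutdownAndWait();
        System.out.println(processed.get()); // prints 10000: nothing dropped
    }
}
```

With CallerRunsPolicy no report is lost; the producer is simply slowed down to the consumer's pace, which trades latency for bounded memory.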






[jira] [Updated] (HDDS-4187) Fix memory leak of recon

2020-09-01 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4187:
-
Description: 
On a cluster of 40 datanodes with 400,000 containers, recon was started with 
-Xmx 10G. After several hours, recon's memory grew to 12G and it hit OOM. The 
leak is on the heap: recon processes ContainerReports too slowly, so its report 
queue grows without bound until the process runs out of memory.

  !screenshot-1.png! 
 !screenshot-2.png! 

  was:
40 datanodes with 400, 000 containers, start recon with xmx:10G. After several 
hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and 
the reason is recon is slow to process ContainerReplicaReport, so the queue of 
thread OOM.

  !screenshot-1.png! 
 !screenshot-2.png! 


> Fix memory leak of recon
> 
>
> Key: HDDS-4187
> URL: https://issues.apache.org/jira/browse/HDDS-4187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> 40 datanodes with 400, 000 containers, start recon with xmx:10G. After 
> several hours, recon's memory increase to 12G and OOM. Memory leak happens on 
> heap, and the reason is recon is slow to process ContainerReport, so the 
> queue of thread OOM.
>   !screenshot-1.png! 
>  !screenshot-2.png! 






[jira] [Updated] (HDDS-4187) Fix memory leak of recon

2020-09-01 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4187:
-
Description: 
40 datanodes with 400, 000 containers, start recon with xmx:10G. After several 
hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and 
the reason is recon is slow to process ContainerReplicaReport, so the queue of 
thread OOM.

  !screenshot-1.png! 
 !screenshot-2.png! 

  was:
40 datanodes with 400, 000 containers, start recon with xmx:10G. After several 
hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and 
the reason is recon is slow to process ContainerReplicaReport, so the queue of 
thread OOM.

 


> Fix memory leak of recon
> 
>
> Key: HDDS-4187
> URL: https://issues.apache.org/jira/browse/HDDS-4187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> 40 datanodes with 400, 000 containers, start recon with xmx:10G. After 
> several hours, recon's memory increase to 12G and OOM. Memory leak happens on 
> heap, and the reason is recon is slow to process ContainerReplicaReport, so 
> the queue of thread OOM.
>   !screenshot-1.png! 
>  !screenshot-2.png! 






[jira] [Updated] (HDDS-4187) Fix memory leak of recon

2020-09-01 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4187:
-
Attachment: screenshot-2.png

> Fix memory leak of recon
> 
>
> Key: HDDS-4187
> URL: https://issues.apache.org/jira/browse/HDDS-4187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> 40 datanodes with 400, 000 containers, start recon with xmx:10G. After 
> several hours, recon's memory increase to 12G and OOM. Memory leak happens on 
> heap, and the reason is recon is slow to process ContainerReplicaReport, so 
> the queue of thread OOM.
>  






[jira] [Updated] (HDDS-4187) Fix memory leak of recon

2020-09-01 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4187:
-
Attachment: screenshot-1.png

> Fix memory leak of recon
> 
>
> Key: HDDS-4187
> URL: https://issues.apache.org/jira/browse/HDDS-4187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> 40 datanodes with 400, 000 containers, start recon with xmx:10G. After 
> several hours, recon's memory increase to 12G and OOM. Memory leak happens on 
> heap, and the reason is recon is slow to process ContainerReplicaReport, so 
> the queue of thread OOM.
>  






[jira] [Updated] (HDDS-4187) Fix memory leak of recon

2020-09-01 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4187:
-
Description: 
40 datanodes with 400, 000 containers, start recon with xmx:10G. After several 
hours, recon's memory increase to 12G and OOM. Memory leak happens on heap, and 
the reason is recon is slow to process ContainerReplicaReport, so the queue of 
thread OOM.

 

> Fix memory leak of recon
> 
>
> Key: HDDS-4187
> URL: https://issues.apache.org/jira/browse/HDDS-4187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
> 40 datanodes with 400, 000 containers, start recon with xmx:10G. After 
> several hours, recon's memory increase to 12G and OOM. Memory leak happens on 
> heap, and the reason is recon is slow to process ContainerReplicaReport, so 
> the queue of thread OOM.
>  






[jira] [Created] (HDDS-4187) Fix memory leak of recon

2020-09-01 Thread runzhiwang (Jira)
runzhiwang created HDDS-4187:


 Summary: Fix memory leak of recon
 Key: HDDS-4187
 URL: https://issues.apache.org/jira/browse/HDDS-4187
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-2922) Balance ratis leader distribution in datanodes

2020-08-31 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-2922:
-
Summary: Balance ratis leader distribution in datanodes  (was: Recommend 
leader host to Ratis via pipeline creation)

> Balance ratis leader distribution in datanodes
> --
>
> Key: HDDS-2922
> URL: https://issues.apache.org/jira/browse/HDDS-2922
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Li Cheng
>Assignee: runzhiwang
>Priority: Major
>
> Ozone should be able to recommend a leader host to Ratis during pipeline 
> creation. The leader host can be chosen based on rack awareness and load 
> balancing.
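One simple load-balancing heuristic for the recommendation above can be sketched as follows. This is an assumed illustration, not the HDDS-2922 implementation; the class and parameter names are hypothetical:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class LeaderChooser {
    /**
     * Suggest a leader among the pipeline's datanodes: pick the node that
     * currently leads the fewest pipelines, so leaders spread evenly.
     * A real implementation would also weigh rack placement.
     */
    public static String suggestLeader(List<String> pipelineNodes,
                                       Map<String, Integer> leaderCounts) {
        return pipelineNodes.stream()
            .min(Comparator.comparingInt(
                (String n) -> leaderCounts.getOrDefault(n, 0)))
            .orElseThrow(() -> new IllegalArgumentException("empty pipeline"));
    }
}
```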






[jira] [Updated] (HDDS-4176) Fix failed UT: test2WayCommitForTimeoutException

2020-08-31 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4176:
-
Description: 
org.apache.ratis.protocol.GroupMismatchException: 
6f2b1ee5-bc2b-491c-bff4-ab0f4ce64709: group-2D066F5AFBD0 not found.

at 
org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:127)
at 
org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:274)
at 
org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:283)
at 
org.apache.hadoop.ozone.container.ContainerTestHelper.getRaftServerImpl(ContainerTestHelper.java:593)
at 
org.apache.hadoop.ozone.container.ContainerTestHelper.isRatisFollower(ContainerTestHelper.java:608)
at 
org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.test2WayCommitForTimeoutException(TestWatchForCommit.java:302)

> Fix failed UT: test2WayCommitForTimeoutException
> 
>
> Key: HDDS-4176
> URL: https://issues.apache.org/jira/browse/HDDS-4176
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
> org.apache.ratis.protocol.GroupMismatchException: 
> 6f2b1ee5-bc2b-491c-bff4-ab0f4ce64709: group-2D066F5AFBD0 not found.
> at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:127)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:274)
> at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:283)
> at 
> org.apache.hadoop.ozone.container.ContainerTestHelper.getRaftServerImpl(ContainerTestHelper.java:593)
> at 
> org.apache.hadoop.ozone.container.ContainerTestHelper.isRatisFollower(ContainerTestHelper.java:608)
> at 
> org.apache.hadoop.ozone.client.rpc.TestWatchForCommit.test2WayCommitForTimeoutException(TestWatchForCommit.java:302)






[jira] [Created] (HDDS-4176) Fix failed UT: test2WayCommitForTimeoutException

2020-08-31 Thread runzhiwang (Jira)
runzhiwang created HDDS-4176:


 Summary: Fix failed UT: test2WayCommitForTimeoutException
 Key: HDDS-4176
 URL: https://issues.apache.org/jira/browse/HDDS-4176
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Comment Edited] (HDDS-3630) Merge rocksdb in datanode

2020-08-30 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908
 ] 

runzhiwang edited comment on HDDS-3630 at 8/31/20, 12:54 AM:
-

In my test, the capacity of each container is 5GB, ozone.container.cache.size 
is 1500, and there are about 25,000 blocks in each container. So there are 1500 
RocksDB instances in memory when one datanode writes 7.5TB of data. The basic 
memory settings are: -Xmx3500m, -XX:MaxDirectMemorySize=1000m.

After one datanode writes 7.5TB of data, resident memory is 13.6GB and virtual 
memory is 53.5GB, so the off-heap memory used by RocksDB is about 9.1GB. If the 
block size is smaller, the off-heap memory is bigger. This off-heap memory 
cannot be garbage-collected while the RocksDB instances are alive.

There are 4835 threads in the datanode, 3000 of which are RocksDB threads: 1500 
rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

So the off-heap memory of RocksDB consists of:
1. the block cache of RocksDB.
2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

Besides, opening a RocksDB instance costs about 300ms, and 1500 cached 
containers cover only 7.5TB of data. If one datanode holds 750TB, cache misses 
will happen frequently, and every cache miss opens a new RocksDB instance, so 
too many RocksDB instances also decrease performance.

In conclusion, too many RocksDB instances cause:
1. 9.1GB of off-heap memory when writing 7.5TB of data.
2. 3000 extra threads when writing 7.5TB of data.
3. decreased performance when writing 750TB of data, because cache misses 
happen frequently.

Besides merging RocksDB instances, there are 2 other options, both of which 
have cons, so I prefer merging:
1. Flush the RocksDB memory of closed containers.
 cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files.
 cons: 
  a. Removing RocksDB is a large amount of work.
  b. If we store all the checksums in files, then to query a checksum 
quickly we must load the file into memory. Because there is one file per 
container, the number of files grows with the number of closed containers, and 
we cannot load the files of all closed containers into memory, otherwise OOM 
will happen. We would have to maintain an eviction strategy, such as LRU, for 
the checksum files in memory; that amounts to re-implementing work RocksDB 
already does.
c. For open containers, some data must also be stored in a file and 
updated frequently. We cannot force a sync to disk on every update, because 
some of the data is not important, such as the block count of each container. 
So we would need a background thread that batch-syncs these kinds of data, 
which is complicated: every time we add such data we must repeat this work, and 
the code may become hard to maintain.
d. The memory reduction can be achieved by merging RocksDB instances, 
which looks much easier than removing RocksDB, so we need not spend so much 
time on removal.
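The cache-miss cost discussed here comes from the per-container DB cache bounded by ozone.container.cache.size. A minimal sketch of such an LRU cache that closes evicted handles, assuming an illustrative AutoCloseable handle type rather than the actual Ozone classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Access-ordered LRU map of per-container DB handles. When capacity is
 * exceeded, the least-recently-used handle is closed on eviction, which
 * is what releases the instance's off-heap memory and threads.
 */
public class ContainerDbCache<K> extends LinkedHashMap<K, AutoCloseable> {
    private final int capacity;

    public ContainerDbCache(int capacity) {
        super(16, 0.75f, true);  // accessOrder = true, i.e. LRU ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, AutoCloseable> eldest) {
        if (size() > capacity) {
            try {
                eldest.getValue().close();  // release the evicted instance
            } catch (Exception ignored) {
                // an eviction failure should not break the cache itself
            }
            return true;
        }
        return false;
    }
}
```

Every evicted container pays the reopen cost on its next access, which is why a working set larger than the cache (750TB vs. 7.5TB above) thrashes.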
 


was (Author: yjxxtd):
In my test, the capacity of the container is 5GB,  ozone.container.cache.size 
is 1500, about 25,000 blocks in each container. So there are 1500 RocksDB 
instances in memory when one datanode writes 7.5TB data. The basic settings of 
memory: -Xmx3500m, -XX:MaxDirectMemorySize=1000m.

After one datanode writting 7.5GB data, Resident Memory is 13.6GB, Virtual 
Memory is 53.5GB, so off-heap memory of RocksDB is about 9.1GB. If the block 
size is smaller,  the off-heap memory is bigger. The off-heap memory can not be 
gc when rocksdb instance is alive.

There are 4835 threads in the datanode, and 3000 RocksDB’s threads. There are 
1500 rocksdb:dump_st threads, and 1500 rocksdb:pst_st threads.

So the off-heap memory of rocksdb consists of:
1. block cache of rocksdb.
2. 1500 rocksdb:dump_st threads, and 1500 rocksdb:pst_st threads.

Besides, open rocksdb cost about 300ms, the capacity of 1500 cached container 
is only 7.5TB, if there is 750TB data in one datanode,  cache miss will happen 
frequently,  cache miss cause open new rocksdb, so too many rocksdb also 
decrease performance.
 
Except merging rocksdb, there are 2 other options which both have cons, so I 
prefer merging rocksdb.
1.  flush the rocksdb memory of closed container.
 cons: thousands of rocksdb threads still exist.
2.  remove rocksdb in datanode, and store the data in file.
 cons: 
  a. It really needs a big work to remove rocksdb
  b. If we store all the checksum in file, in order to query checksum 
faster, we must load the file into memory,  because the file is for each 
container, when the number of closed container increase, the file number also 
increase, we can not load all the file of all closed container into memory, 
otherwise OOM will happen.  we must maintain an elimination strategy for the 
checksum file in memory such as LRU. It looks like we do the work which can be 
done by rocksdb.
c. For open container, there is also 

[jira] [Comment Edited] (HDDS-3630) Merge rocksdb in datanode

2020-08-30 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908
 ] 

runzhiwang edited comment on HDDS-3630 at 8/31/20, 12:49 AM:
-

In my test, the capacity of each container is 5GB, ozone.container.cache.size 
is 1500, and there are about 25,000 blocks in each container. So there are 1500 
RocksDB instances in memory when one datanode writes 7.5TB of data. The basic 
memory settings are: -Xmx3500m, -XX:MaxDirectMemorySize=1000m.

After one datanode writes 7.5TB of data, resident memory is 13.6GB and virtual 
memory is 53.5GB, so the off-heap memory used by RocksDB is about 9.1GB. If the 
block size is smaller, the off-heap memory is bigger. This off-heap memory 
cannot be garbage-collected while the RocksDB instances are alive.

There are 4835 threads in the datanode, 3000 of which are RocksDB threads: 1500 
rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

So the off-heap memory of RocksDB consists of:
1. the block cache of RocksDB.
2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

Besides, opening a RocksDB instance costs about 300ms, and 1500 cached 
containers cover only 7.5TB of data. If one datanode holds 750TB, cache misses 
will happen frequently, and every cache miss opens a new RocksDB instance, so 
too many RocksDB instances also decrease performance.

Besides merging RocksDB instances, there are 2 other options, both of which 
have cons, so I prefer merging:
1. Flush the RocksDB memory of closed containers.
 cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files.
 cons: 
  a. Removing RocksDB is a large amount of work.
  b. If we store all the checksums in files, then to query a checksum 
quickly we must load the file into memory. Because there is one file per 
container, the number of files grows with the number of closed containers, and 
we cannot load the files of all closed containers into memory, otherwise OOM 
will happen. We would have to maintain an eviction strategy, such as LRU, for 
the checksum files in memory; that amounts to re-implementing work RocksDB 
already does.
c. For open containers, some data must also be stored in a file and 
updated frequently. We cannot force a sync to disk on every update, because 
some of the data is not important, such as the block count of each container. 
So we would need a background thread that batch-syncs these kinds of data, 
which is complicated: every time we add such data we must repeat this work, and 
the code may become hard to maintain.
d. The memory reduction can be achieved by merging RocksDB instances, 
which looks much easier than removing RocksDB, so we need not spend so much 
time on removal.
 


was (Author: yjxxtd):
In my test, the capacity of the container is 5GB,  ozone.container.cache.size 
is 1500, about 25,000 blocks in each container. So there are 1500 RocksDB 
instances in memory when one datanode writes 7.5TB data. The basic settings of 
memory: -Xmx3500m, -XX:MaxDirectMemorySize=1000m.

After one datanode writting 7.5GB data, Resident Memory is 13.6GB, Virtual 
Memory is 53.5GB, so off-heap memory of RocksDB is about 9.1GB. If the block 
size is smaller,  the off-heap memory is bigger. The off-heap memory can not be 
gc when rocksdb instance is alive.

There are 4835 threads in the datanode, and 3000 RocksDB’s threads. There are 
1500 rocksdb:dump_st threads, and 1500 rocksdb:pst_st threads.

So the off-heap memory of rocksdb consists of:
1. block cache of rocksdb.
2. 1500 rocksdb:dump_st threads, and 1500 rocksdb:pst_st threads.

Except merging rocksdb, there are 2 other options which both have cons, so I 
prefer merging rocksdb.
1.  flush the rocksdb memory of closed container.
 cons: thousands of rocksdb threads still exist.
2.  remove rocksdb in datanode, and store the data in file.
 cons: 
  a. It really needs a big work to remove rocksdb
  b. If we store all the checksum in file, in order to query checksum 
faster, we must load the file into memory,  because the file is for each 
container, when the number of closed container increase, the file number also 
increase, we can not load all the file of all closed container into memory, 
otherwise OOM will happen.  we must maintain an elimination strategy for the 
checksum file in memory such as LRU. It looks like we do the work which can be 
done by rocksdb.
c. For open container, there is also some data needed to store in file and 
update frequently, we can not force sync to disk every time when update 
happens, because some data is not important such as block count in each 
container. So we must create a background thread to do a batch sync for these 
types of data, it looks complicated. Everytime when we add these type of data, 
we must do the duplicated work, the code may be hard to maintain.
d. Reduce memory can be achieved by merging rocksdb, it looks like 

[jira] [Updated] (HDDS-4138) Improve crc efficiency

2020-08-24 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4138:
-
Description: 
Hadoop has implemented several methods to calculate CRCs: 
https://issues.apache.org/jira/browse/HADOOP-15033
We should choose the most efficient one.

This flame graph is from [~elek]
 !screenshot-1.png! 

  was:
HADOOP has implemented several method to calculate crc: 
https://issues.apache.org/jira/browse/HADOOP-15033
We should choose the method with high efficiency.



> Improve crc efficiency
> --
>
> Key: HDDS-4138
> URL: https://issues.apache.org/jira/browse/HDDS-4138
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Hadoop has implemented several methods to calculate CRCs: 
> https://issues.apache.org/jira/browse/HADOOP-15033
> We should choose the most efficient one.
> This flame graph is from [~elek]
>  !screenshot-1.png! 
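The two CRC variants most often compared in the HADOOP-15033 benchmarks are available directly in the JDK (CRC32C requires Java 9+); CRC32C usually wins on modern CPUs because it maps to a hardware instruction. A small sketch for measuring both:

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;

public class CrcDemo {
    // Plain CRC32 (the zlib polynomial).
    public static long crc32(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    // CRC32C (the Castagnoli polynomial), typically faster because the
    // JDK intrinsifies it to the SSE4.2 crc32 instruction where available.
    public static long crc32c(byte[] data) {
        CRC32C c = new CRC32C();
        c.update(data, 0, data.length);
        return c.getValue();
    }
}
```

Timing each method over the same large buffer is enough to pick the faster implementation for a given deployment.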






[jira] [Updated] (HDDS-4138) Improve crc efficiency

2020-08-24 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4138:
-
Attachment: screenshot-1.png

> Improve crc efficiency
> --
>
> Key: HDDS-4138
> URL: https://issues.apache.org/jira/browse/HDDS-4138
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> HADOOP has implemented several method to calculate crc: 
> https://issues.apache.org/jira/browse/HADOOP-15033
> We should choose the method with high efficiency.






[jira] [Updated] (HDDS-4138) Improve crc efficiency

2020-08-24 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4138:
-
Description: 
HADOOP has implemented several method to calculate crc: 
https://issues.apache.org/jira/browse/HADOOP-15033
We should choose the method with high efficiency.


  was:
HADOOP has implemented several method to calculate crc: 
https://issues.apache.org/jira/browse/HADOOP-15033
We should choose the method with high efficiency.


> Improve crc efficiency
> --
>
> Key: HDDS-4138
> URL: https://issues.apache.org/jira/browse/HDDS-4138
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
> HADOOP has implemented several method to calculate crc: 
> https://issues.apache.org/jira/browse/HADOOP-15033
> We should choose the method with high efficiency.






[jira] [Updated] (HDDS-4138) Improve crc efficiency

2020-08-24 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-4138:
-
Description: 
HADOOP has implemented several method to calculate crc: 
https://issues.apache.org/jira/browse/HADOOP-15033
We should choose the method with high efficiency.

> Improve crc efficiency
> --
>
> Key: HDDS-4138
> URL: https://issues.apache.org/jira/browse/HDDS-4138
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
> Hadoop has implemented several methods to calculate CRC:
> https://issues.apache.org/jira/browse/HADOOP-15033
> We should choose the most efficient one.






[jira] [Created] (HDDS-4138) Improve crc efficiency

2020-08-24 Thread runzhiwang (Jira)
runzhiwang created HDDS-4138:


 Summary: Improve crc efficiency
 Key: HDDS-4138
 URL: https://issues.apache.org/jira/browse/HDDS-4138
 Project: Hadoop Distributed Data Store
  Issue Type: Task
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Comment Edited] (HDDS-3630) Merge rocksdb in datanode

2020-08-23 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908
 ] 

runzhiwang edited comment on HDDS-3630 at 8/24/20, 3:00 AM:


In my test, the capacity of each container is 5 GB, ozone.container.cache.size
is 1500, and there are about 25,000 blocks in each container. So there are 1500
RocksDB instances in memory when one datanode writes 7.5 TB of data. The basic
memory settings are -Xmx3500m and -XX:MaxDirectMemorySize=1000m.

After one datanode writes 7.5 TB of data, resident memory is 13.6 GB and
virtual memory is 53.5 GB, so the off-heap memory of RocksDB is about 9.1 GB.
The smaller the block size, the larger the off-heap memory. The off-heap
memory cannot be reclaimed while the RocksDB instance is alive.

There are 4835 threads in the datanode, 3000 of which are RocksDB threads:
1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

So the off-heap memory of RocksDB consists of:
1. the block cache of RocksDB;
2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

Besides merging RocksDB instances, there are 2 other options, both of which
have cons, so I prefer merging.
1. Flush the RocksDB memory of closed containers.
 cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files.
 cons:
  a. Removing RocksDB is a large amount of work.
  b. If we store all the checksums in files, then to query a checksum quickly
we must load the file into memory. Because there is one file per container,
the number of files grows as the number of closed containers grows, and we
cannot load the files of all closed containers into memory, otherwise OOM will
happen. So we must maintain an eviction strategy, such as LRU, for the
checksum files in memory. That would duplicate work RocksDB already does.
c. For an open container, some data also needs to be stored in a file and
updated frequently. We cannot force a sync to disk on every update, because
some of the data is not important, such as the block count of each container.
So we must create a background thread to batch-sync these types of data, which
looks complicated. Every time we add such data we must repeat this work, and
the code may become hard to maintain.
d. The memory reduction can be achieved by merging RocksDB instances, which
looks much easier than removing RocksDB, so maybe we need not spend so much
time removing it.
 


was (Author: yjxxtd):
In my test, the capacity of each container is 5 GB, ozone.container.cache.size
is 1500, and there are about 25,000 blocks in each container. So there are 1500
RocksDB instances in memory when one datanode writes 7.5 TB of data. The basic
memory settings are -Xmx3500m and -XX:MaxDirectMemorySize=1000m.

After one datanode writes 7.5 TB of data, resident memory is 13.6 GB and
virtual memory is 53.5 GB, so the off-heap memory of RocksDB is about 9.1 GB.
The smaller the block size, the larger the off-heap memory.

There are 4835 threads in the datanode, 3000 of which are RocksDB threads:
1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

So the off-heap memory of RocksDB consists of:
1. the block cache of RocksDB;
2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

Besides merging RocksDB instances, there are 2 other options, both of which
have cons, so I prefer merging.
1. Flush the RocksDB memory of closed containers.
 cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files.
 cons:
  a. Removing RocksDB is a large amount of work.
  b. If we store all the checksums in files, then to query a checksum quickly
we must load the file into memory. Because there is one file per container,
the number of files grows as the number of closed containers grows, and we
cannot load the files of all closed containers into memory, otherwise OOM will
happen. So we must maintain an eviction strategy, such as LRU, for the
checksum files in memory. That would duplicate work RocksDB already does.
c. For an open container, some data also needs to be stored in a file and
updated frequently. We cannot force a sync to disk on every update, because
some of the data is not important, such as the block count of each container.
So we must create a background thread to batch-sync these types of data, which
looks complicated. Every time we add such data we must repeat this work, and
the code may become hard to maintain.
d. The memory reduction can be achieved by merging RocksDB instances, which
looks much easier than removing RocksDB, so maybe we need not spend so much
time removing it.
 

> Merge rocksdb in datanode
> -
>
> Key: HDDS-3630
> URL: https://issues.apache.org/jira/browse/HDDS-3630
> Project: Hadoop Distributed Data Store
>  Issue Type: 

[jira] [Comment Edited] (HDDS-3630) Merge rocksdb in datanode

2020-08-23 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908
 ] 

runzhiwang edited comment on HDDS-3630 at 8/24/20, 2:56 AM:


In my test, the capacity of each container is 5 GB, ozone.container.cache.size
is 1500, and there are about 25,000 blocks in each container. So there are 1500
RocksDB instances in memory when one datanode writes 7.5 TB of data. The basic
memory settings are -Xmx3500m and -XX:MaxDirectMemorySize=1000m.

After one datanode writes 7.5 TB of data, resident memory is 13.6 GB and
virtual memory is 53.5 GB, so the off-heap memory of RocksDB is about 9.1 GB.
The smaller the block size, the larger the off-heap memory.

There are 4835 threads in the datanode, 3000 of which are RocksDB threads:
1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

So the off-heap memory of RocksDB consists of:
1. the block cache of RocksDB;
2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

Besides merging RocksDB instances, there are 2 other options, both of which
have cons, so I prefer merging.
1. Flush the RocksDB memory of closed containers.
 cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files.
 cons:
  a. Removing RocksDB is a large amount of work.
  b. If we store all the checksums in files, then to query a checksum quickly
we must load the file into memory. Because there is one file per container,
the number of files grows as the number of closed containers grows, and we
cannot load the files of all closed containers into memory, otherwise OOM will
happen. So we must maintain an eviction strategy, such as LRU, for the
checksum files in memory. That would duplicate work RocksDB already does.
c. For an open container, some data also needs to be stored in a file and
updated frequently. We cannot force a sync to disk on every update, because
some of the data is not important, such as the block count of each container.
So we must create a background thread to batch-sync these types of data, which
looks complicated. Every time we add such data we must repeat this work, and
the code may become hard to maintain.
d. The memory reduction can be achieved by merging RocksDB instances, which
looks much easier than removing RocksDB, so maybe we need not spend so much
time removing it.
 


was (Author: yjxxtd):
In my test, the capacity of each container is 5 GB, ozone.container.cache.size
is 1500, and there are about 25,000 blocks in each container. So there are 1500
RocksDB instances in memory when one datanode writes 7.5 TB of data. The basic
memory settings are -Xmx3500m and -XX:MaxDirectMemorySize=1000m.

After one datanode writes 7.5 TB of data, resident memory is 13.6 GB and
virtual memory is 53.5 GB, so the off-heap memory of RocksDB is about 9.1 GB.
The smaller the block size, the larger the off-heap memory.

There are 4835 threads in the datanode, 3000 of which are RocksDB threads:
1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

So the off-heap memory of RocksDB consists of:
1. the block cache of RocksDB;
2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

Besides merging RocksDB instances, there are 2 other options, both of which
have cons.
1. Flush the RocksDB memory of closed containers.
 cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files.
 cons:
  a. Removing RocksDB is a large amount of work.
  b. If we store all the checksums in files, then to query a checksum quickly
we must load the file into memory. Because there is one file per container,
the number of files grows as the number of closed containers grows, and we
cannot load the files of all closed containers into memory, otherwise OOM will
happen. So we must maintain an eviction strategy, such as LRU, for the
checksum files in memory. That would duplicate work RocksDB already does.
c. For an open container, some data also needs to be stored in a file and
updated frequently. We cannot force a sync to disk on every update, because
some of the data is not important, such as the block count of each container.
So we must create a background thread to batch-sync these types of data, which
looks complicated. Every time we add such data we must repeat this work, and
the code may become hard to maintain.
d. The memory reduction can be achieved by merging RocksDB instances, which
looks much easier than removing RocksDB, so maybe we need not spend so much
time removing it.
 

> Merge rocksdb in datanode
> -
>
> Key: HDDS-3630
> URL: https://issues.apache.org/jira/browse/HDDS-3630
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: 

[jira] [Commented] (HDDS-3630) Merge rocksdb in datanode

2020-08-23 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182908#comment-17182908
 ] 

runzhiwang commented on HDDS-3630:
--

In my test, the capacity of each container is 5 GB, ozone.container.cache.size
is 1500, and there are about 25,000 blocks in each container. So there are 1500
RocksDB instances in memory when one datanode writes 7.5 TB of data. The basic
memory settings are -Xmx3500m and -XX:MaxDirectMemorySize=1000m.

After one datanode writes 7.5 TB of data, resident memory is 13.6 GB and
virtual memory is 53.5 GB, so the off-heap memory of RocksDB is about 9.1 GB.
The smaller the block size, the larger the off-heap memory.

There are 4835 threads in the datanode, 3000 of which are RocksDB threads:
1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

So the off-heap memory of RocksDB consists of:
1. the block cache of RocksDB;
2. the 1500 rocksdb:dump_st threads and 1500 rocksdb:pst_st threads.

Besides merging RocksDB instances, there are 2 other options, both of which
have cons.
1. Flush the RocksDB memory of closed containers.
 cons: thousands of RocksDB threads still exist.
2. Remove RocksDB from the datanode and store the data in files.
 cons:
  a. Removing RocksDB is a large amount of work.
  b. If we store all the checksums in files, then to query a checksum quickly
we must load the file into memory. Because there is one file per container,
the number of files grows as the number of closed containers grows, and we
cannot load the files of all closed containers into memory, otherwise OOM will
happen. So we must maintain an eviction strategy, such as LRU, for the
checksum files in memory. That would duplicate work RocksDB already does.
c. For an open container, some data also needs to be stored in a file and
updated frequently. We cannot force a sync to disk on every update, because
some of the data is not important, such as the block count of each container.
So we must create a background thread to batch-sync these types of data, which
looks complicated. Every time we add such data we must repeat this work, and
the code may become hard to maintain.
d. The memory reduction can be achieved by merging RocksDB instances, which
looks much easier than removing RocksDB, so maybe we need not spend so much
time removing it.
 

> Merge rocksdb in datanode
> -
>
> Key: HDDS-3630
> URL: https://issues.apache.org/jira/browse/HDDS-3630
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: Merge RocksDB in Datanode-v1.pdf, Merge RocksDB in 
> Datanode-v2.pdf
>
>
> Currently, there is one RocksDB instance per container, and each container
> has 5 GB of capacity, so 10 TB of data needs more than 2000 RocksDB
> instances in one datanode. It is difficult to limit the memory of 2000
> RocksDB instances, so maybe we should limit the number of RocksDB instances
> per disk.
> The design of the improvement is in the following link, but it is still a
> draft.
> TODO:
>  1. compatibility with the current logic, i.e. one RocksDB per container
>  2. measure the memory usage before and after the improvement
>  3. effect on the efficiency of reads and writes
> https://docs.google.com/document/d/18Ybg-NjyU602c-MYXaJHP6yrg-dVMZKGyoK5C_pp1mM/edit#
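The eviction idea mentioned in the comments above (limiting the number of live per-container DB handles and releasing the least-recently-used one) can be sketched with a `LinkedHashMap` in access order. This is a hypothetical illustration, not Ozone's actual cache; the handle type and `closer` callback are stand-ins for a real RocksDB instance and its `close()`:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Bounded LRU cache of per-container DB handles, keyed by container ID.
// When the cap is exceeded, the least-recently-used handle is closed so its
// off-heap memory and background threads can be released.
class DbHandleCache<H> extends LinkedHashMap<Long, H> {
    private final int capacity;
    private final Consumer<H> closer; // invoked on each evicted handle

    DbHandleCache(int capacity, Consumer<H> closer) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.capacity = capacity;
        this.closer = closer;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, H> eldest) {
        if (size() > capacity) {
            closer.accept(eldest.getValue()); // release off-heap resources
            return true;                      // drop the eldest mapping
        }
        return false;
    }
}
```

As the comment notes, this only caps how many instances are open at once; merging instances (e.g. one RocksDB per disk) additionally removes the per-instance background threads.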






[jira] [Created] (HDDS-4024) Avoid while loop too soon when exception happen

2020-07-24 Thread runzhiwang (Jira)
runzhiwang created HDDS-4024:


 Summary: Avoid while loop too soon when exception happen
 Key: HDDS-4024
 URL: https://issues.apache.org/jira/browse/HDDS-4024
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock

2020-07-17 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3978:
-
Description: 
We met a deadlock related to log4j; the jstack information has been attached.
For the following two reasons, I want to switch log4j in Ozone and Ratis to
log4j2.

1. There are a lot of deadlock reports about log4j:
https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/

2. log4j2 performs better than log4j:
https://stackoverflow.com/questions/30019585/log4j2-why-would-you-use-it-over-log4j

Besides, both log4j and log4j2 exist in Ozone: the audit log uses log4j2 while
the other logs use log4j, so maybe it is time to unify them.


  was:
We met a deadlock related to log4j; the jstack information has been attached.
For the following two reasons, I want to switch log4j in Ozone and Ratis to
log4j2.

1. There are a lot of deadlock reports about log4j:
https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/

2. log4j2 performs better than log4j:
https://stackoverflow.com/questions/30019585/log4j2-why-would-you-use-it-over-log4j



> Switch log4j to log4j2 to avoid deadlock
> 
>
> Key: HDDS-3978
> URL: https://issues.apache.org/jira/browse/HDDS-3978
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack-deadlock-log.txt
>
>
> We met a deadlock related to log4j; the jstack information has been attached.
> For the following two reasons, I want to switch log4j in Ozone and Ratis to
> log4j2.
> 1. There are a lot of deadlock reports about log4j:
> https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/
> 2. log4j2 performs better than log4j:
> https://stackoverflow.com/questions/30019585/log4j2-why-would-you-use-it-over-log4j
> Besides, both log4j and log4j2 exist in Ozone: the audit log uses log4j2 while
> the other logs use log4j, so maybe it is time to unify them.
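For reference, a migration like the one proposed here usually starts from a small `log4j2.xml`. The sketch below is only illustrative (logger names, file paths, and sizes are assumptions, not Ozone's shipped configuration); the `AsyncLogger` element additionally requires the LMAX Disruptor on the classpath:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal log4j2 sketch; names and paths are illustrative only. -->
<Configuration status="WARN">
  <Appenders>
    <RollingFile name="rolling" fileName="logs/ozone.log"
                 filePattern="logs/ozone-%d{yyyy-MM-dd}-%i.log.gz">
      <PatternLayout pattern="%d{ISO8601} [%t] %-5level %c{1} - %msg%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="100 MB"/>
      </Policies>
    </RollingFile>
  </Appenders>
  <Loggers>
    <!-- Async logging avoids the synchronized appender path implicated in
         the classic log4j 1.x deadlock reports. -->
    <AsyncLogger name="org.apache.hadoop.ozone" level="info"/>
    <Root level="info">
      <AppenderRef ref="rolling"/>
    </Root>
  </Loggers>
</Configuration>
```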






[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock

2020-07-17 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3978:
-
Description: 
We met a deadlock related to log4j; the jstack information has been attached.
For the following two reasons, I want to switch log4j in Ozone and Ratis to
log4j2.

1. There are a lot of deadlock reports about log4j:
https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/

2. log4j2 performs better than log4j:
https://stackoverflow.com/questions/30019585/log4j2-why-would-you-use-it-over-log4j


  was:We met a deadlock related to log4j; the jstack information has been
attached.


> Switch log4j to log4j2 to avoid deadlock
> 
>
> Key: HDDS-3978
> URL: https://issues.apache.org/jira/browse/HDDS-3978
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack-deadlock-log.txt
>
>
> We met a deadlock related to log4j; the jstack information has been attached.
> For the following two reasons, I want to switch log4j in Ozone and Ratis to
> log4j2.
> 1. There are a lot of deadlock reports about log4j:
> https://stackoverflow.com/questions/3537870/production-settings-file-for-log4j/
> 2. log4j2 performs better than log4j:
> https://stackoverflow.com/questions/30019585/log4j2-why-would-you-use-it-over-log4j






[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock

2020-07-17 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3978:
-
Description: We met a deadlock related to log4j; the jstack information has
been attached.  (was: We met dead lock related to log4j, the jstack
information has been attached.)

> Switch log4j to log4j2 to avoid deadlock
> 
>
> Key: HDDS-3978
> URL: https://issues.apache.org/jira/browse/HDDS-3978
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack-deadlock-log.txt
>
>
> We met a deadlock related to log4j; the jstack information has been attached.






[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock

2020-07-17 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3978:
-
Description: We met a deadlock related to log4j; the jstack information has
been attached.

> Switch log4j to log4j2 to avoid deadlock
> 
>
> Key: HDDS-3978
> URL: https://issues.apache.org/jira/browse/HDDS-3978
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack-deadlock-log.txt
>
>
> We met a deadlock related to log4j; the jstack information has been attached.






[jira] [Updated] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock

2020-07-17 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3978:
-
Attachment: jstack-deadlock-log.txt

> Switch log4j to log4j2 to avoid deadlock
> 
>
> Key: HDDS-3978
> URL: https://issues.apache.org/jira/browse/HDDS-3978
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack-deadlock-log.txt
>
>







[jira] [Created] (HDDS-3978) Switch log4j to log4j2 to avoid deadlock

2020-07-17 Thread runzhiwang (Jira)
runzhiwang created HDDS-3978:


 Summary: Switch log4j to log4j2 to avoid deadlock
 Key: HDDS-3978
 URL: https://issues.apache.org/jira/browse/HDDS-3978
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-3952) Merge small container

2020-07-16 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3952:
-
Summary: Merge small container  (was: Make container could be reopened)

> Merge small container
> -
>
> Key: HDDS-3952
> URL: https://issues.apache.org/jira/browse/HDDS-3952
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.7.0
>Reporter: maobaolong
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Comment Edited] (HDDS-3952) Make container could be reopened

2020-07-16 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159636#comment-17159636
 ] 

runzhiwang edited comment on HDDS-3952 at 7/17/20, 3:02 AM:


[~msingh] Hi, thanks for the review. The background of this jira is that our
production cluster has about 7000 small containers that are closed but not
full, as the image shows, because the Ratis pipeline is not stable. So we want
to reopen and write to the closed-but-not-full containers.
The basic idea:
1. SCM builds a map with entries <3 datanodes, set of closed-but-not-full
containers on those 3 datanodes>.
2. SCM checks from the map.entrySet whether any open pipeline is located on
the 3 datanodes; if such an open pipeline exists, we get the
closed-but-not-full containers by map.get(pipeline.datanodes()), put them on
the pipeline, and reopen them.
3. When SCM creates a new pipeline, we first select from the map the 3
datanodes that have the most closed-but-not-full containers and create the
pipeline on those 3 datanodes. Then we put the containers from
map.get(pipeline.datanodes()) on the pipeline and reopen them.

[~msingh] What do you think?

 !screenshot-1.png! 


was (Author: yjxxtd):
[~msingh] Hi, thanks for the review. The background of this jira is that our
production cluster has about 7000 small containers that are closed but not
full, as the image shows, because the Ratis pipeline is not stable. So we want
to reopen and write to the closed-but-not-full containers.
The basic idea:
1. SCM builds a map with entries <3 datanodes, set of closed-but-not-full
containers on those 3 datanodes>.
2. SCM checks from the map.entrySet whether any open pipeline is located on
the 3 datanodes; if such an open pipeline exists, we get the
closed-but-not-full containers by map.get(pipeline.datanodes()), put them on
the pipeline, and reopen them.
3. When SCM creates a new pipeline, we first select from the map the 3
datanodes that have the most closed-but-not-full containers and create the
pipeline on those 3 datanodes.

[~msingh] What do you think?

 !screenshot-1.png! 

> Make container could be reopened
> 
>
> Key: HDDS-3952
> URL: https://issues.apache.org/jira/browse/HDDS-3952
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.7.0
>Reporter: maobaolong
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Comment Edited] (HDDS-3952) Make container could be reopened

2020-07-16 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159636#comment-17159636
 ] 

runzhiwang edited comment on HDDS-3952 at 7/17/20, 3:01 AM:


[~msingh] Hi, thanks for the review. The background of this jira is that our
production cluster has about 7000 small containers that are closed but not
full, as the image shows, because the Ratis pipeline is not stable. So we want
to reopen and write to the closed-but-not-full containers.
The basic idea:
1. SCM builds a map with entries <3 datanodes, set of closed-but-not-full
containers on those 3 datanodes>.
2. SCM checks from the map.entrySet whether any open pipeline is located on
the 3 datanodes; if such an open pipeline exists, we get the
closed-but-not-full containers by map.get(pipeline.datanodes()), put them on
the pipeline, and reopen them.
3. When SCM creates a new pipeline, we first select from the map the 3
datanodes that have the most closed-but-not-full containers and create the
pipeline on those 3 datanodes.

[~msingh] What do you think?

 !screenshot-1.png! 


was (Author: yjxxtd):
[~msingh] Hi, thanks for the review. The background of this jira is that our
production cluster has about 7000 small containers that are closed but not
full, as the image shows, because the Ratis pipeline is not stable. So we want
to reopen and write to the closed-but-not-full containers.
The basic idea:
1. SCM builds a map with entries <3 datanodes, set of closed-but-not-full
containers on those 3 datanodes>.
2. SCM checks from the map.entrySet whether any open pipeline is located on
the 3 datanodes; if such an open pipeline exists, we get the
closed-but-not-full containers by map.get(pipeline.datanodes()), put them on
the pipeline, and reopen them.
3. When SCM creates a new pipeline, we first select from the map the 3
datanodes that have the most closed-but-not-full containers and create the
pipeline on those 3 datanodes.

[~msingh] What do you think?

 !screenshot-1.png! 

> Make container could be reopened
> 
>
> Key: HDDS-3952
> URL: https://issues.apache.org/jira/browse/HDDS-3952
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.7.0
>Reporter: maobaolong
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Comment Edited] (HDDS-3952) Make container could be reopened

2020-07-16 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159636#comment-17159636
 ] 

runzhiwang edited comment on HDDS-3952 at 7/17/20, 3:00 AM:


[~msingh] Hi, thanks for the review. The background of this jira is that our
production cluster has about 7000 small containers that are closed but not
full, as the image shows, because the Ratis pipeline is not stable. So we want
to reopen and write to the closed-but-not-full containers.
The basic idea:
1. SCM builds a map with entries <3 datanodes, set of closed-but-not-full
containers on those 3 datanodes>.
2. SCM checks from the map.entrySet whether any open pipeline is located on
the 3 datanodes; if such an open pipeline exists, we get the
closed-but-not-full containers by map.get(pipeline.datanodes()), put them on
the pipeline, and reopen them.
3. When SCM creates a new pipeline, we first select from the map the 3
datanodes that have the most closed-but-not-full containers and create the
pipeline on those 3 datanodes.

[~msingh] What do you think?

 !screenshot-1.png! 


was (Author: yjxxtd):
[~msingh] Hi, thanks for the review. The background of this jira is that our
production cluster has about 7000 small containers that are closed but not
full, as the image shows, because the Ratis pipeline is not stable. So we want
to reopen and write to the closed-but-not-full containers.
The basic idea:
1. SCM builds a map with entries <3 datanodes, set of closed-but-not-full
containers on those 3 datanodes>.
2. SCM checks from the map.entrySet whether any open pipeline is located on
the 3 datanodes; if such an open pipeline exists, we get the
closed-but-not-full containers by map.get(pipeline.datanodes()), put them on
the pipeline, and reopen them.
3. When SCM creates a new pipeline, we first select from the map the 3
datanodes that have the most closed-but-not-full containers and create the
pipeline on those 3 datanodes.

[~msingh] What do you think?

 !screenshot-1.png! 

> Make container could be reopened
> 
>
> Key: HDDS-3952
> URL: https://issues.apache.org/jira/browse/HDDS-3952
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.7.0
>Reporter: maobaolong
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Commented] (HDDS-3952) Make container could be reopened

2020-07-16 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17159636#comment-17159636
 ] 

runzhiwang commented on HDDS-3952:
--

[~msingh] Hi, thanks for the review. The background of this jira is that our
production cluster has about 7000 small containers that are closed but not
full, as the image shows, because the Ratis pipeline is not stable. So we want
to reopen and write to the closed-but-not-full containers.
The basic idea:
1. SCM builds a map with entries <3 datanodes, set of closed-but-not-full
containers on those 3 datanodes>.
2. SCM checks from the map.entrySet whether any open pipeline is located on
the 3 datanodes; if such an open pipeline exists, we get the
closed-but-not-full containers by map.get(pipeline.datanodes()), put them on
the pipeline, and reopen them.
3. When SCM creates a new pipeline, we first select from the map the 3
datanodes that have the most closed-but-not-full containers and create the
pipeline on those 3 datanodes.

[~msingh] What do you think?

 !screenshot-1.png! 
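The idea above can be sketched as a small data structure. This is a minimal, hypothetical illustration, not actual SCM code; the class and method names (SmallContainerMap, reopenable, bestDatanodesForNewPipeline) are invented for the sketch:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class SmallContainerMap {
    // Entry: <sorted set of 3 datanodes, closed-but-not-full containers on them>.
    private final Map<Set<String>, Set<Long>> map = new HashMap<>();

    // Step 1: register a closed-but-not-full container under its 3 datanodes.
    public void add(Set<String> datanodes, long containerId) {
        map.computeIfAbsent(new TreeSet<>(datanodes), k -> new HashSet<>())
           .add(containerId);
    }

    // Step 2: containers an open pipeline on exactly these datanodes could reopen.
    public Set<Long> reopenable(Set<String> pipelineDatanodes) {
        return map.getOrDefault(new TreeSet<>(pipelineDatanodes),
                                Collections.emptySet());
    }

    // Step 3: for a new pipeline, prefer the datanode set that holds the most
    // closed-but-not-full containers.
    public Optional<Set<String>> bestDatanodesForNewPipeline() {
        return map.entrySet().stream()
                  .max(Comparator.comparingInt(e -> e.getValue().size()))
                  .map(Map.Entry::getKey);
    }
}
```

For example, reopenable(pipeline.datanodes()) would yield the candidate containers for an existing open pipeline, and bestDatanodesForNewPipeline() would drive step 3.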

> Make container could be reopened
> 
>
> Key: HDDS-3952
> URL: https://issues.apache.org/jira/browse/HDDS-3952
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.7.0
>Reporter: maobaolong
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Updated] (HDDS-3952) Make container could be reopened

2020-07-16 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3952:
-
Attachment: screenshot-1.png

> Make container could be reopened
> 
>
> Key: HDDS-3952
> URL: https://issues.apache.org/jira/browse/HDDS-3952
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.7.0
>Reporter: maobaolong
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Resolved] (HDDS-3941) Enable core dump when crash in C++

2020-07-16 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang resolved HDDS-3941.
--
Fix Version/s: 0.6.0
   Resolution: Fixed

> Enable core dump when crash in C++
> --
>
> Key: HDDS-3941
> URL: https://issues.apache.org/jira/browse/HDDS-3941
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>







[jira] [Updated] (HDDS-3957) Fix mixed use of Longs.toByteArray and Ints.fromByteArray

2020-07-13 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3957:
-
Summary: Fix mixed use of Longs.toByteArray and Ints.fromByteArray  (was: 
Fix error use Longs.toByteArray and Ints.fromByteArray of 
DB_PENDING_DELETE_BLOCK_COUNT_KEY)

> Fix mixed use of Longs.toByteArray and Ints.fromByteArray
> -
>
> Key: HDDS-3957
> URL: https://issues.apache.org/jira/browse/HDDS-3957
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>
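The renamed summary refers to encoding a value with Guava's Longs.toByteArray (8 bytes, big-endian) but decoding it with Ints.fromByteArray, which reads only the first 4 bytes. A minimal sketch of the failure mode, using java.nio.ByteBuffer to mimic the two Guava helpers (the helper names here are local stand-ins, not the Guava methods themselves):

```java
import java.nio.ByteBuffer;

public class ByteArrayMismatchDemo {
    // Mimics Guava Longs.toByteArray: big-endian 8-byte encoding.
    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    // Mimics Guava Ints.fromByteArray: reads the FIRST 4 bytes, big-endian.
    static int intFromBytes(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static void main(String[] args) {
        byte[] encoded = longToBytes(42L);   // [0,0,0,0, 0,0,0,42]
        int decoded = intFromBytes(encoded); // reads the high 4 bytes only
        // The count silently decodes as 0 instead of 42.
        System.out.println(decoded);         // prints 0
    }
}
```

Any long small enough to fit an int decodes as 0 this way, which is why the mismatch can go unnoticed until the counter grows.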







[jira] [Created] (HDDS-3957) Fix error use Longs.toByteArray and Ints.fromByteArray of DB_PENDING_DELETE_BLOCK_COUNT_KEY

2020-07-13 Thread runzhiwang (Jira)
runzhiwang created HDDS-3957:


 Summary: Fix error use Longs.toByteArray and Ints.fromByteArray of 
DB_PENDING_DELETE_BLOCK_COUNT_KEY
 Key: HDDS-3957
 URL: https://issues.apache.org/jira/browse/HDDS-3957
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-3933) Fix memory leak because of too many Datanode State Machine Thread

2020-07-09 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Summary: Fix memory leak because of too many Datanode State Machine Thread  
(was: memory leak because of too many Datanode State Machine Thread)

> Fix memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack.txt, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 
>  !screenshot-2.png! 
>  !screenshot-3.png! 
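One hedged sketch of the usual remedy for this kind of leak: route the per-heartbeat work through a single bounded pool instead of creating a fresh thread each time, so the thread count stays fixed no matter how many tasks pile up. The class below is illustrative only and is not the actual Ozone fix; all names are invented:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedStateMachineExecutor {
    private final ExecutorService pool;

    public BoundedStateMachineExecutor(int threads) {
        pool = new ThreadPoolExecutor(
            threads, threads,                       // fixed-size pool
            0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(10_000),      // bound the backlog, not the threads
            new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure on overflow
    }

    public Future<?> submit(Runnable task) {
        return pool.submit(task);
    }

    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With this shape, a flood of state-machine tasks queues up (or runs in the caller) instead of allocating a 22345th thread stack.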






[jira] [Updated] (HDDS-3223) Improve s3g read 1GB object efficiency by 100 times

2020-07-09 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3223:
-
Summary: Improve s3g read 1GB object efficiency by 100 times   (was: 
Improve s3g read 1GB object efficiency by 10 times )

> Improve s3g read 1GB object efficiency by 100 times 
> 
>
> Key: HDDS-3223
> URL: https://issues.apache.org/jira/browse/HDDS-3223
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.6.0
>
> Attachments: screenshot-1.png
>
>
> *What's the problem?*
> Reading a 1000M object costs about 470 seconds, i.e. 2.2M/s, which is too slow.
> *What's the reason?*
> When reading a 1000M file, there are 50 GET requests, and each GET request reads 20M. 
> During a GET, the stack is: 
> [IOUtils::copyLarge|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/s3gateway/src/main/java/org/apache/hadoop/ozone/s3/endpoint/ObjectEndpoint.java#L262]
>  -> 
> [IOUtils::skipFully|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1190]
>  -> 
> [IOUtils::skip|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L2064]
>  -> 
> [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957].
> This means the 50th GET request should read only 980M-1000M, but to skip 
> 0-980M it also has to 
> [InputStream::read|https://github.com/apache/commons-io/blob/master/src/main/java/org/apache/commons/io/IOUtils.java#L1957]
>  0-980M. So the 1st GET request reads 0-20M, the 2nd GET request reads 0-40M, 
> the 3rd GET request reads 0-60M, ..., and the 50th GET request reads 0-1000M; 
> the GET requests from the 1st to the 50th become slower and slower.
> You can also see [here|https://issues.apache.org/jira/browse/IO-203] why 
> IOUtils implements skip by read rather than a real skip, e.g. seek.
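The degradation described above can be reproduced in miniature. The sketch below is illustrative only, with sizes scaled down so 1 byte stands in for 1 MB: it mimics skip-implemented-as-read and counts how many units the 50 range requests actually consume. The fix is a positioned read or seek, so each request consumes only its own 20M.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipByReadDemo {
    // Mimics IOUtils.skip: "skip" by reading into a scratch buffer and discarding.
    static long skipByRead(InputStream in, long toSkip) throws IOException {
        byte[] scratch = new byte[8192];
        long remaining = toSkip;
        while (remaining > 0) {
            int n = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (n < 0) break;
            remaining -= n;
        }
        return toSkip - remaining;
    }

    // Total units consumed by 50 range GETs of 20 units each over a 1000-unit object.
    static long totalConsumed() throws IOException {
        byte[] object = new byte[1000];            // stand-in for a 1000M object
        long total = 0;
        for (int part = 0; part < 50; part++) {
            InputStream in = new ByteArrayInputStream(object);
            long offset = part * 20L;
            total += skipByRead(in, offset);       // "skip" = read 0..offset
            byte[] window = new byte[20];
            total += in.read(window);              // the 20 units the client wanted
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // 50 * 20 = 1000 units are wanted, but 25500 are actually read:
        // the Nth request re-reads everything before its offset.
        System.out.println(totalConsumed());       // prints 25500
    }
}
```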






[jira] [Updated] (HDDS-3941) Enable core dump when crash in C++

2020-07-08 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3941:
-
Summary: Enable core dump when crash in C++  (was: Set core file size to 
debug when crash in C++)

> Enable core dump when crash in C++
> --
>
> Key: HDDS-3941
> URL: https://issues.apache.org/jira/browse/HDDS-3941
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Updated] (HDDS-3933) memory leak because of too many Datanode State Machine Thread

2020-07-08 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Summary: memory leak because of too many Datanode State Machine Thread  
(was: Memory leak because of too many Datanode State Machine Thread)

> memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack.txt, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 
>  !screenshot-2.png! 
>  !screenshot-3.png! 






[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Attachment: jstack.txt

> Memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: jstack.txt, screenshot-1.png, screenshot-2.png, 
> screenshot-3.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 
>  !screenshot-2.png! 
>  !screenshot-3.png! 






[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Description: 
When creating the 22345th Datanode State Machine Thread, an OOM occurred.
!screenshot-1.png! 
 !screenshot-2.png! 
 !screenshot-3.png! 

  was:
When creating the 22345th Datanode State Machine Thread, an OOM occurred.
!screenshot-1.png! 
 !screenshot-2.png! 


> Memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 
>  !screenshot-2.png! 
>  !screenshot-3.png! 






[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Attachment: screenshot-3.png

> Memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 
>  !screenshot-2.png! 






[jira] [Assigned] (HDDS-3853) Container marked as missing on datanode while container directory do exist

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang reassigned HDDS-3853:


Assignee: runzhiwang  (was: Shashikant Banerjee)

> Container marked as missing on datanode while container directory do exist
> --
>
> Key: HDDS-3853
> URL: https://issues.apache.org/jira/browse/HDDS-3853
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: runzhiwang
>Priority: Major
>
> {code}
> INFO org.apache.hadoop.ozone.container.common.impl.HddsDispatcher: Operation: 
> PutBlock , Trace ID: 487c959563e884b9:509a3386ba37abc6:487c959563e884b9:0 , 
> Message: ContainerID 1744 has been lost and and cannot be recreated on this 
> DataNode , Result: CONTAINER_MISSING , StorageContainerException Occurred.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  ContainerID 1744 has been lost and and cannot be recreated on this DataNode
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:238)
> at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:166)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:395)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:405)
> at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:749)
> at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine:
>  gid group-1376E41FD581 : ApplyTransaction failed. cmd PutBlock logIndex 
> 40079 msg : ContainerID 1744 has been lost and and cannot be recreated on 
> this DataNode Container Result: CONTAINER_MISSING
>  ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE on pipeline 
> PipelineID=de21dfcf-415c-4901-84ca-1376e41fd581.Reason : Ratis Transaction 
> failure in datanode 33b49c34-caa2-4b4f-894e-dce7db4f97b9 with role FOLLOWER 
> .Triggering pipeline close action
>  {code}






[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Description: 
When creating the 22345th Datanode State Machine Thread, an OOM occurred.
!screenshot-1.png! 
 !screenshot-2.png! 

  was:
When creating the 22345th Datanode State Machine Thread, an OOM occurred.
!screenshot-1.png! 


> Memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 
>  !screenshot-2.png! 






[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Attachment: screenshot-2.png

> Memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 






[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Description: 
When creating the 22345th Datanode State Machine Thread, an OOM occurred.
!screenshot-1.png! 

> Memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When creating the 22345th Datanode State Machine Thread, an OOM occurred.
> !screenshot-1.png! 






[jira] [Updated] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3933:
-
Attachment: screenshot-1.png

> Memory leak because of too many Datanode State Machine Thread
> -
>
> Key: HDDS-3933
> URL: https://issues.apache.org/jira/browse/HDDS-3933
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>







[jira] [Created] (HDDS-3933) Memory leak because of too many Datanode State Machine Thread

2020-07-07 Thread runzhiwang (Jira)
runzhiwang created HDDS-3933:


 Summary: Memory leak because of too many Datanode State Machine 
Thread
 Key: HDDS-3933
 URL: https://issues.apache.org/jira/browse/HDDS-3933
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Assigned] (HDDS-2922) Recommend leader host to Ratis via pipeline creation

2020-07-03 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang reassigned HDDS-2922:


Assignee: runzhiwang  (was: Li Cheng)

> Recommend leader host to Ratis via pipeline creation
> 
>
> Key: HDDS-2922
> URL: https://issues.apache.org/jira/browse/HDDS-2922
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Li Cheng
>Assignee: runzhiwang
>Priority: Major
>
> Ozone should be able to recommend a leader host to Ratis via pipeline creation. 
> The leader host can be recommended based on rack awareness and load balancing.
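A minimal, hypothetical sketch of the load-balancing half of such a policy (rack awareness omitted): among a pipeline's members, suggest as leader the datanode currently leading the fewest pipelines. All names are invented for illustration:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeaderSuggester {
    // How many pipelines each datanode currently leads.
    private final Map<String, Integer> leaderCount = new HashMap<>();

    // Pick the least-loaded member (ties go to the earliest in the list),
    // then account for the new pipeline it will lead.
    public String suggestLeader(List<String> pipelineNodes) {
        String best = Collections.min(pipelineNodes,
            Comparator.comparingInt(n -> leaderCount.getOrDefault(n, 0)));
        leaderCount.merge(best, 1, Integer::sum);
        return best;
    }
}
```

Repeated calls over the same three members rotate the suggestion, which is the load-balancing behavior the description asks for.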






[jira] [Commented] (HDDS-2922) Recommend leader host to Ratis via pipeline creation

2020-07-03 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150886#comment-17150886
 ] 

runzhiwang commented on HDDS-2922:
--

I'm working on it

> Recommend leader host to Ratis via pipeline creation
> 
>
> Key: HDDS-2922
> URL: https://issues.apache.org/jira/browse/HDDS-2922
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Li Cheng
>Assignee: runzhiwang
>Priority: Major
>
> Ozone should be able to recommend a leader host to Ratis via pipeline creation. 
> The leader host can be recommended based on rack awareness and load balancing.






[jira] [Resolved] (HDDS-3899) Avoid change state from closing to exception in LogAppender

2020-06-29 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang resolved HDDS-3899.
--
Resolution: Invalid

> Avoid change state from closing to exception in LogAppender
> ---
>
> Key: HDDS-3899
> URL: https://issues.apache.org/jira/browse/HDDS-3899
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>







[jira] [Created] (HDDS-3899) Avoid change state from closing to exception in LogAppender

2020-06-29 Thread runzhiwang (Jira)
runzhiwang created HDDS-3899:


 Summary: Avoid change state from closing to exception in 
LogAppender
 Key: HDDS-3899
 URL: https://issues.apache.org/jira/browse/HDDS-3899
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: runzhiwang
Assignee: runzhiwang









[jira] [Updated] (HDDS-3861) Fix handlePipelineFailure throw exception if role is follower

2020-06-24 Thread runzhiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

runzhiwang updated HDDS-3861:
-
Description:  !screenshot-1.png! 

> Fix handlePipelineFailure throw exception if role is follower
> -
>
> Key: HDDS-3861
> URL: https://issues.apache.org/jira/browse/HDDS-3861
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 





