[jira] [Updated] (HDDS-887) Add DispatcherContext info to Dispatcher from containerStateMachine

2018-12-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-887:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add DispatcherContext info to Dispatcher from containerStateMachine
> ---
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch, HDDS-887.001.patch, 
> HDDS-887.002.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.
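For illustration only, a minimal sketch of what such a context object and 
dispatcher call might look like; the class shape, field names, and the 
dispatch overload below are assumptions, not the actual patch:
{code:java}
/**
 * Illustrative sketch only: a context object carrying info that the dispatcher
 * needs but the client-visible ContainerCommandRequestProto should not expose.
 */
public final class DispatcherContext {
  public enum WriteChunkStage { WRITE_DATA, COMMIT_DATA, COMBINED }

  private final long term;              // Ratis term of the log entry
  private final long logIndex;          // Ratis log index of the log entry
  private final WriteChunkStage stage;  // which phase of writeChunk this is

  public DispatcherContext(long term, long logIndex, WriteChunkStage stage) {
    this.term = term;
    this.logIndex = logIndex;
    this.stage = stage;
  }

  public long getTerm() { return term; }
  public long getLogIndex() { return logIndex; }
  public WriteChunkStage getStage() { return stage; }
}

// In the state machine, the context would travel alongside the request rather
// than inside it, e.g. (hypothetical overload):
//   ContainerCommandResponseProto response =
//       dispatcher.dispatch(requestProto, dispatcherContext);
{code}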






[jira] [Updated] (HDDS-887) Add DispatcherContext info to Dispatcher from containerStateMachine

2018-12-01 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-887:
-
Summary: Add DispatcherContext info to Dispatcher from 
containerStateMachine  (was: Add StatemachineContext info to Dispatcher from 
containerStateMachine)

> Add DispatcherContext info to Dispatcher from containerStateMachine
> ---
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch, HDDS-887.001.patch, 
> HDDS-887.002.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.






[jira] [Commented] (HDDS-882) Provide a config to optionally turn on/off the sync flag during chunk writes

2018-11-30 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705582#comment-16705582
 ] 

Shashikant Banerjee commented on HDDS-882:
--

Test failures are not related and are tracked by HDDS-885.

> Provide a config to optionally turn on/off the sync flag during chunk writes
> 
>
> Key: HDDS-882
> URL: https://issues.apache.org/jira/browse/HDDS-882
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: HDDS-882.000.patch, HDDS-882.001.patch
>
>
> Currently, chunk writes happen with the sync flag on. We should add a config 
> to enable/disable this for performance benchmarks.
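As a rough sketch of the idea only: the config key name and the wiring below 
are hypothetical (the real patch may plumb the flag through the chunk manager 
differently), but it shows how the sync flag could be made conditional:
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class SyncConfigurableChunkWriter {
  // Hypothetical config key and default; the real names may differ.
  public static final String CHUNK_WRITE_SYNC_KEY =
      "dfs.container.chunk.write.sync";
  public static final boolean CHUNK_WRITE_SYNC_DEFAULT = true;

  private final boolean doSyncOnWrite;

  public SyncConfigurableChunkWriter(boolean doSyncOnWrite) {
    this.doSyncOnWrite = doSyncOnWrite;
  }

  public void writeChunk(Path chunkFile, ByteBuffer data) throws IOException {
    List<StandardOpenOption> opts = new ArrayList<>();
    opts.add(StandardOpenOption.CREATE);
    opts.add(StandardOpenOption.WRITE);
    if (doSyncOnWrite) {
      // Only force content and metadata to disk when the sync flag is enabled.
      opts.add(StandardOpenOption.SYNC);
    }
    try (FileChannel channel =
        FileChannel.open(chunkFile, opts.toArray(new StandardOpenOption[0]))) {
      while (data.hasRemaining()) {
        channel.write(data);
      }
    }
  }
}
{code}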






[jira] [Commented] (HDDS-887) Add StatemachineContext info to Dispatcher from containerStateMachine

2018-11-30 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705581#comment-16705581
 ] 

Shashikant Banerjee commented on HDDS-887:
--

Patch v2 fixes some checkstyle issues and some related test failures. The 
other test failures are unrelated.

> Add StatemachineContext info to Dispatcher from containerStateMachine
> -
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch, HDDS-887.001.patch, 
> HDDS-887.002.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.






[jira] [Updated] (HDDS-887) Add StatemachineContext info to Dispatcher from containerStateMachine

2018-11-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-887:
-
Attachment: HDDS-887.002.patch

> Add StatemachineContext info to Dispatcher from containerStateMachine
> -
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch, HDDS-887.001.patch, 
> HDDS-887.002.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.






[jira] [Commented] (HDDS-887) Add StatemachineContext info to Dispatcher from containerStateMachine

2018-11-30 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705192#comment-16705192
 ] 

Shashikant Banerjee commented on HDDS-887:
--

Thanks [~jnp] for the review. Addressed the review comments as well as 
checkstyle issues in the v2 patch. Test failures are caused by HDDS-284 and 
tracked by HDDS-885.

> Add StatemachineContext info to Dispatcher from containerStateMachine
> -
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch, HDDS-887.001.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.






[jira] [Updated] (HDDS-887) Add StatemachineContext info to Dispatcher from containerStateMachine

2018-11-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-887:
-
Attachment: HDDS-887.001.patch

> Add StatemachineContext info to Dispatcher from containerStateMachine
> -
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch, HDDS-887.001.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.






[jira] [Updated] (HDDS-887) Add StatemachineContext info to Dispatcher from containerStateMachine

2018-11-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-887:
-
Attachment: HDDS-887.000.patch

> Add StatemachineContext info to Dispatcher from containerStateMachine
> -
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.






[jira] [Updated] (HDDS-887) Add StatemachineContext info to Dispatcher from containerStateMachine

2018-11-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-887:
-
Status: Patch Available  (was: Open)

> Add StatemachineContext info to Dispatcher from containerStateMachine
> -
>
> Key: HDDS-887
> URL: https://issues.apache.org/jira/browse/HDDS-887
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-887.000.patch
>
>
> As part of transactions like writeChunk, readChunk, putBlock etc., some 
> protocol-specific info needs to be set in order to execute the transactions 
> on the HddsDispatcher. Right now, all of this protocol-specific info is added 
> as part of the ContainerCommandRequestProto object, which is visible to the 
> client. This Jira aims to put the protocol-specific info in a context object, 
> pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
> it is no longer visible to the client.






[jira] [Created] (HDDS-887) Add StatemachineContext info to Dispatcher from containerStateMachine

2018-11-30 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-887:


 Summary: Add StatemachineContext info to Dispatcher from 
containerStateMachine
 Key: HDDS-887
 URL: https://issues.apache.org/jira/browse/HDDS-887
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


As part of transactions like writeChunk, readChunk, putBlock etc., some 
protocol-specific info needs to be set in order to execute the transactions on 
the HddsDispatcher. Right now, all of this protocol-specific info is added as 
part of the ContainerCommandRequestProto object, which is visible to the 
client. This Jira aims to put the protocol-specific info in a context object, 
pass it to the dispatcher, and remove it from ContainerCommandRequestProto so 
it is no longer visible to the client.






[jira] [Commented] (HDDS-882) Provide a config to optionally turn on/off the sync flag during chunk writes

2018-11-30 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704538#comment-16704538
 ] 

Shashikant Banerjee commented on HDDS-882:
--

Patch v1 fixes the checkstyle issue.

> Provide a config to optionally turn on/off the sync flag during chunk writes
> 
>
> Key: HDDS-882
> URL: https://issues.apache.org/jira/browse/HDDS-882
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: HDDS-882.000.patch, HDDS-882.001.patch
>
>
> Currently, chunk writes happen with the sync flag on. We should add a config 
> to enable/disable this for performance benchmarks.






[jira] [Updated] (HDDS-882) Provide a config to optionally turn on/off the sync flag during chunk writes

2018-11-30 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-882:
-
Attachment: HDDS-882.001.patch

> Provide a config to optionally turn on/off the sync flag during chunk writes
> 
>
> Key: HDDS-882
> URL: https://issues.apache.org/jira/browse/HDDS-882
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: HDDS-882.000.patch, HDDS-882.001.patch
>
>
> Currently, chunk writes happen with the sync flag on. We should add a config 
> to enable/disable this for performance benchmarks.






[jira] [Commented] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-29 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704222#comment-16704222
 ] 

Shashikant Banerjee commented on HDDS-870:
--

Patch v3 fixes some checkstyle issues and some unintended changes introduced 
with patch v2.

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch, 
> HDDS-870.002.patch, HDDS-870.003.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.
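A minimal sketch of the buffering scheme described above, using hypothetical 
class and field names rather than the actual patch:
{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: allocate flush-sized buffers lazily instead of one
// block-sized buffer up front. Names and limits are assumptions only.
class IncrementalBufferList {
  private final int flushSize;     // the configured flush buffer size
  private final long blockSize;    // upper bound on total buffered data
  private final List<ByteBuffer> bufferList = new ArrayList<>();

  IncrementalBufferList(int flushSize, long blockSize) {
    this.flushSize = flushSize;
    this.blockSize = blockSize;
  }

  /** Returns a buffer with free space, allocating a new one only on demand. */
  ByteBuffer currentBuffer() {
    if (bufferList.isEmpty()
        || !bufferList.get(bufferList.size() - 1).hasRemaining()) {
      if ((long) bufferList.size() * flushSize >= blockSize) {
        throw new IllegalStateException("block size exceeded");
      }
      bufferList.add(ByteBuffer.allocate(flushSize));
    }
    return bufferList.get(bufferList.size() - 1);
  }
}
{code}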






[jira] [Updated] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-29 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-870:
-
Attachment: HDDS-870.003.patch

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch, 
> HDDS-870.002.patch, HDDS-870.003.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.






[jira] [Comment Edited] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-29 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704220#comment-16704220
 ] 

Shashikant Banerjee edited comment on HDDS-870 at 11/30/18 3:38 AM:


Thanks [~jnp] for the review. 
{code:java}
It seems error prone to create a bufferList in ChunkGroupOutputStream and share 
it in various ChunkOutputStreams within. The two streams may start working on 
same buffer?
{code}
Once one ChunkOutputStream closes, we start writing to the next 
ChunkOutputStream. There is no possibility of two underlying streams acting on 
the same buffers concurrently.

If an exception is encountered that needs to be handled, the data residing in 
the buffer has to be moved to the next stream in the list, which writes to a 
different block. In such cases, the data has to be shared among the underlying 
streams, so it makes more sense to maintain it in ChunkGroupOutputStream rather 
than in each ChunkOutputStream.

The allocation of buffers has been moved to ChunkOutputStream so that buffers 
get allocated only when a write is requested; for an empty key, no buffers will 
be allocated.


was (Author: shashikant):
Thanks [~jnp] for the review. 
{code:java}
It seems error prone to create a bufferList in ChunkGroupOutputStream and share 
it in various ChunkOutputStreams within. The two streams may start working on 
same buffer?
{code}
Once one ChunkOutputStream closes, we start writing to the next 
ChunkOutputStream. There is no possibility of two underlying streams acting on 
the same buffers concurrently.

If an exception is encountered that needs to be handled, the data residing in 
the buffer has to be moved to the next stream in the list, which writes to a 
different block. In such cases, the data has to be shared among the underlying 
streams, so it makes more sense to maintain it in ChunkGroupOutputStream rather 
than in each ChunkOutputStream.

The allocation of buffers has been moved to ChunkOutputStream so that buffers 
get allocated only when a write is requested; for an empty key, no buffers will 
be allocated.

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch, 
> HDDS-870.002.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.






[jira] [Commented] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-29 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704220#comment-16704220
 ] 

Shashikant Banerjee commented on HDDS-870:
--

Thanks [~jnp] for the review. 
{code:java}
It seems error prone to create a bufferList in ChunkGroupOutputStream and share 
it in various ChunkOutputStreams within. The two streams may start working on 
same buffer?
{code}
Once one ChunkOutputStream closes, we start writing to the next 
ChunkOutputStream. There is no possibility of two underlying streams acting on 
the same buffers concurrently.

If an exception is encountered that needs to be handled, the data residing in 
the buffer has to be moved to the next stream in the list, which writes to a 
different block. In such cases, the data has to be shared among the underlying 
streams, so it makes more sense to maintain it in ChunkGroupOutputStream rather 
than in each ChunkOutputStream.

The allocation of buffers has been moved to ChunkOutputStream so that buffers 
get allocated only when a write is requested; for an empty key, no buffers will 
be allocated.
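Purely as an illustration of the hand-off described above (the interface and 
method names are assumptions, not the actual ChunkGroupOutputStream code):
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

// Sketch: on a failure, the data still sitting in the shared buffer list is
// replayed into the stream for the next block.
class BufferHandoff {
  interface BlockStream {
    void write(ByteBuffer data) throws IOException;
  }

  /** Replays all buffered-but-unacknowledged data into the next block's stream. */
  static void handleException(List<ByteBuffer> sharedBuffers,
                              BlockStream nextBlockStream) throws IOException {
    for (ByteBuffer buffer : sharedBuffers) {
      ByteBuffer readOnly = buffer.asReadOnlyBuffer();
      readOnly.flip();                 // read only what has been written so far
      if (readOnly.hasRemaining()) {
        nextBlockStream.write(readOnly);
      }
    }
  }
}
{code}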

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch, 
> HDDS-870.002.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.






[jira] [Created] (HDDS-883) Chunk writes should validate the checksum before overwriting the existing chunk file

2018-11-29 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-883:


 Summary: Chunk writes should validate the checksum before 
overwriting the existing chunk file
 Key: HDDS-883
 URL: https://issues.apache.org/jira/browse/HDDS-883
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


When a writeChunk request arrives via Ratis, the tmp chunk file may already 
exist because of a Ratis retry triggered by a stateMachine timeout during the 
writeStateMachineData phase. In such cases, rather than blindly overwriting the 
file, we can just validate the length and checksum (if present) and return.
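A rough sketch of the intended check, assuming the expected length and checksum 
come from the incoming chunk info; the helper below is hypothetical and uses 
SHA-256 only as a stand-in for whatever checksum type the request carries:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper: skip the overwrite when the tmp chunk file on disk
// already matches the incoming chunk's length and checksum (Ratis retry case).
final class ChunkOverwriteCheck {
  static boolean matchesExisting(Path tmpChunkFile, long expectedLen,
      byte[] expectedChecksum) throws IOException {
    if (!Files.exists(tmpChunkFile) || Files.size(tmpChunkFile) != expectedLen) {
      return false;
    }
    if (expectedChecksum == null || expectedChecksum.length == 0) {
      // No checksum supplied; a length match is the best we can do.
      return true;
    }
    try {
      // Reads the whole file; fine for a sketch, chunk-sized files only.
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] actual = digest.digest(Files.readAllBytes(tmpChunkFile));
      return Arrays.equals(actual, expectedChecksum);
    } catch (NoSuchAlgorithmException e) {
      throw new IOException(e);
    }
  }
}
{code}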






[jira] [Commented] (HDDS-876) add blockade tests for flaky network

2018-11-29 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703550#comment-16703550
 ] 

Shashikant Banerjee commented on HDDS-876:
--

+1.

> add blockade tests for flaky network
> 
>
> Key: HDDS-876
> URL: https://issues.apache.org/jira/browse/HDDS-876
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-876.001.patch, HDDS-876.002.patch, 
> HDDS-876.003.patch
>
>
> Blockade is a container utility to simulate network and node failures and 
> network partitions: https://blockade.readthedocs.io/en/latest/guide.html.
> This jira proposes to add a simple test that runs freon with a flaky network.






[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-29 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~msingh] and [~jnp] for the review comments. I have committed this 
change to trunk.

I will open a separate Jira for changes in Dispatcher and the KeyValueHandler 
as suggested by [~jnp].

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch, 
> HDDS-850.002.patch, HDDS-850.003.patch, HDDS-850.004.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens on the Ratis leader when it receives a readStateMachineData 
> request for stateMachineData that is not yet in the cached segments in Ratis 
> because writeStateMachineData has not completed. The approach would be to 
> cache the stateMachineData inside ContainerStateMachine rather than inside 
> Ratis.
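A minimal sketch of the caching idea, assuming writeStateMachineData registers 
the chunk data by log index and readStateMachineData consults this map before 
falling back to the dispatcher (names are illustrative, not the actual patch):
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Illustrative cache of in-flight writeChunk data, keyed by Ratis log index.
 * readStateMachineData can be answered from here while the corresponding
 * writeStateMachineData has not completed, avoiding the file-lock clash.
 */
class StateMachineDataCache<T> {
  private final ConcurrentMap<Long, T> cache = new ConcurrentHashMap<>();

  void put(long logIndex, T stateMachineData) {
    cache.put(logIndex, stateMachineData);
  }

  /** Returns the cached data for the log entry, or null if already evicted. */
  T get(long logIndex) {
    return cache.get(logIndex);
  }

  /** Evict once the entry has been applied and the data is safely on disk. */
  void remove(long logIndex) {
    cache.remove(logIndex);
  }
}
{code}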






[jira] [Updated] (HDDS-882) Provide a config to optionally turn on/off the sync flag during chunk writes

2018-11-29 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-882:
-
Status: Patch Available  (was: Open)

> Provide a config to optionally turn on/off the sync flag during chunk writes
> 
>
> Key: HDDS-882
> URL: https://issues.apache.org/jira/browse/HDDS-882
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: HDDS-882.000.patch
>
>
> Currently, chunk writes happen with the sync flag on. We should add a config 
> to enable/disable this for performance benchmarks.






[jira] [Updated] (HDDS-882) Provide a config to optionally turn on/off the sync flag during chunk writes

2018-11-29 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-882:
-
Attachment: HDDS-882.000.patch

> Provide a config to optionally turn on/off the sync flag during chunk writes
> 
>
> Key: HDDS-882
> URL: https://issues.apache.org/jira/browse/HDDS-882
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: HDDS-882.000.patch
>
>
> Currently, chunk writes happen with the sync flag on. We should add a config 
> to enable/disable this for performance benchmarks.






[jira] [Created] (HDDS-882) Provide a config to optionally turn on/off the sync flag during chunk writes

2018-11-29 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-882:


 Summary: Provide a config to optionally turn on/off the sync flag 
during chunk writes
 Key: HDDS-882
 URL: https://issues.apache.org/jira/browse/HDDS-882
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


Currently, chunk writes happen with the sync flag on. We should add a config to 
enable/disable this for performance benchmarks.






[jira] [Commented] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-28 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702790#comment-16702790
 ] 

Shashikant Banerjee commented on HDDS-870:
--

Patch v2: rebased to latest trunk.

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch, 
> HDDS-870.002.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.






[jira] [Updated] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-28 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-870:
-
Attachment: HDDS-870.002.patch

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch, 
> HDDS-870.002.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.






[jira] [Commented] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-28 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702782#comment-16702782
 ] 

Shashikant Banerjee commented on HDDS-850:
--

Patch v4 rebased to latest trunk.

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch, 
> HDDS-850.002.patch, HDDS-850.003.patch, HDDS-850.004.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens on the Ratis leader when it receives a readStateMachineData 
> request for stateMachineData that is not yet in the cached segments in Ratis 
> because writeStateMachineData has not completed. The approach would be to 
> cache the stateMachineData inside ContainerStateMachine rather than inside 
> Ratis.






[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-28 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Attachment: HDDS-850.004.patch

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch, 
> HDDS-850.002.patch, HDDS-850.003.patch, HDDS-850.004.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens on the Ratis leader when it receives a readStateMachineData 
> request for stateMachineData that is not yet in the cached segments in Ratis 
> because writeStateMachineData has not completed. The approach would be to 
> cache the stateMachineData inside ContainerStateMachine rather than inside 
> Ratis.






[jira] [Commented] (HDDS-876) add blockade tests for flaky network

2018-11-28 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702700#comment-16702700
 ] 

Shashikant Banerjee commented on HDDS-876:
--

Thanks [~msingh] for the patch. The blockade.py file is missing the ASF license 
header.

Otherwise, the patch looks good to me, +1.

> add blockade tests for flaky network
> 
>
> Key: HDDS-876
> URL: https://issues.apache.org/jira/browse/HDDS-876
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-876.001.patch
>
>
> Blockade is a container utility to simulate network and node failures and 
> network partitions: https://blockade.readthedocs.io/en/latest/guide.html.
> This jira proposes to add a simple test that runs freon with a flaky network.






[jira] [Commented] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-28 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701966#comment-16701966
 ] 

Shashikant Banerjee commented on HDDS-870:
--

Thanks [~jnp] for the review. Patch v1 fixes some bugs, adds more tests, and 
addresses your review comments by removing the atomic reference on 
lastFlushIndex itself.

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.






[jira] [Updated] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-28 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-870:
-
Attachment: HDDS-870.001.patch

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data. 
> This can be replaced with an array of buffers, each of the configured flush 
> buffer size, which also covers handling 2 node failures.






[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-23 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Attachment: HDDS-850.003.patch

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch, 
> HDDS-850.002.patch, HDDS-850.003.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens on the Ratis leader when it receives a readStateMachineData 
> request for stateMachineData that is not yet in the cached segments in Ratis 
> because writeStateMachineData has not completed. The approach would be to 
> cache the stateMachineData inside ContainerStateMachine rather than inside 
> Ratis.






[jira] [Commented] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-23 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697459#comment-16697459
 ] 

Shashikant Banerjee commented on HDDS-850:
--

Thanks [~msingh] for the review comments. Patch v3 addresses your review 
comments.

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch, 
> HDDS-850.002.patch, HDDS-850.003.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.
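
Purely as an illustration of the caching idea described above, here is a minimal sketch; the class, field and method names are made up, and the real change wires this into ContainerStateMachine's writeStateMachineData/readStateMachineData/applyTransaction paths:

{code:java}
import java.util.concurrent.ConcurrentHashMap;

import com.google.protobuf.ByteString;

/**
 * Illustrative sketch only: remember write-chunk data by log index inside the
 * state machine, so readStateMachineData can be answered from memory instead
 * of re-reading (and re-locking) the chunk file while the write is in flight.
 */
class StateMachineDataCache {
  private final ConcurrentHashMap<Long, ByteString> cache =
      new ConcurrentHashMap<>();

  // Called from writeStateMachineData before the chunk write is dispatched.
  void put(long logIndex, ByteString data) {
    cache.put(logIndex, data);
  }

  // Called from readStateMachineData; a null result means the caller has to
  // fall back to dispatching a ReadChunk request.
  ByteString get(long logIndex) {
    return cache.get(logIndex);
  }

  // Called once the transaction is applied, to bound the memory usage.
  void remove(long logIndex) {
    cache.remove(logIndex);
  }
}
{code}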



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-23 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-870:
-
Status: Patch Available  (was: Open)

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data.
> This can be replaced with an array of buffers, each of the configured flush
> buffer size, which also helps in handling 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-23 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-870:
-
Attachment: HDDS-870.000.patch

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer for caching data.
> This can be replaced with an array of buffers, each of the configured flush
> buffer size, which also helps in handling 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-23 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696570#comment-16696570
 ] 

Shashikant Banerjee commented on HDDS-850:
--

Thanks [~msingh] for the review. Patch v2 addresses your review comments as 
well as checkstyle issues.

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch, 
> HDDS-850.002.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-23 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Attachment: HDDS-850.002.patch

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch, 
> HDDS-850.002.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-854) TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky

2018-11-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-854.
--
   Resolution: Fixed
Fix Version/s: 0.4.0

Fixed along with HDDS-866.

> TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky
> ---
>
> Key: HDDS-854
> URL: https://issues.apache.org/jira/browse/HDDS-854
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky. It 
> times out while waiting for the mini cluster datanode to restart
> {code}
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:122)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:276)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:283)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures(TestFailureHandlingByClient.java:200)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-11-22 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-870:


 Summary: Avoid creating block sized buffer in 
ChunkGroupOutputStream
 Key: HDDS-870
 URL: https://issues.apache.org/jira/browse/HDDS-870
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


Currently, for a key, we create a block-sized byteBuffer for caching data. This 
can be replaced with an array of buffers, each of the configured flush buffer 
size, which also helps in handling 2 node failures.
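
As a rough illustration of the proposed buffer layout (the sizes and the class name below are examples, not Ozone defaults or the actual patch):

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: instead of one block-sized buffer per key, keep a
 * list of flush-sized buffers, so only data that has not yet been flushed and
 * acknowledged needs to stay in memory for replay on a datanode failure.
 */
class BufferPoolSketch {
  public static void main(String[] args) {
    long blockSize = 256L * 1024 * 1024;  // example block size: 256 MB
    int flushSize = 16 * 1024 * 1024;     // example flush buffer size: 16 MB

    // Old approach: ByteBuffer.allocate((int) blockSize) up front for the key.

    // Proposed approach: allocate flush-sized buffers lazily as data arrives.
    List<ByteBuffer> bufferList = new ArrayList<>();
    bufferList.add(ByteBuffer.allocate(flushSize));

    long maxBuffers = (blockSize + flushSize - 1) / flushSize;
    System.out.println("At most " + maxBuffers + " buffers of " + flushSize
        + " bytes, and flushed buffers can be released or reused.");
  }
}
{code}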



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-869) Fix log message in XceiverClientRatis#sendCommandAsync

2018-11-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-869:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~ljain] for the contribution. I have fixed the checkstyle issue and 
committed this change to trunk.

> Fix log message in XceiverClientRatis#sendCommandAsync
> --
>
> Key: HDDS-869
> URL: https://issues.apache.org/jira/browse/HDDS-869
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-869.001.patch
>
>
> The log message in XceiverClientRatis#sendCommandAsync is wrong. We should
> not print the data in the case of a write chunk request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-869) Fix log message in XceiverClientRatis#sendCommandAsync

2018-11-22 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695876#comment-16695876
 ] 

Shashikant Banerjee commented on HDDS-869:
--

Thanks [~ljain] for working on this. +1

> Fix log message in XceiverClientRatis#sendCommandAsync
> --
>
> Key: HDDS-869
> URL: https://issues.apache.org/jira/browse/HDDS-869
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-869.001.patch
>
>
> The log message in XceiverClientRatis#sendCommandAsync is wrong. We should
> not print the data in the case of a write chunk request.
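
Purely for illustration, the intent can be sketched as logging only the command type and target instead of the whole request proto (the names below are made up, this is not the actual patch):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Illustrative sketch only: log the command type rather than the request
 * itself, so a WriteChunk payload never ends up in the log output.
 */
final class CommandLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(CommandLogging.class);

  static void logSend(String cmdType, String pipelineId) {
    // Before: something like LOG.debug("sendCommandAsync {}", request), which
    // stringifies the proto including chunk data for write chunk requests.
    LOG.debug("sendCommandAsync {} to pipeline {}", cmdType, pipelineId);
  }
}
{code}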



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-22 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695780#comment-16695780
 ] 

Shashikant Banerjee commented on HDDS-850:
--

Patch v1 rebased to latest.

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Attachment: HDDS-850.001.patch

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Status: Patch Available  (was: Open)

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch, HDDS-850.001.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695611#comment-16695611
 ] 

Shashikant Banerjee commented on HDDS-866:
--

Thanks [~jnp] for the review.
{code:java}
XceiverServerRatis#isExist has unhandled checked exception. It might not compile
{code}
With the latest Ratis version, server.getGroupIds() does not throw any
exception, so we need to remove that handling. I tried the build locally and it
works for me.

The other comment is addressed in the v1 patch.
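
For what it is worth, a minimal sketch of the resulting check, assuming getGroupIds() returns an Iterable of RaftGroupId (the class name and body below are illustrative, not the committed code):

{code:java}
import org.apache.ratis.protocol.RaftGroupId;
import org.apache.ratis.server.RaftServer;

/**
 * Illustrative sketch only: with the newer Ratis snapshot, getGroupIds() does
 * not declare a checked exception, so no try/catch is needed around it.
 */
class GroupExistenceCheck {
  private final RaftServer server;

  GroupExistenceCheck(RaftServer server) {
    this.server = server;
  }

  boolean isExist(RaftGroupId groupId) {
    for (RaftGroupId id : server.getGroupIds()) {
      if (id.equals(groupId)) {
        return true;
      }
    }
    return false;
  }
}
{code}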

 

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-866.000.patch, HDDS-866.001.patch
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-866:
-
Attachment: HDDS-866.001.patch

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-866.000.patch, HDDS-866.001.patch
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-866:
-
Attachment: HDDS-866.000.patch

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-866.000.patch
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-866:
-
Status: Patch Available  (was: Open)

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-866.000.patch
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-866:
-
Status: Open  (was: Patch Available)

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-866:
-
Attachment: (was: HDDS-866.000.patch)

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695579#comment-16695579
 ] 

Shashikant Banerjee commented on HDDS-866:
--

Patch v0 updates Ozone to the latest Ratis snapshot as well.

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-866.000.patch
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-866:
-
Status: Patch Available  (was: Open)

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-866.000.patch
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-866:
-
Attachment: HDDS-866.000.patch

> Handle RaftRetryFailureException in OzoneClient
> ---
>
> Key: HDDS-866
> URL: https://issues.apache.org/jira/browse/HDDS-866
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-866.000.patch
>
>
> With 2 node failures or a network partition among multiple servers in Ratis,
> RaftClient retries the request and eventually fails with
> RaftRetryFailureException. This exception needs to be handled in OzoneClient
> in order to handle 2 node failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-865) GrpcXceiverService is added twice to GRPC netty server

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-865:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   Status: Resolved  (was: Patch Available)

Thanks [~xyao] for the contribution. I have committed this change to trunk.

> GrpcXceiverService is added twice to GRPC netty server
> --
>
> Key: HDDS-865
> URL: https://issues.apache.org/jira/browse/HDDS-865
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: HDDS-865.001.patch
>
>
> HDDS-835 added GrpcXceiverService twice. This was found when merging this
> change with the HDDS-4 branch.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-865) GrpcXceiverService is added twice to GRPC netty server

2018-11-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695042#comment-16695042
 ] 

Shashikant Banerjee commented on HDDS-865:
--

Thanks [~xyao] for the patch. +1

> GrpcXceiverService is added twice to GRPC netty server
> --
>
> Key: HDDS-865
> URL: https://issues.apache.org/jira/browse/HDDS-865
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Attachments: HDDS-865.001.patch
>
>
> HDDS-835 added GrpcXceiverService twice. This was found when merging this
> change with the HDDS-4 branch.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-866) Handle RaftRetryFailureException in OzoneClient

2018-11-21 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-866:


 Summary: Handle RaftRetryFailureException in OzoneClient
 Key: HDDS-866
 URL: https://issues.apache.org/jira/browse/HDDS-866
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


With 2 node failures or a network partition among multiple servers in Ratis, 
RaftClient retries the request and eventually fails with 
RaftRetryFailureException. This exception needs to be handled in OzoneClient in 
order to handle 2 node failures.
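
As a hedged sketch of how a client might react to this condition (the helper below is made up and deliberately avoids version-specific Ratis imports; the real change lives in the Ozone client output stream path):

{code:java}
import java.io.IOException;
import java.util.concurrent.CompletionException;

/**
 * Illustrative sketch only: translate a retry failure surfacing from the async
 * Ratis pipeline into an IOException, so the caller can exclude the failed
 * pipeline and write the remaining data to a newly allocated block.
 */
final class RetryFailureHandling {
  private RetryFailureHandling() { }

  static <T> T checkReply(T reply, Throwable error) throws IOException {
    if (error == null) {
      return reply;
    }
    // CompletableFuture wraps the real cause in a CompletionException.
    Throwable cause =
        error instanceof CompletionException ? error.getCause() : error;
    // The class-name check stands in for `instanceof RaftRetryFailureException`
    // to keep this sketch free of a specific Ratis version dependency.
    if (cause != null
        && cause.getClass().getSimpleName().equals("RaftRetryFailureException")) {
      throw new IOException("Ratis request failed after exhausting retries", cause);
    }
    throw new IOException("Unexpected failure in the Ratis pipeline", error);
  }
}
{code}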



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-284) CRC for ChunksData

2018-11-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695031#comment-16695031
 ] 

Shashikant Banerjee commented on HDDS-284:
--

Thanks [~hanishakoneru] for updating the patch. I am +1 on the patch. Let's 
keep it open for some time in case someone else wants to have a look at it.

> CRC for ChunksData
> --
>
> Key: HDDS-284
> URL: https://issues.apache.org/jira/browse/HDDS-284
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: CRC and Error Detection for Containers.pdf, 
> HDDS-284.00.patch, HDDS-284.005.patch, HDDS-284.006.patch, HDDS-284.01.patch, 
> HDDS-284.02.patch, HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and 
> Error Detection for Containers.pdf
>
>
> This Jira is to add CRC for chunks data.
>  Right now a Chunk Info structure looks like this:
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional string checksum = 4;
>   repeated KeyValue metadata = 5;
> }
>  
> Proposal is to change ChunkInfo structure as below: 
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   repeated KeyValue metadata = 4;
>   required ChecksumData checksumData = 5;
> }
>  
> The ChecksumData structure would be as follows: 
> message ChecksumData {
>   required ChecksumType type = 1;
>   required uint32 bytesPerChecksum = 2;
>   repeated bytes checksums = 3;
> }
>  
> Instead of changing disk format, we put the checksum into chunkInfo.
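
To make the bytesPerChecksum idea concrete, here is a small, self-contained illustration; CRC32 is used only as a stand-in for the configurable ChecksumType, and the class name is made up:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/**
 * Illustrative sketch only: split the chunk data into bytesPerChecksum-sized
 * slices and keep one checksum per slice, mirroring the repeated `checksums`
 * field in the proposed ChecksumData message.
 */
final class ChecksumSketch {
  static List<Long> computeChecksums(byte[] chunkData, int bytesPerChecksum) {
    List<Long> checksums = new ArrayList<>();
    for (int off = 0; off < chunkData.length; off += bytesPerChecksum) {
      int len = Math.min(bytesPerChecksum, chunkData.length - off);
      CRC32 crc = new CRC32();
      crc.update(chunkData, off, len);
      checksums.add(crc.getValue());
    }
    return checksums;
  }

  public static void main(String[] args) {
    byte[] data = new byte[10 * 1024];                  // pretend chunk payload
    System.out.println(computeChecksums(data, 4096));   // 3 slices for 10 KB
  }
}
{code}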



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-21 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695009#comment-16695009
 ] 

Shashikant Banerjee commented on HDDS-850:
--

This is dependent on RATIS-410.

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Attachment: HDDS-850.000.patch

> ReadStateMachineData hits OverlappingFileLockException in 
> ContainerStateMachine
> ---
>
> Key: HDDS-850
> URL: https://issues.apache.org/jira/browse/HDDS-850
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-850.000.patch
>
>
> {code:java}
> 2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
> GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
> c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log
> org.apache.ratis.server.storage.RaftLogIOException: 
> 0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
> i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0
>         at 
> org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)
>         at 
> org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)
>         at 
> org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)
>         at 
> org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)
>         at 
> org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.OverlappingFileLockException
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
>         at 
> sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
>         at 
> sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)
>         at 
> org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)
>         at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)
>         at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         ... 1 more
> {code}
> This happens in the Ratis leader when the stateMachineData is not in the
> cached segments in Ratis and it gets a readStateMachineData request while
> writeStateMachineData has not yet completed. The approach would be to cache
> the stateMachineData inside ContainerStateMachine and not cache it inside
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-861) SCMNodeManager unit tests are broken

2018-11-21 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-861:
-
Summary: SCMNodeManager unit tests are broken  (was: TestNodeManager unit 
tests are broken)

> SCMNodeManager unit tests are broken
> 
>
> Key: HDDS-861
> URL: https://issues.apache.org/jira/browse/HDDS-861
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
>
> Many of the tests are failing with NullPointerException
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.updateNodeStat(SCMNodeManager.java:195)
> at 
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.register(SCMNodeManager.java:276)
> at 
> org.apache.hadoop.hdds.scm.TestUtils.createRandomDatanodeAndRegister(TestUtils.java:147)
> at 
> org.apache.hadoop.hdds.scm.node.TestNodeManager.testScmHeartbeat(TestNodeManager.java:152)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-860) Fix TestDataValidate unit tests

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-860:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~nandakumar131], for the review. I have committed this change to trunk.

> Fix TestDataValidate unit tests
> ---
>
> Key: HDDS-860
> URL: https://issues.apache.org/jira/browse/HDDS-860
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Tools
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-860.000.patch
>
>
> The RandomKeyGenerator code checks the completed flag in order to terminate 
> the dataValidation thread. The flag is not set even after key processing 
> completes, causing the dataValidation thread to run indefinitely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-861) TestNodeManager unit tests are broken

2018-11-20 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-861:


 Summary: TestNodeManager unit tests are broken
 Key: HDDS-861
 URL: https://issues.apache.org/jira/browse/HDDS-861
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
 Fix For: 0.4.0


Many of the tests are failing with NullPointerException
{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.hdds.scm.node.SCMNodeManager.updateNodeStat(SCMNodeManager.java:195)
at 
org.apache.hadoop.hdds.scm.node.SCMNodeManager.register(SCMNodeManager.java:276)
at 
org.apache.hadoop.hdds.scm.TestUtils.createRandomDatanodeAndRegister(TestUtils.java:147)
at 
org.apache.hadoop.hdds.scm.node.TestNodeManager.testScmHeartbeat(TestNodeManager.java:152)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-860) Fix TestDataValidate unit tests

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-860:
-
Status: Patch Available  (was: Open)

> Fix TestDataValidate unit tests
> ---
>
> Key: HDDS-860
> URL: https://issues.apache.org/jira/browse/HDDS-860
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Tools
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-860.000.patch
>
>
> The RandomKeyGenerator code checks the completed flag in order to terminate 
> the dataValidation thread. The flag is not set even after key processing 
> completes, causing the dataValidation thread to run indefinitely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-860) Fix TestDataValidate unit tests

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-860:
-
Attachment: HDDS-860.000.patch

> Fix TestDataValidate unit tests
> ---
>
> Key: HDDS-860
> URL: https://issues.apache.org/jira/browse/HDDS-860
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Tools
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-860.000.patch
>
>
> The RandomKeyGenerator code checks the completed flag in order to terminate 
> the dataValidation thread. The flag is not set even after key processing 
> completes, causing the dataValidation thread to run indefinitely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-860) Fix TestDataValidate unit tests

2018-11-20 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-860:


 Summary: Fix TestDataValidate unit tests
 Key: HDDS-860
 URL: https://issues.apache.org/jira/browse/HDDS-860
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Tools
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


The RandomKeyGenerator code checks the completed flag in order to terminate the 
dataValidation thread. The flag is not set even after key processing completes, 
causing the dataValidation thread to run indefinitely.
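
To make the failure mode concrete, here is a minimal, hypothetical sketch of the pattern involved (names are illustrative, not the actual RandomKeyGenerator code): unless the producer side flips the completed flag when key processing finishes, the loop below never terminates:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class DataValidatorSketch implements Runnable {
  private final AtomicBoolean completed = new AtomicBoolean(false);
  private final BlockingQueue<String> keysToValidate = new LinkedBlockingQueue<>();

  // The key-generation side must call this once all keys are processed;
  // forgetting to do so is exactly the bug described above.
  void markCompleted() {
    completed.set(true);
  }

  void submit(String key) {
    keysToValidate.add(key);
  }

  @Override
  public void run() {
    // Drain until the producer is done AND the queue is empty.
    while (!completed.get() || !keysToValidate.isEmpty()) {
      try {
        String key = keysToValidate.poll(1, TimeUnit.SECONDS);
        if (key != null) {
          validate(key);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  private void validate(String key) {
    // placeholder for the read-back-and-compare logic
  }
}
{code}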



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-20 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-835:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~msingh] for the review. I have committed this change to trunk.

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch, HDDS-835.001.patch
>
>
> As per [~msingh]'s review comments in HDDS-675, for the streamBufferFlushSize, 
> streamBufferMaxSize, and blockSize configs we should use getStorageSize instead 
> of a long value. This Jira aims to address this.
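
For illustration, a hedged sketch of the change in config access (the key name and default value below are placeholders, not the exact Ozone constants):
{code:java}
// Sketch only -- illustrates Configuration#getStorageSize vs a raw long config.
import org.apache.hadoop.conf.StorageUnit;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class BufferSizeConfigSketch {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();

    // Before: only a plain byte count can be configured.
    long flushSizeOld =
        conf.getLong("ozone.client.stream.buffer.flush.size", 64L * 1024 * 1024);

    // After: human-readable sizes such as "64MB" are accepted and converted.
    long flushSizeNew = (long) conf.getStorageSize(
        "ozone.client.stream.buffer.flush.size", "64MB", StorageUnit.BYTES);

    System.out.println(flushSizeOld + " bytes vs " + flushSizeNew + " bytes");
  }
}
{code}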



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-19 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692675#comment-16692675
 ] 

Shashikant Banerjee commented on HDDS-835:
--

Thanks [~msingh], for the review.

 
{code:java}
ScmConfigKeys:140, lets change OZONE_SCM_CHUNK_MAX_SIZE to 32MB as well{code}
Since OZONE_SCM_CHUNK_MAX_SIZE is a constant, moved it to OzoneConsts.java.
{code:java}
TestFailureHandlingByClient:91, the SCM_BLOCK size needs to be set here
{code}
BlockSize is already set to the required value when the miniOzoneCluster 
instance is created. No need to set it here.

Rest of the review comments are addressed.

 

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch, HDDS-835.001.patch
>
>
> As per [~msingh]'s review comments in HDDS-675, for the streamBufferFlushSize, 
> streamBufferMaxSize, and blockSize configs we should use getStorageSize instead 
> of a long value. This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-835:
-
Attachment: HDDS-835.001.patch

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch, HDDS-835.001.patch
>
>
> As per [~msingh]'s review comments in HDDS-675, for the streamBufferFlushSize, 
> streamBufferMaxSize, and blockSize configs we should use getStorageSize instead 
> of a long value. This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-19 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692675#comment-16692675
 ] 

Shashikant Banerjee edited comment on HDDS-835 at 11/20/18 5:02 AM:


Thanks [~msingh], for the review.
{code:java}
ScmConfigKeys:140, lets change OZONE_SCM_CHUNK_MAX_SIZE to 32MB as well{code}
Since OZONE_SCM_CHUNK_MAX_SIZE is a constant, moved it to OzoneConsts.java.
{code:java}
TestFailureHandlingByClient:91, the SCM_BLOCK size needs to be set here
{code}
BlockSize is already set to the required value when the miniOzoneCluster 
instance is created. No need to set it here.

Rest of the review comments are addressed.

 


was (Author: shashikant):
Thanks [~msingh], for the review.

 
{code:java}
ScmConfigKeys:140, lets change OZONE_SCM_CHUNK_MAX_SIZE to 32MB as well{code}
Since OZONE_SCM_CHUNK_MAX_SIZE is a constant, moved it to OzoneConsts.java.
{code:java}
TestFailureHandlingByClient:91, the SCM_BLOCK size needs to be set here
{code}
BlockSize is already set to the required value when the miniOzoneCluster 
instance is created. No need to set it here.

Rest of the review comments are addressed.

 

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch, HDDS-835.001.patch
>
>
> As per [~msingh]'s review comments in HDDS-675, for the streamBufferFlushSize, 
> streamBufferMaxSize, and blockSize configs we should use getStorageSize instead 
> of a long value. This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-854) TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky

2018-11-19 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692016#comment-16692016
 ] 

Shashikant Banerjee commented on HDDS-854:
--

[~nandakumar131], I will take care of this.

> TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky
> ---
>
> Key: HDDS-854
> URL: https://issues.apache.org/jira/browse/HDDS-854
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
>
> TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky. It 
> times out while waiting for the mini cluster datanode to restart
> {code}
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:122)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:276)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:283)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures(TestFailureHandlingByClient.java:200)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-854) TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky

2018-11-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-854:


Assignee: Shashikant Banerjee  (was: Nanda kumar)

> TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky
> ---
>
> Key: HDDS-854
> URL: https://issues.apache.org/jira/browse/HDDS-854
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Assignee: Shashikant Banerjee
>Priority: Major
>
> TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky. It 
> times out while waiting for the mini cluster datanode to restart
> {code}
>   at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:122)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:276)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:283)
>   at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures(TestFailureHandlingByClient.java:200)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-284) CRC for ChunksData

2018-11-19 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691536#comment-16691536
 ] 

Shashikant Banerjee commented on HDDS-284:
--

Thanks [~hanishakoneru] for updating the patch. The patch looks good to me 
overall. Some minor comments:
 # Checksum#longToBytes can be replaced with Longs.toByteArray() from the 
com.google.common.primitives.Longs package (a small sketch follows this list).
 # With the patch it always seems to be computing the checksum in 
writeChunkToContainerCall. With HTTP headers, if the checksum is already 
available in a REST call, we might not need to recompute it. Are we going to 
address such cases later?
 # ChunkManagerImpl#writeChunk -> while handling overwrites of a chunk file we 
can just verify the checksum if it is already present and return accordingly 
without actually doing I/O (addressed as a TODO in the code). We can also add 
the checksum verification here, though these can be addressed in a separate 
patch as well.
 # ChunkInputStream.java, L213-215: why is this change specifically required? 
Is it just to make the added tests work?
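
A small, self-contained illustration of point 1 (the hand-rolled helper below is only a stand-in for Checksum#longToBytes, whose exact body is not shown here):
{code:java}
import com.google.common.primitives.Longs;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class LongToBytesExample {
  // stand-in for the hand-rolled Checksum#longToBytes helper
  static byte[] longToBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  public static void main(String[] args) {
    long crc = 0xCAFEBABEL;
    // Guava produces the same 8-byte big-endian encoding, so the helper can go.
    System.out.println(Arrays.equals(longToBytes(crc), Longs.toByteArray(crc)));
  }
}
{code}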

> CRC for ChunksData
> --
>
> Key: HDDS-284
> URL: https://issues.apache.org/jira/browse/HDDS-284
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: CRC and Error Detection for Containers.pdf, 
> HDDS-284.00.patch, HDDS-284.005.patch, HDDS-284.01.patch, HDDS-284.02.patch, 
> HDDS-284.03.patch, HDDS-284.04.patch, Interleaving CRC and Error Detection 
> for Containers.pdf
>
>
> This Jira is to add CRC for chunks data.
>  
>  
> Right now a Chunk Info structure looks like this:
>  
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional string checksum = 4;
>   repeated KeyValue metadata = 5;
> }
>  
> Proposal is to change the ChunkInfo structure as below:
>  
> message ChunkInfo {
>   required string chunkName = 1;
>   required uint64 offset = 2;
>   required uint64 len = 3;
>   optional bytes checksum = 4;
>   optional CRCType checksumType = 5;
>   optional string legacyMetadata = 6;
>   optional string legacyData = 7;
>   repeated KeyValue metadata = 8;
> }
>  
> Instead of changing the disk format, we put the checksum, checksumType and 
> legacy data fields into ChunkInfo.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Description: 
{code:java}
2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 
org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)

        at 
org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)

        at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)

        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        ... 1 more

{code}
This happens in the Ratis leader when the stateMachineData is not present in the 
cached segments in Ratis and it gets a ReadStateMachineData request before 
writeStateMachineData has completed. The approach would be to cache the 
stateMachineData inside ContainerStateMachine rather than caching it inside Ratis.

  was:
{code:java}
2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 

[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Description: 
{code:java}
2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 
org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)

        at 
org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)

        at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)

        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        ... 1 more

{code}
This happens in the Ratis leader when the stateMachineData is not present in the 
cached segments in Ratis and it gets a ReadStateMachineData request before 
writeStateMachineData has completed. The approach would be to cache the 
stateMachineData inside ContainerStateMachine rather than caching it inside 
Ratis.

  was:
{code:java}
2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 

[jira] [Updated] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-850:
-
Description: 
{code:java}
2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 
org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)

        at 
org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)

        at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)

        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        ... 1 more

{code}
This happens in the Ratis leader when the stateMachineData is not present in the 
cached segments in Ratis and it gets a ReadStateMachineData request while 
writeStateMachineData has not completed yet. The approach would be to cache the 
stateMachineData inside ContainerStateMachine rather than caching it inside Ratis.

  was:
{code:java}
2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 

[jira] [Created] (HDDS-850) ReadStateMachineData hits OverlappingFileLockException in ContainerStateMachine

2018-11-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-850:


 Summary: ReadStateMachineData hits OverlappingFileLockException in 
ContainerStateMachine
 Key: HDDS-850
 URL: https://issues.apache.org/jira/browse/HDDS-850
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


{code:java}
2018-11-16 09:54:41,599 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
c6ad906f-7e71-4bac-bde3-d22bc1aa8c7d) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:1), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=0

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 
org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:178)

        at 
org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:197)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:542)

        at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:174)

        at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:290)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.readStateMachineData(ContainerStateMachine.java:404)

        at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$readStateMachineData$6(ContainerStateMachine.java:462)

        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)

        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        ... 1 more

2018-11-16 09:54:41,597 ERROR org.apache.ratis.server.impl.LogAppender: 
GrpcLogAppender(0813f1a9-61be-4cab-aa05-d5640f4a8341 -> 
e3e9a703-55bb-482b-a0a1-ce8000474ac2) hit IOException while loading raft log

org.apache.ratis.server.storage.RaftLogIOException: 
0813f1a9-61be-4cab-aa05-d5640f4a8341: Failed readStateMachineData for (t:2, 
i:2), STATEMACHINELOGENTRY, client-7D19FB803B1E, cid=2

        at 
org.apache.ratis.server.storage.RaftLog$EntryWithData.getEntry(RaftLog.java:370)

        at 
org.apache.ratis.server.impl.LogAppender$LogEntryBuffer.getAppendRequest(LogAppender.java:167)

        at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:216)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:152)

        at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:96)

        at 
org.apache.ratis.server.impl.LogAppender.runAppender(LogAppender.java:100)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.nio.channels.OverlappingFileLockException

        at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)

        at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)

        at 
sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)

        at 
sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)

        at 

[jira] [Updated] (HDDS-845) Create a new raftClient instance for every watch request for Ratis

2018-11-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-845:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   Status: Resolved  (was: Patch Available)

Thanks [~msingh], [~jnp] for the review. I have committed this change to trunk.

> Create a new raftClient instance for every watch request for Ratis
> --
>
> Key: HDDS-845
> URL: https://issues.apache.org/jira/browse/HDDS-845
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-845.000.patch, HDDS-845.001.patch
>
>
> Currently, watch requests go through the sliding window in Ratis and hence 
> block, and get blocked by, other requests submitted before them. These are 
> read-only requests and do not necessarily need to go through the sliding 
> window. Until this gets addressed in Ratis, it is better and more efficient to 
> create a new RaftClient instance for watch requests in XceiverClientRatis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-845) Create a new raftClient instance for every watch request for Ratis

2018-11-16 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-845:
-
Attachment: HDDS-845.001.patch

> Create a new raftClient instance for every watch request for Ratis
> --
>
> Key: HDDS-845
> URL: https://issues.apache.org/jira/browse/HDDS-845
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-845.000.patch, HDDS-845.001.patch
>
>
> Currently, watch requests go through the sliding window in Ratis and hence 
> block, and get blocked by, other requests submitted before them. These are 
> read-only requests and do not necessarily need to go through the sliding 
> window. Until this gets addressed in Ratis, it is better and more efficient to 
> create a new RaftClient instance for watch requests in XceiverClientRatis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-845) Create a new raftClient instance for every watch request for Ratis

2018-11-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689907#comment-16689907
 ] 

Shashikant Banerjee commented on HDDS-845:
--

Thanks [~jnp] and [~msingh] for the review. Patch v1 addresses your review 
comments.

> Create a new raftClient instance for every watch request for Ratis
> --
>
> Key: HDDS-845
> URL: https://issues.apache.org/jira/browse/HDDS-845
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-845.000.patch, HDDS-845.001.patch
>
>
> Currently, watch requests go through the sliding window in Ratis and hence 
> block, and get blocked by, other requests submitted before them. These are 
> read-only requests and do not necessarily need to go through the sliding 
> window. Until this gets addressed in Ratis, it is better and more efficient to 
> create a new RaftClient instance for watch requests in XceiverClientRatis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-16 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689394#comment-16689394
 ] 

Shashikant Banerjee commented on HDDS-801:
--

Thanks [~nandakumar131] for updating the patch. Patch v4 looks good to me. +1

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch, HDDS-801.003.patch, HDDS-801.004.patch
>
>
> When datanode received CloseContainerCommand and the replication type is not 
> RATIS, we should QUASI close the container. After quasi-closing the container 
> an ICR has to be sent to SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-845) Create a new raftClient instance for every watch request for Ratis

2018-11-15 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-845:
-
Status: Patch Available  (was: Open)

> Create a new raftClient instance for every watch request for Ratis
> --
>
> Key: HDDS-845
> URL: https://issues.apache.org/jira/browse/HDDS-845
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-845.000.patch
>
>
> Currently, watch requests go through the sliding window in Ratis and hence 
> block, and get blocked by, other requests submitted before them. These are 
> read-only requests and do not necessarily need to go through the sliding 
> window. Until this gets addressed in Ratis, it is better and more efficient to 
> create a new RaftClient instance for watch requests in XceiverClientRatis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-845) Create a new raftClient instance for every watch request for Ratis

2018-11-15 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-845:
-
Attachment: HDDS-845.000.patch

> Create a new raftClient instance for every watch request for Ratis
> --
>
> Key: HDDS-845
> URL: https://issues.apache.org/jira/browse/HDDS-845
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-845.000.patch
>
>
> Currently, watch requests go through the sliding window in Ratis and hence 
> block, and get blocked by, other requests submitted before them. These are 
> read-only requests and do not necessarily need to go through the sliding 
> window. Until this gets addressed in Ratis, it is better and more efficient to 
> create a new RaftClient instance for watch requests in XceiverClientRatis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-845) Create a new raftClient instance for every watch request for Ratis

2018-11-15 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-845:


 Summary: Create a new raftClient instance for every watch request 
for Ratis
 Key: HDDS-845
 URL: https://issues.apache.org/jira/browse/HDDS-845
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.4.0


Currently, watch requests go through the sliding window in Ratis and hence 
block, and get blocked by, other requests submitted before them. These are 
read-only requests and do not necessarily need to go through the sliding window. 
Until this gets addressed in Ratis, it is better and more efficient to create a 
new RaftClient instance for watch requests in XceiverClientRatis.
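
A hedged sketch of that workaround; the builder calls are the usual Ratis client setup, while the exact watch API on the returned client (for example sendWatchAsync) is assumed from the Ratis client of that era and may differ:
{code:java}
// Sketch only -- a dedicated client per watch request so watches bypass the
// sliding window of the main ordered-write client.
import org.apache.ratis.client.RaftClient;
import org.apache.ratis.conf.RaftProperties;
import org.apache.ratis.protocol.RaftGroup;

final class WatchClientSketch {
  private WatchClientSketch() { }

  // Build a fresh RaftClient for a single watch call; the caller is expected
  // to close it once the watch future completes.
  static RaftClient newWatchClient(RaftGroup group, RaftProperties properties) {
    return RaftClient.newBuilder()
        .setRaftGroup(group)
        .setProperties(properties)
        .build();
  }
}
{code}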



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-774) Remove OpenContainerBlockMap from datanode

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-774:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

As per offline discussion with [~msingh], committed this change to trunk. 
HDDS-801 will be rebased on top of it.

Thanks [~jnp] and [~msingh] for the reviews.

> Remove OpenContainerBlockMap from datanode
> --
>
> Key: HDDS-774
> URL: https://issues.apache.org/jira/browse/HDDS-774
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-774.000.patch, HDDS-774.001.patch
>
>
> With HDDS-675, partial flush of uncommitted keys on Datanodes is not 
> required. OpenContainerBlockMap hence serves no purpose anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686506#comment-16686506
 ] 

Shashikant Banerjee edited comment on HDDS-801 at 11/14/18 2:24 PM:


Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually (see the 
sketch after this comment).

4. There can be a case where, say, the SCM gets network-separated from a 
follower before sending a close command, but the Ratis ring is operational. In 
such a case, the leader will execute the closeContainer transaction 
successfully and the follower will try to replicate it, but it will fail 
because the container was never put into the CLOSING state on the follower, as 
it was not communicating with the SCM. The assumption that the container will 
be in the CLOSING state before closeContainer is called may not always hold.

5. KeyValueContainer.java:310 ->

The comments look misleading here. The first comment says the compaction 
should be done asynchronously, as otherwise it will be a lot slower. The next 
comment says it is OK if the operation is slow. Can you please check?
{code:java}
@Override
public void close() throws StorageContainerException {

  //TODO: writing .container file and compaction can be done
  // asynchronously, otherwise rpc call for this will take a lot of time to
  // complete this action
  ContainerDataProto.State oldState = null;
  try {
writeLock();
oldState = containerData.getState();
containerData.closeContainer();
File containerFile = getContainerFile();
// update the new container data to .container File
updateContainerFile(containerFile);

  } catch (StorageContainerException ex) {
if (oldState != null) {
  // Failed to update .container file. Reset the state to CLOSING
  containerData.setState(oldState);
}
throw ex;
  } finally {
writeUnlock();
  }
  // It is ok if this operation takes a bit of time.
  // Close container is not expected to be instantaneous.
  compactDB();
}

{code}
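
For comment 3 above, a minimal sketch of the idea with hypothetical method names 
(sendIncrementalContainerReport and the QUASI_CLOSED state value are assumptions 
here, not the actual patch): route every container state change through one 
helper so the ICR is always sent from the same place.
{code:java}
// Sketch only: one helper owns both the state transition and the ICR.
private void updateContainerState(Container container,
    ContainerDataProto.State newState) throws StorageContainerException {
  container.getContainerData().setState(newState);   // persist the transition
  sendIncrementalContainerReport(container);          // ICR follows every change
}

void closeContainer(Container container) throws StorageContainerException {
  updateContainerState(container, ContainerDataProto.State.CLOSED);
}

void quasiCloseContainer(Container container) throws StorageContainerException {
  updateContainerState(container, ContainerDataProto.State.QUASI_CLOSED);
}
{code}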


was (Author: shashikant):
Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

4. There can be a case where, say, the SCM gets network-separated from a 
follower before sending a close command, but the Ratis ring is operational. In 
such a case, the leader will execute the closeContainer transaction 
successfully and the follower will try to replicate it, but it will fail 
because the container was never put into the CLOSING state on the follower, as 
it was not communicating with the SCM. The assumption that the container will 
be in the CLOSING state before closeContainer is called may not always hold.

5. 

[jira] [Comment Edited] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686506#comment-16686506
 ] 

Shashikant Banerjee edited comment on HDDS-801 at 11/14/18 2:24 PM:


Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

4. There can be a case where, say, the SCM gets network-separated from a 
follower before sending a close command, but the Ratis ring is operational. In 
such a case, the leader will execute the closeContainer transaction 
successfully and the follower will try to replicate it, but it will fail 
because the container was never put into the CLOSING state on the follower, as 
it was not communicating with the SCM. The assumption that the container will 
be in the CLOSING state before closeContainer is called may not always hold.

5. KeyValueContainer.java:310 ->

The comments look misleading here. The first comment says the compaction 
should be done asynchronously, as otherwise it will be a lot slower. The next 
comment says it is OK if the operation is slow. Can you please check?
{code:java}
@Override
public void close() throws StorageContainerException {

  //TODO: writing .container file and compaction can be done
  // asynchronously, otherwise rpc call for this will take a lot of time to
  // complete this action
  ContainerDataProto.State oldState = null;
  try {
writeLock();
oldState = containerData.getState();
containerData.closeContainer();
File containerFile = getContainerFile();
// update the new container data to .container File
updateContainerFile(containerFile);

  } catch (StorageContainerException ex) {
if (oldState != null) {
  // Failed to update .container file. Reset the state to CLOSING
  containerData.setState(oldState);
}
throw ex;
  } finally {
writeUnlock();
  }
  // It is ok if this operation takes a bit of time.
  // Close container is not expected to be instantaneous.
  compactDB();
}

{code}


was (Author: shashikant):
Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

4. There can be a case where, say, the SCM gets network-separated from a 
follower before sending a close command, but the Ratis ring is operational. In 
such a case, the leader will execute the closeContainer transaction 
successfully and the follower will try to replicate it, but it will fail 
because the container was never put into the CLOSING state on the follower, as 
it was not communicating with the SCM. The assumption that the container will 
be in the CLOSING state before closeContainer is called may not always hold.

 

> Quasi close the 

[jira] [Comment Edited] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686506#comment-16686506
 ] 

Shashikant Banerjee edited comment on HDDS-801 at 11/14/18 2:17 PM:


Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

4. There can be a case where, say, the SCM gets network-separated from a 
follower before sending a close command, but the Ratis ring is operational. In 
such a case, the leader will execute the closeContainer transaction 
successfully and the follower will try to replicate it, but it will fail 
because the container was never put into the CLOSING state on the follower, as 
it was not communicating with the SCM. The assumption that the container will 
be in the CLOSING state before closeContainer is called may not always hold.

 


was (Author: shashikant):
Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

4. There can be a case where, say, the SCM gets network-separated from a 
follower before sending a close command, but the Ratis ring is operational. In 
such a case, the leader will execute the closeContainer transaction 
successfully and the follower will try to replicate it, but it will fail 
because the container was never put into the CLOSING state on the follower, as 
it was not communicating with the SCM. The assumption that the container will 
be in the CLOSING state before closeContainer is called may not necessarily 
hold.

 

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686506#comment-16686506
 ] 

Shashikant Banerjee edited comment on HDDS-801 at 11/14/18 2:12 PM:


Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

4. There can be a case where, say, the SCM gets network-separated from a 
follower before sending a close command, but the Ratis ring is operational. In 
such a case, the leader will execute the closeContainer transaction 
successfully and the follower will try to replicate it, but it will fail 
because the container was never put into the CLOSING state on the follower, as 
it was not communicating with the SCM. The assumption that the container will 
be in the CLOSING state before closeContainer is called may not necessarily 
hold.

 


was (Author: shashikant):
Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

 

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686506#comment-16686506
 ] 

Shashikant Banerjee commented on HDDS-801:
--

Thanks [~nandakumar131] for working on this. Some comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

 

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686506#comment-16686506
 ] 

Shashikant Banerjee edited comment on HDDS-801 at 11/14/18 1:42 PM:


Thanks [~nandakumar131] for working on this. In addition to Mukul's comments, 
some more comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

 


was (Author: shashikant):
Thanks [~nandakumar131] for working on this. Some comments:

1. KeyValueHandler.java:865 -> update the comment to say the container is 
getting "quasi closed" rather than getting closed.

2. KeyValueHandler.java:865 -> closeContainer is exposed to clients in 
ContainerProtocolCalls.java. With SCMCLI as well, close container can be 
invoked, where a client can directly close it (closeContainer in 
ContainerOperationClient). In such cases, a container may be in just the OPEN 
state and hence the following exception will be thrown:
{code:java}
// The container has to be in CLOSING state.
if (state != State.CLOSING) {
  ContainerProtos.Result error = state == State.INVALID ?
  INVALID_CONTAINER_STATE : CONTAINER_INTERNAL_ERROR;
  throw new StorageContainerException("Cannot close container #" +
  container.getContainerData().getContainerID() + " while in " +
  state + " state.", error);
}{code}
Should we disallow/remove the closeContainer call exposed to clients/SCMCLI?

3. Any state change in ContainerState should trigger an ICR. In that case, the 
closeContainer/quasiCloseContainer call should call updateContainerState 
internally to send the ICR instead of executing it individually.

 

> Quasi close the container when close is not executed via Ratis
> --
>
> Key: HDDS-801
> URL: https://issues.apache.org/jira/browse/HDDS-801
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-801.000.patch, HDDS-801.001.patch, 
> HDDS-801.002.patch
>
>
> When a datanode receives a CloseContainerCommand and the replication type is 
> not RATIS, we should QUASI close the container. After quasi-closing the 
> container, an ICR has to be sent to SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-774) Remove OpenContainerBlockMap from datanode

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686484#comment-16686484
 ] 

Shashikant Banerjee edited comment on HDDS-774 at 11/14/18 1:07 PM:


Thanks [~jnp], for the review. I will hold off committing this till HDDS-801 
gets committed as it may create conflicts.


was (Author: shashikant):
Thanks [~shashikant], for the review. I will hold off committing this till 
HDDS-801 gets committed as it may create conflicts.

> Remove OpenContainerBlockMap from datanode
> --
>
> Key: HDDS-774
> URL: https://issues.apache.org/jira/browse/HDDS-774
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-774.000.patch, HDDS-774.001.patch
>
>
> With HDDS-675, partial flush of uncommitted keys on Datanodes is not 
> required. OpenContainerBlockMap hence serves no purpose anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-774) Remove OpenContainerBlockMap from datanode

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686484#comment-16686484
 ] 

Shashikant Banerjee commented on HDDS-774:
--

Thanks [~shashikant], for the review. I will hold off committing this till 
HDDS-801 gets committed as it may create conflicts.

> Remove OpenContainerBlockMap from datanode
> --
>
> Key: HDDS-774
> URL: https://issues.apache.org/jira/browse/HDDS-774
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-774.000.patch, HDDS-774.001.patch
>
>
> With HDDS-675, partial flush of uncommitted keys on Datanodes is not 
> required. OpenContainerBlockMap hence serves no purpose anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-834) Datanode goes OOM based because of segment size

2018-11-14 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-834:
-
   Resolution: Fixed
Fix Version/s: 0.4.0
   0.3.0
   Status: Resolved  (was: Patch Available)

Thanks [~msingh] for working on this. I have committed this change to trunk as 
well as ozone-0.3.

> Datanode goes OOM based because of segment size
> ---
>
> Key: HDDS-834
> URL: https://issues.apache.org/jira/browse/HDDS-834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-834-ozone-0.3.001.patch, HDDS-834.001.patch
>
>
> Currently the Ratis segment size is set to 1GB. After RATIS-253, the entry 
> size for a WriteChunk is not counted towards the entry being written to the 
> Raft log. This jira limits the segment size to 16KB, which makes sure that 
> the number of WriteChunk entries in a segment is limited to 64. This means 
> that, with 16MB chunks, the total data pending in the segment is 1GB.
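
A quick back-of-the-envelope check of the figures above (sketch only; the 
64-entry count is taken from the description, not derived from Ratis internals):
{code:java}
// With chunk data excluded from the log entry (RATIS-253), a 16KB segment
// holds roughly 64 small WriteChunk entries, but each entry still references
// a full chunk that stays pending until the segment is flushed.
long entriesPerSegment = 64;
long chunkSizeBytes = 16L * 1024 * 1024;                  // 16MB chunk
long pendingBytes = entriesPerSegment * chunkSizeBytes;   // 1073741824 bytes
System.out.println(pendingBytes / (1024L * 1024 * 1024) + " GB");  // prints 1 GB
{code}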



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-834) Datanode goes OOM based because of segment size

2018-11-14 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686320#comment-16686320
 ] 

Shashikant Banerjee commented on HDDS-834:
--

Thanks [~msingh] for reporting and working on this. The patch looks good to me. 
I am +1 on the patch with some minor changes:

1) Rename containerCommandCompletionMap to applyTransactionCompletionMap in 
ContainerStateMachine.

2) Add some more comments where dummy entries are added to 
applyTransactionCompletionMap.

I will take care of these while committing.

> Datanode goes OOM based because of segment size
> ---
>
> Key: HDDS-834
> URL: https://issues.apache.org/jira/browse/HDDS-834
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDDS-834-ozone-0.3.001.patch, HDDS-834.001.patch
>
>
> Currently the Ratis segment size is set to 1GB. After RATIS-253, the entry 
> size for a WriteChunk is not counted towards the entry being written to the 
> Raft log. This jira limits the segment size to 16KB, which makes sure that 
> the number of WriteChunk entries in a segment is limited to 64. This means 
> that, with 16MB chunks, the total data pending in the segment is 1GB.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-835:
-
Description: As per [~msingh]'s review comments in HDDS-675, for the 
streamBufferFlushSize, streamBufferMaxSize and blockSize configs, we should use 
getStorageSize instead of a long value. This Jira aims to address this.  (was: 
As per [~msingh]'s review comments, for the streamBufferFlushSize, 
streamBufferMaxSize and blockSize configs, we should use getStorageSize instead 
of a long value. This Jira aims to address this.)

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch
>
>
> As per [~msingh]'s review comments in HDDS-675, for the streamBufferFlushSize, 
> streamBufferMaxSize and blockSize configs, we should use getStorageSize 
> instead of a long value. This Jira aims to address this.
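
A minimal sketch of what the change amounts to (the config key and default below 
are illustrative, not the actual patch); getStorageSize parses human-readable 
sizes such as "64MB" and converts them to the requested unit:
{code:java}
// Sketch only: read the client buffer sizes as storage sizes instead of longs.
OzoneConfiguration conf = new OzoneConfiguration();

// before: getLong only accepts a plain byte count
long flushSizeOld = conf.getLong(
    "ozone.client.stream.buffer.flush.size", 64L * 1024 * 1024);

// after: getStorageSize accepts values like "64MB" or "128KB"
long flushSizeNew = (long) conf.getStorageSize(
    "ozone.client.stream.buffer.flush.size", "64MB", StorageUnit.BYTES);
{code}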



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-13 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685617#comment-16685617
 ] 

Shashikant Banerjee commented on HDDS-835:
--

[~msingh], please have a look.

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch
>
>
> As per [~msingh]'s review comments, for the streamBufferFlushSize, 
> streamBufferMaxSize and blockSize configs, we should use getStorageSize 
> instead of a long value. This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-835:
-
Attachment: HDDS-835.000.patch

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch
>
>
> As per [~msingh]'s review comments, for the streamBufferFlushSize, 
> streamBufferMaxSize and blockSize configs, we should use getStorageSize 
> instead of a long value. This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-835) Use storageSize instead of Long for buffer size configs in Ozone Client

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-835:
-
Status: Patch Available  (was: Open)

> Use storageSize instead of Long for buffer size configs in Ozone Client
> ---
>
> Key: HDDS-835
> URL: https://issues.apache.org/jira/browse/HDDS-835
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-835.000.patch
>
>
> As per [~msingh]'s review comments, for the streamBufferFlushSize, 
> streamBufferMaxSize and blockSize configs, we should use getStorageSize 
> instead of a long value. This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-774) Remove OpenContainerBlockMap from datanode

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-774:
-
Status: Patch Available  (was: Open)

> Remove OpenContainerBlockMap from datanode
> --
>
> Key: HDDS-774
> URL: https://issues.apache.org/jira/browse/HDDS-774
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-774.000.patch, HDDS-774.001.patch
>
>
> With HDDS-675, partial flush of uncommitted keys on Datanodes is not 
> required. OpenContainerBlockMap hence serves no purpose anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-774) Remove OpenContainerBlockMap from datanode

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-774:
-
Attachment: HDDS-774.001.patch

> Remove OpenContainerBlockMap from datanode
> --
>
> Key: HDDS-774
> URL: https://issues.apache.org/jira/browse/HDDS-774
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-774.000.patch, HDDS-774.001.patch
>
>
> With HDDS-675, partial flush of uncommitted keys on Datanodes is not 
> required. OpenContainerBlockMap hence serves no purpose anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient

2018-11-13 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-675:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~jnp] and [~msingh] for the review comments. I have committed this 
change to trunk.

> Add blocking buffer and use watchApi for flush/close in OzoneClient
> ---
>
> Key: HDDS-675
> URL: https://issues.apache.org/jira/browse/HDDS-675
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-675.000.patch, HDDS-675.001.patch, 
> HDDS-675.002.patch, HDDS-675.003.patch, HDDS-675.004.patch, 
> HDDS-675.005.patch, HDDS-675.006.patch
>
>
> For handling 2 node failures, a blocking buffer will be used which will wait 
> for the flush commit index to get updated on all replicas of a container via 
> Ratis.
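
A minimal sketch of the blocking-buffer idea described above 
(writeChunksAndPutBlock and watchForCommit are hypothetical stand-ins for the 
client calls; this is not the committed patch):
{code:java}
// Sketch only: flush() blocks until the flushed data is replicated on all
// datanodes of the pipeline, after which the local buffer can be released.
public void flush() throws IOException {
  // submit the buffered data and remember the log index of the last request
  long flushIndex = writeChunksAndPutBlock(bufferedData);
  try {
    // wait until every replica has applied the transactions up to flushIndex
    watchForCommit(flushIndex);
  } catch (Exception e) {
    throw new IOException("watch for commit index " + flushIndex + " failed", e);
  }
  // all replicas have the data, so the client-side copy is no longer needed
  bufferedData.clear();
}
{code}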



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-675) Add blocking buffer and use watchApi for flush/close in OzoneClient

2018-11-13 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685570#comment-16685570
 ] 

Shashikant Banerjee edited comment on HDDS-675 at 11/13/18 6:16 PM:


Thanks [~msingh] for the review.
{code:java}
for streamBufferFlushSize, streamBufferMaxSize, blockSize, lets use the 
getStorageSize in place for getLong, this can be done in a later patch as well.
{code}
Opened HDDS-835 to track the same. Will fix the other review comments and the 
checkstyle issues related to unused imports and line length while committing. 
The rest of the checkstyle issues, related to the number of parameters and the 
visibility modifier, can be ignored.


was (Author: shashikant):
{code:java}
for streamBufferFlushSize, streamBufferMaxSize, blockSize, lets use the 
getStorageSize in place for getLong, this can be done in a later patch as well.
{code}
Opened HDDS-835 to track the same. Will fix the other review comments and the 
checkstyle issues related to unused imports and line length while committing. 
The rest of the checkstyle issues, related to the number of parameters and the 
visibility modifier, can be ignored.

> Add blocking buffer and use watchApi for flush/close in OzoneClient
> ---
>
> Key: HDDS-675
> URL: https://issues.apache.org/jira/browse/HDDS-675
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-675.000.patch, HDDS-675.001.patch, 
> HDDS-675.002.patch, HDDS-675.003.patch, HDDS-675.004.patch, 
> HDDS-675.005.patch, HDDS-675.006.patch
>
>
> For handling 2 node failures, a blocking buffer will be used which will wait 
> for the flush commit index to get updated on all replicas of a container via 
> Ratis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


