[jira] [Updated] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers

2018-07-11 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-199:
--
Status: Patch Available  (was: Open)

> Implement ReplicationManager to replicate ClosedContainers
> --
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch, HDDS-199.005.patch, 
> HDDS-199.006.patch, HDDS-199.007.patch, HDDS-199.008.patch, 
> HDDS-199.009.patch, HDDS-199.010.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific 
> conditions (the container is full, or a node has failed) the container will be 
> closed and replicated in a different way. The replication of Open containers 
> is handled with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers where more replicas are missing), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating scatter-width), and send the Copy/Delete container command to the 
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the underreplicated/overreplicated events to the 
> priority queue until the confirmation of the copy/delete command arrives.
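A minimal sketch of this flow, for illustration only: apart from the
ReplicationManager and CommandQueue concepts themselves, every name below is a
hypothetical placeholder, not the API in the attached patches.

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

class ReplicationManagerSketch {

  static class ReplicationRequest {
    final long containerId;
    final int missingReplicas; // negative means over-replicated

    ReplicationRequest(long containerId, int missingReplicas) {
      this.containerId = containerId;
      this.missingReplicas = missingReplicas;
    }
  }

  // Containers missing the most replicas are handled first.
  private final PriorityQueue<ReplicationRequest> queue = new PriorityQueue<>(
      Comparator.comparingInt((ReplicationRequest r) -> r.missingReplicas)
          .reversed());

  // UnderReplicated/OverReplicated events land here; a command watcher
  // (HDDS-195 style) would requeue them until a confirmation arrives.
  void onReplicationEvent(ReplicationRequest request) {
    queue.add(request);
  }

  void processNext() {
    ReplicationRequest request = queue.poll();
    if (request == null) {
      return;
    }
    // First a very simple placement; scatter-width based placement later.
    String datanode = pickDatanode(request.containerId);
    if (request.missingReplicas > 0) {
      sendCommand(datanode, "COPY", request.containerId);
    } else {
      sendCommand(datanode, "DELETE", request.containerId);
    }
  }

  private String pickDatanode(long containerId) {
    return "datanode-" + (containerId % 3); // placeholder placement
  }

  private void sendCommand(String datanode, String type, long containerId) {
    // Stands in for queueing a Copy/DeleteContainerCommand on the
    // per-datanode CommandQueue.
    System.out.println(type + " container " + containerId + " -> " + datanode);
  }
}
{code}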



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers

2018-07-11 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541184#comment-16541184
 ] 

Elek, Marton commented on HDDS-199:
---

Sure, the patch is rebased. Good to know that trunk is evolving so fast...

> Implement ReplicationManager to replicate ClosedContainers
> --
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch, HDDS-199.005.patch, 
> HDDS-199.006.patch, HDDS-199.007.patch, HDDS-199.008.patch, 
> HDDS-199.009.patch, HDDS-199.010.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific 
> conditions (the container is full, or a node has failed) the container will be 
> closed and replicated in a different way. The replication of Open containers 
> is handled with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers where more replicas are missing), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating scatter-width), and send the Copy/Delete container command to the 
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the underreplicated/overreplicated events to the 
> priority queue until the confirmation of the copy/delete command arrives.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers

2018-07-11 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-199:
--
Attachment: HDDS-199.010.patch

> Implement ReplicationManager to replicate ClosedContainers
> --
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch, HDDS-199.005.patch, 
> HDDS-199.006.patch, HDDS-199.007.patch, HDDS-199.008.patch, 
> HDDS-199.009.patch, HDDS-199.010.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific 
> conditions (the container is full, or a node has failed) the container will be 
> closed and replicated in a different way. The replication of Open containers 
> is handled with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers where more replicas are missing), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating scatter-width), and send the Copy/Delete container command to the 
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the underreplicated/overreplicated events to the 
> priority queue until the confirmation of the copy/delete command arrives.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13697) EDEK decrypt fails due to proxy user being lost because of empty AccessControllerContext

2018-07-11 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539206#comment-16539206
 ] 

Xiao Chen edited comment on HDFS-13697 at 7/12/18 4:12 AM:
---

I have sent out an invite for Thursday (July 12) 10AM PST; let me know your 
preferred time slot if this isn't a good time.

FWIW, the case we want to support here is that the client should use the 
creation ugi regardless of the context. The oozie ugi call stack comes from 
real cluster usage of oozie.


was (Author: xiaochen):
I have sent out an invite for Thursday (July 12) 9AM PST; let me know your 
preferred time slot if this isn't a good time.

FWIW, the case we want to support here is that the client should use the 
creation ugi regardless of the context. The oozie ugi call stack comes from 
real cluster usage of oozie.

> EDEK decrypt fails due to proxy user being lost because of empty 
> AccessControllerContext
> 
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following, for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above 
> issue, the example_user entitlements are lost from the UGI and the following 
> error is reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> 
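
Relating to the "creation ugi" point above, a minimal sketch of the intended
behavior: capture the UGI when the client is created and run the KMS call
inside its doAs, so the proxy user survives an empty AccessControllerContext.
EdekDecryptor is a hypothetical wrapper for the sketch, not a class from the
attached patches.

{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.crypto.key.KeyProvider.KeyVersion;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.EncryptedKeyVersion;
import org.apache.hadoop.security.UserGroupInformation;

class EdekDecryptor {
  // Captured when the client is created, e.g. a proxy UGI for example_user.
  private final UserGroupInformation creationUgi;
  private final KeyProviderCryptoExtension provider;

  EdekDecryptor(UserGroupInformation creationUgi,
      KeyProviderCryptoExtension provider) {
    this.creationUgi = creationUgi;
    this.provider = provider;
  }

  KeyVersion decrypt(EncryptedKeyVersion edek) throws Exception {
    // Without this explicit doAs, UGI.getCurrentUser() can fall back to the
    // login user (oozie here), and the KMS then rejects DECRYPT_EEK.
    return creationUgi.doAs(
        (PrivilegedExceptionAction<KeyVersion>)
            () -> provider.decryptEncryptedKey(edek));
  }
}
{code}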

[jira] [Comment Edited] (HDFS-13697) EDEK decrypt fails due to proxy user being lost because of empty AccessControllerContext

2018-07-11 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539206#comment-16539206
 ] 

Xiao Chen edited comment on HDFS-13697 at 7/12/18 4:12 AM:
---

I have sent out an invite for Thursday (July 12) 10AM PST; let me know your 
preferred time slot if this isn't a good time. Ping me if anyone wants to join 
but didn't receive the meeting invite.

FWIW, the case we want to support here is that the client should use the 
creation ugi regardless of the context. The oozie ugi call stack comes from 
real cluster usage of oozie.


was (Author: xiaochen):
I have sent out an invite for Thursday (July 12) 10AM PST; let me know your 
preferred time slot if this isn't a good time.

FWIW, the case we want to support here is that the client should use the 
creation ugi regardless of the context. The oozie ugi call stack comes from 
real cluster usage of oozie.

> EDEK decrypt fails due to proxy user being lost because of empty 
> AccessControllerContext
> 
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following, for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above 
> issue, the example_user entitlements are lost from the UGI and the following 
> error is reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> 

[jira] [Updated] (HDFS-12837) Intermittent failure in TestReencryptionWithKMS

2018-07-11 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12837:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.4
   3.1.1
   3.2.0
   Status: Resolved  (was: Patch Available)

> Intermittent failure in TestReencryptionWithKMS
> ---
>
> Key: HDFS-12837
> URL: https://issues.apache.org/jira/browse/HDFS-12837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0-beta1
>Reporter: Surendra Singh Lilhore
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-12837.01.patch, HDFS-12837.02.patch, 
> HDFS-12837.03.patch, hadoop-hdfs.testrun.1.log, hadoop-hdfs.testrun.2.log, 
> hadoop-hdfs.testrun.3.log
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/22112/testReport/org.apache.hadoop.hdfs.server.namenode/TestReencryptionWithKMS/testReencryptionKMSDown/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12837) Intermittent failure in TestReencryptionWithKMS

2018-07-11 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541097#comment-16541097
 ] 

Xiao Chen commented on HDFS-12837:
--

Committed to trunk through branch-3.0.

Thanks Surendra for filing the jira, and Zsolt / Wei-Chiu for the reviews! Also 
special thanks to Zsolt for the verification runs. :)

> Intermittent failure in TestReencryptionWithKMS
> ---
>
> Key: HDFS-12837
> URL: https://issues.apache.org/jira/browse/HDFS-12837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0-beta1
>Reporter: Surendra Singh Lilhore
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-12837.01.patch, HDFS-12837.02.patch, 
> HDFS-12837.03.patch, hadoop-hdfs.testrun.1.log, hadoop-hdfs.testrun.2.log, 
> hadoop-hdfs.testrun.3.log
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/22112/testReport/org.apache.hadoop.hdfs.server.namenode/TestReencryptionWithKMS/testReencryptionKMSDown/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12837) Intermittent failure in TestReencryptionWithKMS

2018-07-11 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541096#comment-16541096
 ] 

Xiao Chen commented on HDFS-12837:
--

Created HDFS-13731 for the ones Zsolt attached. Committing this based on 
Wei-Chiu's +1 (the pre-commit failure is unrelated; I ran the test locally 
since it's been a while).

> Intermittent failure in TestReencryptionWithKMS
> ---
>
> Key: HDFS-12837
> URL: https://issues.apache.org/jira/browse/HDFS-12837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0-beta1
>Reporter: Surendra Singh Lilhore
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-12837.01.patch, HDFS-12837.02.patch, 
> HDFS-12837.03.patch, hadoop-hdfs.testrun.1.log, hadoop-hdfs.testrun.2.log, 
> hadoop-hdfs.testrun.3.log
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/22112/testReport/org.apache.hadoop.hdfs.server.namenode/TestReencryptionWithKMS/testReencryptionKMSDown/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13731) Investigate TestReencryption timeouts

2018-07-11 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13731:


 Summary: Investigate TestReencryption timeouts
 Key: HDFS-13731
 URL: https://issues.apache.org/jira/browse/HDFS-13731
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, test
Affects Versions: 3.0.0
Reporter: Xiao Chen


HDFS-12837 fixed some flakiness in the Reencryption-related tests. But as 
[~zvenczel]'s comment notes, there are still a few timeouts. We should 
investigate those.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12837) Intermittent failure in TestReencryptionWithKMS

2018-07-11 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12837:
-
Summary: Intermittent failure in TestReencryptionWithKMS  (was: 
Intermittent failure TestReencryptionWithKMS#testReencryptionKMSDown)

> Intermittent failure in TestReencryptionWithKMS
> ---
>
> Key: HDFS-12837
> URL: https://issues.apache.org/jira/browse/HDFS-12837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0-beta1
>Reporter: Surendra Singh Lilhore
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-12837.01.patch, HDFS-12837.02.patch, 
> HDFS-12837.03.patch, hadoop-hdfs.testrun.1.log, hadoop-hdfs.testrun.2.log, 
> hadoop-hdfs.testrun.3.log
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/22112/testReport/org.apache.hadoop.hdfs.server.namenode/TestReencryptionWithKMS/testReencryptionKMSDown/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541088#comment-16541088
 ] 

genericqa commented on HDDS-228:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m  
5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 36m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
50s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 35m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 35m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
29s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 27s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}170m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-228 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931251/HDDS-228.06.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0618d4e20267 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build 

[jira] [Commented] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541052#comment-16541052
 ] 

genericqa commented on HDDS-228:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
44s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
24s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 32s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.TestStorageContainerManager |
|   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-228 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931249/HDDS-228.05.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  

[jira] [Commented] (HDDS-234) Add SCM node report handler

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541050#comment-16541050
 ] 

genericqa commented on HDDS-234:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
33s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
12s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
23s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-234 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931256/HDDS-234.02.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 59d3f7e65834 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 632aca5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDDS-Build/501/artifact/out/branch-findbugs-hadoop-hdds_server-scm-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDDS-Build/501/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-HDDS-Build/501/artifact/out/patch-asflicense-problems.txt
 |
| Max. process+thread count | 442 (vs. ulimit of 1) |
| modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
| Console output | 
https://builds.apache.org/job/PreCommit-HDDS-Build/501/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Commented] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541013#comment-16541013
 ] 

Anu Engineer commented on HDDS-234:
---

+1, pending Jenkins.

cc:[~nandakumar131]

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch, HDDS-234.02.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-187) Command status publisher for datanode

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541006#comment-16541006
 ] 

Anu Engineer commented on HDDS-187:
---

[~ajayydv] Thanks for taking care of this issue. Some very minor comments on 
the patch v9.
 * CommandStatus.java:32: rewrite as {{private Status status;}}
 * Is it possible to add cmdID to the base SCMCommand class instead of adding 
it to all derived classes? (see the sketch after this list)
 * Same comment on the handle call; maybe make an abstract class that deals 
with the status and ID?
 * Do we need _HDDsIDFactory_ at all? From a quick reading of the code it 
looks like all we need is *Time.MonotonicNow()*.
 * Why skip the Reregister command in HeartbeatTask.java ? It will be 
 * Should we do the _addCommandStatus_() inside addCommand()?
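
A sketch of the cmdID suggestion above, with illustrative type names only
(the real SCMCommand hierarchy in the patch may differ): keep the id on the
base command class so derived commands don't each re-declare it.

{code}
abstract class BaseScmCommandSketch {
  private final long cmdId;

  protected BaseScmCommandSketch(long cmdId) {
    this.cmdId = cmdId;
  }

  public long getCmdId() {
    return cmdId;
  }
}

class CloseContainerCommandSketch extends BaseScmCommandSketch {
  private final long containerId;

  CloseContainerCommandSketch(long cmdId, long containerId) {
    super(cmdId); // the id is handled once, in the base class
    this.containerId = containerId;
  }

  long getContainerId() {
    return containerId;
  }
}
{code}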

 

> Command status publisher for datanode
> -
>
> Key: HDDS-187
> URL: https://issues.apache.org/jira/browse/HDDS-187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-187.00.patch, HDDS-187.01.patch, HDDS-187.02.patch, 
> HDDS-187.03.patch, HDDS-187.04.patch, HDDS-187.05.patch, HDDS-187.06.patch, 
> HDDS-187.07.patch, HDDS-187.08.patch, HDDS-187.09.patch
>
>
> Currently the SCM sends a set of commands to the DataNode. The DataNode 
> executes them via the CommandHandler. This jira intends to create a command 
> status publisher which will return the status of these commands back to the 
> SCM.
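
As a rough sketch of that idea (all names below are illustrative, not the
classes from the attached patches): the datanode tracks each command's
outcome, and a publisher ships the statuses back with the next heartbeat.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CommandStatusSketch {

  enum Status { PENDING, EXECUTED, FAILED }

  static class CommandStatus {
    final long cmdId;
    volatile Status status = Status.PENDING;

    CommandStatus(long cmdId) {
      this.cmdId = cmdId;
    }
  }

  private final Map<Long, CommandStatus> statuses = new ConcurrentHashMap<>();

  // Called when a command arrives from the SCM.
  void track(long cmdId) {
    statuses.put(cmdId, new CommandStatus(cmdId));
  }

  // Called by the CommandHandler after it executes the command.
  void markExecuted(long cmdId, boolean success) {
    CommandStatus cs = statuses.get(cmdId);
    if (cs != null) {
      cs.status = success ? Status.EXECUTED : Status.FAILED;
    }
  }

  // Drained by the heartbeat publisher and reported back to the SCM.
  Map<Long, CommandStatus> snapshotForReport() {
    return new ConcurrentHashMap<>(statuses);
  }
}
{code}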



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540994#comment-16540994
 ] 

Ajay Kumar edited comment on HDDS-234 at 7/12/18 1:36 AM:
--

[~anu] thanks for the review.
{quote}NodeReportHandler:java:
public NodeReportHandler(SCMNodeManager nm) ==> Spurious edit?{quote}
This was intentional, as the processNodeReport function was not included in 
the parent class. I didn't want to touch all the other Mock/Test classes 
implementing NodeManager, but on a second look I think it makes sense to move 
it up into NodeManager and provide an empty implementation in the mock 
classes. With that done, the current patch doesn't need those changes in the 
constructor and the corresponding type casting during initialization in 
StorageContainerManager.
{quote}Also you have lost the class comments for this class.
Line 33: Why lose final?{quote}
Unintended, fixed!
{quote} It might be a good idea to write a Precondition.CheckNotNull for 
NodeManager.{quote}
Done.
{quote}onMessage()*: Precondition.CheckNotNull(nodeReport); since you are 
accessing members in the call. It is obvious.  It might be a good idea to write 
a Precondition.CheckNotNull for NodeManager.{quote}
Done.
{quote}StorageContainerManager.java:156:
You have added a new field, nodeReportHandler, however you seem to assign to a 
local variable in the code.{quote}
Some of these changes were not there in trunk when the initial patch was 
uploaded; reverted the new changes in the current patch.


was (Author: ajayydv):
[~anu] thanks for  review. 
{quote}NodeReportHandler:java:
public NodeReportHandler(SCMNodeManager nm) ==> Spurious edit?{quote}
This was intentional as processNodeReport function was not included in parent 
class. Didn't wanted to touch all other Mock/Test class implementing 
NodeManager but on a second look i think it make sense to move it up in 
NodeManager and provide empty implementation in mock classes. With that done 
current patch doesn't need those changes in constructor and corresponding type 
casting during initialization in StorageContainerManager.
{quote}Also you have lost the class comments for this class.
Line 33: Why lose final?{quote}
Unintended, fixed!!
{quote} It might be a good idea to write a Precondition.CheckNotNull for 
NodeManager.{quote}
Done.
{quote}onMessage()*: Precondition.CheckNotNull(nodeReport); since you are 
accessing members in the call. It is obvious.  It might be a good idea to write 
a Precondition.CheckNotNull for NodeManager.{quote}
Done
{quote}StorageContainerManager.java:156:
You have added a new field, nodeReportHandler, however you seem to assign to a 
local variable in the code.{quote}
Some of these changes were not there in trunk when initial patch was uploaded, 
fixed it in current patch.

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch, HDDS-234.02.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 
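
A minimal sketch of the handler shape discussed in the review above, with
simplified stand-in types rather than the real SCM interfaces and protobuf
node report:

{code}
import com.google.common.base.Preconditions;

class NodeReportHandlerSketch {

  interface NodeManager {
    void processNodeReport(String datanodeUuid, String nodeReport);
  }

  static class NodeReportFromDatanode {
    final String datanodeUuid;
    final String report;

    NodeReportFromDatanode(String datanodeUuid, String report) {
      this.datanodeUuid = datanodeUuid;
      this.report = report;
    }
  }

  static class NodeReportHandler {
    private final NodeManager nodeManager;

    NodeReportHandler(NodeManager nodeManager) {
      // Fail fast on a null manager, as suggested in the review.
      this.nodeManager = Preconditions.checkNotNull(nodeManager,
          "nodeManager cannot be null");
    }

    void onMessage(NodeReportFromDatanode event) {
      // The event's members are accessed below, so check it up front too.
      Preconditions.checkNotNull(event, "node report event cannot be null");
      nodeManager.processNodeReport(event.datanodeUuid, event.report);
    }
  }
}
{code}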



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540994#comment-16540994
 ] 

Ajay Kumar commented on HDDS-234:
-

[~anu] thanks for the review.
{quote}NodeReportHandler:java:
public NodeReportHandler(SCMNodeManager nm) ==> Spurious edit?{quote}
This was intentional, as the processNodeReport function was not included in 
the parent class. I didn't want to touch all the other Mock/Test classes 
implementing NodeManager, but on a second look I think it makes sense to move 
it up into NodeManager and provide an empty implementation in the mock 
classes. With that done, the current patch doesn't need those changes in the 
constructor and the corresponding type casting during initialization in 
StorageContainerManager.
{quote}Also you have lost the class comments for this class.
Line 33: Why lose final?{quote}
Unintended, fixed!
{quote} It might be a good idea to write a Precondition.CheckNotNull for 
NodeManager.{quote}
Done.
{quote}onMessage()*: Precondition.CheckNotNull(nodeReport); since you are 
accessing members in the call. It is obvious.  It might be a good idea to write 
a Precondition.CheckNotNull for NodeManager.{quote}
Done.
{quote}StorageContainerManager.java:156:
You have added a new field, nodeReportHandler, however you seem to assign to a 
local variable in the code.{quote}
Some of these changes were not there in trunk when the initial patch was 
uploaded; fixed it in the current patch.

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch, HDDS-234.02.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-234:

Attachment: HDDS-234.02.patch

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch, HDDS-234.02.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13610) [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove unnecessary dummy sync

2018-07-11 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540964#comment-16540964
 ] 

Konstantin Shvachko edited comment on HDFS-13610 at 7/12/18 1:15 AM:
-

For [~csun]'s comments:
 (1) {{lowestTxnId}} is initialized to {{-1}}, so it seems we need to check if 
it is negative.
 (2) Layout versions are always negative numbers. So using {{0}} instead of 
{{Integer.MAX_VALUE}} as initial value is appropriate and preferable.
 (3) Actually the entire log message in {{updateLayoutVersion()}} is confusing. 
It should state the oldLV, newLV, and newStartTxn. The LVs can jump, so it's 
better to know both.


was (Author: shv):
For [~csun]'s comments:
(1) {{lowestTxnId}} is initialized to {{-1}}, so it seems we need to check if 
it is negative.
(2) Layout versions are always a negative numbers. So using {{0}} instead of 
{{Integer.MAX_VALUE}} as initial value is appropriate and preferrable.
(3) Actually the entire log message in {{updateLayoutVersion()}} is confusing. 
It should state the oldLV, newLV, and newStartTxn. The LVs can jump, so it's 
better to know both.

> [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove 
> unnecessary dummy sync
> --
>
> Key: HDFS-13610
> URL: https://issues.apache.org/jira/browse/HDFS-13610
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, journal-node, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13610-HDFS-12943.000.patch, 
> HDFS-13610-HDFS-12943.001.patch, HDFS-13610-HDFS-12943.002.patch, 
> HDFS-13610-HDFS-12943.003.patch
>
>
> See HDFS-13150 for full design.
> This JIRA is targeted at cleanup tasks:
> * Add in integration testing. We can expand {{TestStandbyInProgressTail}}
> * Documentation in HDFSHighAvailabilityWithQJM
> * Remove the dummy sync added as part of HDFS-10519; it is unnecessary since 
> now in-progress tailing does not rely on the JN committedTxnId
> A few bugs are also fixed:
> * Due to changes in HDFS-13609 to enable use of the RPC mechanism whenever 
> inProgressOK is true, there were codepaths which would use the RPC mechanism 
> even when dfs.ha.tail-edits.in-progress was false, meaning that the JNs did 
> not enable the cache. Update the QJM logic to only use selectRpcInputStreams 
> if this config is true.
> * Fix a false error logged when the layout version changes
> * Fix the logging when a layout version change occurs to avoid printing out a 
> placeholder value (Integer.MAX_VALUE)
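
To make the initial-value discussion in the comments above concrete, a small
illustrative sketch; the field and method names are made up for the sketch,
not taken from the patch. {{lowestTxnId}} starts at {{-1}} (so the check is
for "negative"), and layout versions are always negative (so {{0}} is a safe
uninitialized marker, unlike {{Integer.MAX_VALUE}}, which can leak into logs
as a placeholder).

{code}
class EditTailingStateSketch {
  private long lowestTxnId = -1;  // -1 until the cache is initialized
  private int layoutVersion = 0;  // real layout versions are negative

  boolean isInitialized() {
    return lowestTxnId >= 0;
  }

  void updateLayoutVersion(int newLayoutVersion, long newStartTxn) {
    int oldLayoutVersion = layoutVersion;
    // Log both LVs: they can jump, so knowing only the new one is not enough.
    // System.out stands in for the real LOG.info call.
    System.out.printf(
        "Layout version changed from %d to %d; edits now start at txn %d%n",
        oldLayoutVersion, newLayoutVersion, newStartTxn);
    layoutVersion = newLayoutVersion;
    lowestTxnId = newStartTxn;
  }
}
{code}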



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13610) [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove unnecessary dummy sync

2018-07-11 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540964#comment-16540964
 ] 

Konstantin Shvachko commented on HDFS-13610:


For [~csun]'s comments:
(1) {{lowestTxnId}} is initialized to {{-1}}, so it seems we need to check if 
it is negative.
(2) Layout versions are always negative numbers. So using {{0}} instead of 
{{Integer.MAX_VALUE}} as initial value is appropriate and preferable.
(3) Actually the entire log message in {{updateLayoutVersion()}} is confusing. 
It should state the oldLV, newLV, and newStartTxn. The LVs can jump, so it's 
better to know both.

> [Edit Tail Fast Path Pt 4] Cleanup: integration test, documentation, remove 
> unnecessary dummy sync
> --
>
> Key: HDFS-13610
> URL: https://issues.apache.org/jira/browse/HDFS-13610
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, journal-node, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13610-HDFS-12943.000.patch, 
> HDFS-13610-HDFS-12943.001.patch, HDFS-13610-HDFS-12943.002.patch, 
> HDFS-13610-HDFS-12943.003.patch
>
>
> See HDFS-13150 for full design.
> This JIRA is targeted at cleanup tasks:
> * Add in integration testing. We can expand {{TestStandbyInProgressTail}}
> * Documentation in HDFSHighAvailabilityWithQJM
> * Remove the dummy sync added as part of HDFS-10519; it is unnecessary since 
> now in-progress tailing does not rely on the JN committedTxnId
> A few bugs are also fixed:
> * Due to changes in HDFS-13609 to enable use of the RPC mechanism whenever 
> inProgressOK is true, there were codepaths which would use the RPC mechanism 
> even when dfs.ha.tail-edits.in-progress was false, meaning that the JNs did 
> not enable the cache. Update the QJM logic to only use selectRpcInputStreams 
> if this config is true.
> * Fix a false error logged when the layout version changes
> * Fix the logging when a layout version change occurs to avoid printing out a 
> placeholder value (Integer.MAX_VALUE)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540948#comment-16540948
 ] 

genericqa commented on HDFS-13475:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 
53s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13475 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931240/HDFS-13475.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 723496b2dfe2 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 632aca5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24586/testReport/ |
| Max. process+thread count | 949 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24586/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RBF: Admin cannot enforce Router enter SafeMode
> ---
>
> Key: HDFS-13475
> URL: 

[jira] [Commented] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540944#comment-16540944
 ] 

Ajay Kumar commented on HDDS-228:
-

[~anu] as discussed, moved the lock in 
{{ContainerStateMap#getContainerReplicas}} before the if condition in patch v6 
(see the sketch below). Also fixed the remaining {{ResultCodes.IO_EXCEPTION}}.
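
For reference, a minimal sketch of the locked lookup described above, assuming 
the {{autoLock}} and {{contReplicaMap}} fields quoted later in this thread:
{code:java}
// Sketch only: acquire the lock before the containment check so the read
// cannot race with a concurrent add/remove of replicas.
try (AutoCloseableLock lock = autoLock.acquire()) {
  if (contReplicaMap.containsKey(containerID)) {
    return Collections.unmodifiableSet(contReplicaMap.get(containerID));
  }
}
{code}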

> Add the ReplicaMaps to ContainerStateManager
> 
>
> Key: HDDS-228
> URL: https://issues.apache.org/jira/browse/HDDS-228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-228.00.patch, HDDS-228.01.patch, HDDS-228.02.patch, 
> HDDS-228.03.patch, HDDS-228.04.patch, HDDS-228.05.patch, HDDS-228.06.patch
>
>
> We need to maintain a list of data nodes in the SCM that tells us where a 
> container is located. This is created from the container reports. HDDS-175 
> refactored the class to make this separation easy, and this JIRA is a 
> follow-up that keeps a hash table to track this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540946#comment-16540946
 ] 

Anu Engineer commented on HDDS-228:
---

Thank you for updating the patch. +1, v6, Pending Jenkins.

 

> Add the ReplicaMaps to ContainerStateManager
> 
>
> Key: HDDS-228
> URL: https://issues.apache.org/jira/browse/HDDS-228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-228.00.patch, HDDS-228.01.patch, HDDS-228.02.patch, 
> HDDS-228.03.patch, HDDS-228.04.patch, HDDS-228.05.patch, HDDS-228.06.patch
>
>
> We need to maintain a list of data nodes in the SCM that tells us where a 
> container is located. This is created from the container reports. HDDS-175 
> refactored the class to make this separation easy, and this JIRA is a 
> follow-up that keeps a hash table to track this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-228:

Attachment: HDDS-228.06.patch

> Add the ReplicaMaps to ContainerStateManager
> 
>
> Key: HDDS-228
> URL: https://issues.apache.org/jira/browse/HDDS-228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-228.00.patch, HDDS-228.01.patch, HDDS-228.02.patch, 
> HDDS-228.03.patch, HDDS-228.04.patch, HDDS-228.05.patch, HDDS-228.06.patch
>
>
> We need to maintain a list of data nodes in the SCM that tells us where a 
> container is located. This is created from the container reports. HDDS-175 
> refactored the class to make this separation easy, and this JIRA is a 
> follow-up that keeps a hash table to track this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540940#comment-16540940
 ] 

Anu Engineer edited comment on HDDS-234 at 7/12/18 12:44 AM:
-

Thanks for taking care of this issue. Some very minor comments on the patch.

*NodeReportHandler.java*:
 * public NodeReportHandler(SCMNodeManager nm) ==> Spurious edit?
 * Also, you have lost the class comments for this class.
 * Line 33: Why lose final? It might be a good idea to write a 
Preconditions.checkNotNull for the NodeManager.

*onMessage()*:
 Add Preconditions.checkNotNull(nodeReport); since you are accessing its 
members in the call. It may seem obvious, but when we do have bugs it is 
easier to debug with asserts. Please also move both Preconditions to the start 
of the function.

*StorageContainerManager.java*:156:
 You have added a new field, nodeReportHandler, but you seem to assign to a 
local variable in the code.

*Line 156*:
{code:java}
  /*
  * Report Handlers
  * */
  private NodeReportHandler nodeReportHandler;
{code}
*Line 195*:
{code:java}
NodeReportHandler nodeReportHandler =
new NodeReportHandler((SCMNodeManager) scmNodeManager);
 {code}
The code is not wrong (we got lucky on that one :) ), because the member 
variable is not really needed once we pass the handler to the eventQueue. But 
the code can be confusing to read later, as the field on line 156 is not used 
at all.

Also, if you revert the change to the constructor of NodeReportHandler, you 
can avoid the cast on line 195.
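
For illustration, a minimal sketch of the constructor shape these comments 
point toward (assuming the pre-patch NodeManager-typed parameter; the field 
name is illustrative):
{code:java}
// Sketch only: keep the NodeManager-typed parameter (avoiding the cast at the
// call site), keep the field final, and fail fast on a null manager.
public NodeReportHandler(NodeManager nodeManager) {
  this.nodeManager = Preconditions.checkNotNull(
      nodeManager, "NodeManager cannot be null");
}
{code}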


was (Author: anu):
Thanks for taking care of this issue. Some very minor comments on the patch.
 
*NodeReportHandler.java*:
 * public NodeReportHandler(SCMNodeManager nm) ==> Spurious edit?
 * Also, you have lost the class comments for this class.
 * Line 33: Why lose final? It might be a good idea to write a 
Preconditions.checkNotNull for the NodeManager.

*onMessage()*:
 Add Preconditions.checkNotNull(nodeReport); since you are accessing its 
members in the call. It may seem obvious, but when we do have bugs it is 
easier to debug with asserts. Please also move both Preconditions to the start 
of the function.

*StorageContainerManager.java*:156:
You have added a new field, nodeReportHandler, but you seem to assign to a 
local variable in the code.

*Line 156*:
{code}
  /*
  * Report Handlers
  * */
  private NodeReportHandler nodeReportHandler;
{code}

*Line 195*:
{code}
NodeReportHandler nodeReportHandler =
new NodeReportHandler((SCMNodeManager) scmNodeManager);
 {code}

 The code is not wrong (we got lucky on that one :) ), because the member 
variable is not really needed once we pass the handler to the eventQueue. But 
the code can be confusing to read later, as the field on line 156 is not used 
at all.

 Also, if you revert the change to the constructor of NodeReportHandler, you 
can avoid the cast on line 195.


> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540940#comment-16540940
 ] 

Anu Engineer commented on HDDS-234:
---

Thanks for taking care of this issue. Some very minor comments on the patch.
 
*NodeReportHandler.java*:
 * public NodeReportHandler(SCMNodeManager nm) ==> Spurious edit?
 * Also, you have lost the class comments for this class.
 * Line 33: Why lose final? It might be a good idea to write a 
Preconditions.checkNotNull for the NodeManager.

*onMessage()*:
 Add Preconditions.checkNotNull(nodeReport); since you are accessing its 
members in the call. It may seem obvious, but when we do have bugs it is 
easier to debug with asserts. Please also move both Preconditions to the start 
of the function.

*StorageContainerManager.java*:156:
You have added a new field, nodeReportHandler, but you seem to assign to a 
local variable in the code.

*Line 156*:
{code}
  /*
  * Report Handlers
  * */
  private NodeReportHandler nodeReportHandler;
{code}

*Line 195*:
{code}
NodeReportHandler nodeReportHandler =
new NodeReportHandler((SCMNodeManager) scmNodeManager);
 {code}

 The code is not wrong (we got lucky on that one :) ), because the member 
variable is not really needed once we pass the handler to the eventQueue. But 
the code can be confusing to read later, as the field on line 156 is not used 
at all.

 Also, if you revert the change to the constructor of NodeReportHandler, you 
can avoid the cast on line 195.


> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540933#comment-16540933
 ] 

Anu Engineer commented on HDDS-228:
---

+1, pending Jenkins. There is a small issue that I can fix while committing.

removeContainerReplica: this was in two places; I will fix it while committing.
{code:java}
 throw new SCMException(
    "No entry exist for containerId: " + containerID + " in replica map.",
    ResultCodes.IO_EXCEPTION);{code}
 ==> *FAILED_TO_FIND_CONTAINER*

> Add the ReplicaMaps to ContainerStateManager
> 
>
> Key: HDDS-228
> URL: https://issues.apache.org/jira/browse/HDDS-228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-228.00.patch, HDDS-228.01.patch, HDDS-228.02.patch, 
> HDDS-228.03.patch, HDDS-228.04.patch, HDDS-228.05.patch
>
>
> We need to maintain a list of data nodes in the SCM that tells us where a 
> container is located. This is created from the container reports. HDDS-175 
> refactored the class to make this separation easy, and this JIRA is a 
> follow-up that keeps a hash table to track this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540918#comment-16540918
 ] 

Ajay Kumar commented on HDDS-228:
-

[~anu] thanks for the review. Addressed all comments in patch v5.

> Add the ReplicaMaps to ContainerStateManager
> 
>
> Key: HDDS-228
> URL: https://issues.apache.org/jira/browse/HDDS-228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-228.00.patch, HDDS-228.01.patch, HDDS-228.02.patch, 
> HDDS-228.03.patch, HDDS-228.04.patch, HDDS-228.05.patch
>
>
> We need to maintain a list of data nodes in the SCM that tells us where a 
> container is located. This is created from the container reports. HDDS-175 
> refactored the class to make this separation easy, and this JIRA is a 
> follow-up that keeps a hash table to track this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-228:

Attachment: HDDS-228.05.patch

> Add the ReplicaMaps to ContainerStateManager
> 
>
> Key: HDDS-228
> URL: https://issues.apache.org/jira/browse/HDDS-228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-228.00.patch, HDDS-228.01.patch, HDDS-228.02.patch, 
> HDDS-228.03.patch, HDDS-228.04.patch, HDDS-228.05.patch
>
>
> We need to maintain a list of data nodes in the SCM that tells us where a 
> container is located. This is created from the container reports. HDDS-175 
> refactored the class to make this separation easy, and this JIRA is a 
> follow-up that keeps a hash table to track this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-187) Command status publisher for datanode

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540912#comment-16540912
 ] 

genericqa commented on HDDS-187:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
33s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
0s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
25s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 72m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-187 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931234/HDDS-187.09.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  cc  |
| uname | Linux 5d0bca560131 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 

[jira] [Commented] (HDDS-234) Add SCM node report handler

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540906#comment-16540906
 ] 

genericqa commented on HDDS-234:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
36s{color} | {color:red} hadoop-hdds/server-scm in trunk has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
15s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
25s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-234 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931230/HDDS-234.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bd070952ed37 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 632aca5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDDS-Build/498/artifact/out/branch-findbugs-hadoop-hdds_server-scm-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDDS-Build/498/testReport/ |
| asflicense | 
https://builds.apache.org/job/PreCommit-HDDS-Build/498/artifact/out/patch-asflicense-problems.txt
 |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
| Console output | 
https://builds.apache.org/job/PreCommit-HDDS-Build/498/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Commented] (HDDS-226) Client should update block length in OM while committing the key

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540887#comment-16540887
 ] 

Anu Engineer commented on HDDS-226:
---

Looks good overall. Some very minor comments.

* ChunkGroupOutPutStream.java: Looks like your editor did a replacement; could 
we please revert it? 
{{import org.apache.hadoop.ozone.om.helpers.*;}}

* OmBlockInfo.java ==> OzoneBlockInfo.java; it is easier to read Ozone than Om. 
Another orthogonal question (sorry for these random comments): I see we have a 
BlockID class, then we create a new class called OmBlockInfo that adds one more 
field, blockLength. Why is this not added as part of BlockID? I am presuming we 
have some strong reason for not adding this field to BlockID.

* updateBlockLength():
I am very confused by this code. Can you please check? Why are we checking 
keyArgs; did you intend to check blockInfoList? (A sketch follows these comments.)
{code}
if (keyArgs != null) {
  OmBlockInfo blockInfo = blockInfoList.get(index);
  long originalLength = blockInfo.getBlockLength();
  blockInfo.setBlockLength(originalLength + length);
}
  }
{code}

* Nit: updateBlockLength --> rename to incrementBlockLength?
* OmBlockInfo.java: Nit: Unused import.
* OmKeysArgs.java: More of a question: when would this sum not be equal to the 
dataSize? 
{code}
  public long getDataSize() {
if (blockInfoList == null) {
  return dataSize;
} else {
  return blockInfoList.parallelStream().mapToLong(e -> e.getBlockLength())
  .sum();
}
}
{code}

* Nit: validateBlockLengthWithCommitKey -> testValidateBlockLengthWithCommitKey

* In the test: replace 
String value = "sample value"; with 
String value = RandomStringUtils.random(RandomUtils.nextInt(0, 1024));
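
For clarity, a minimal sketch of the guard the review asks about, with the 
null check moved to blockInfoList and the suggested rename applied (field and 
method names follow the snippets above; this is an illustration, not the 
actual patch):
{code:java}
// Sketch only: guard on the list we actually read from, not on keyArgs.
void incrementBlockLength(int index, long length) {
  if (blockInfoList != null) {
    OmBlockInfo blockInfo = blockInfoList.get(index);
    long originalLength = blockInfo.getBlockLength();
    blockInfo.setBlockLength(originalLength + length);
  }
}
{code}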

> Client should update block length in OM while committing the key
> 
>
> Key: HDDS-226
> URL: https://issues.apache.org/jira/browse/HDDS-226
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-226.00.patch, HDDS-226.01.patch
>
>
> Currently the client allocate a key of size with SCM block size, however a 
> client can always write smaller amount of data and close the key. The block 
> length in this case should be updated on OM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode

2018-07-11 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540884#comment-16540884
 ] 

Chao Sun commented on HDFS-13475:
-

Uploaded patch v2.

> RBF: Admin cannot enforce Router enter SafeMode
> ---
>
> Key: HDFS-13475
> URL: https://issues.apache.org/jira/browse/HDFS-13475
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13475.000.patch, HDFS-13475.001.patch, 
> HDFS-13475.002.patch
>
>
> To reproduce the issue: 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode enter
> Successfully enter safe mode.
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: true{code}
> And then, 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: false{code}
> From the code, it looks like the periodicInvoke triggers the leave.
> {code:java}
> public void periodicInvoke() {
> ..
>   // Always update to indicate our cache was updated
>   if (isCacheStale) {
> if (!rpcServer.isInSafeMode()) {
>   enter();
> }
>   } else if (rpcServer.isInSafeMode()) {
> // Cache recently updated, leave safe mode
> leave();
>   }
> }
> {code}
>  
>  
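
For illustration, a minimal sketch of one possible guard, assuming a 
hypothetical flag (manuallyEnteredSafeMode) that the admin command sets so 
periodicInvoke does not automatically leave a forced safe mode:
{code:java}
// Sketch only: manuallyEnteredSafeMode is a hypothetical boolean set when the
// admin forces safe mode; the automatic leave() is skipped while it is set.
public void periodicInvoke() {
  // ...
  if (isCacheStale) {
    if (!rpcServer.isInSafeMode()) {
      enter();
    }
  } else if (rpcServer.isInSafeMode() && !manuallyEnteredSafeMode) {
    // Cache recently updated and safe mode was not forced by the admin
    leave();
  }
}
{code}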



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode

2018-07-11 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13475:

Attachment: HDFS-13475.002.patch

> RBF: Admin cannot enforce Router enter SafeMode
> ---
>
> Key: HDFS-13475
> URL: https://issues.apache.org/jira/browse/HDFS-13475
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13475.000.patch, HDFS-13475.001.patch, 
> HDFS-13475.002.patch
>
>
> To reproduce the issue: 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode enter
> Successfully enter safe mode.
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: true{code}
> And then, 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: false{code}
> From the code, it looks like the periodicInvoke triggers the leave.
> {code:java}
> public void periodicInvoke() {
> ..
>   // Always update to indicate our cache was updated
>   if (isCacheStale) {
> if (!rpcServer.isInSafeMode()) {
>   enter();
> }
>   } else if (rpcServer.isInSafeMode()) {
> // Cache recently updated, leave safe mode
> leave();
>   }
> }
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13665) Move RPC response serialization into Server.doResponse

2018-07-11 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-13665:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-12943
   Status: Resolved  (was: Patch Available)

I just committed this to the feature branch. Thank you [~zero45].

> Move RPC response serialization into Server.doResponse
> --
>
> Key: HDFS-13665
> URL: https://issues.apache.org/jira/browse/HDFS-13665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-13665-HDFS-12943.000.patch, 
> HDFS-13665-HDFS-12943.001.patch, HDFS-13665-HDFS-12943.002.patch
>
>
> In HDFS-13399 we addressed a race condition in AlignmentContext processing 
> where the RPC response would assign a transactionId independently of the 
> transaction's own processing, resulting in a stateId response that was lower 
> than expected. However, this caused us to serialize the RpcResponse twice in 
> order to address the header field change.
> See here:
> https://issues.apache.org/jira/browse/HDFS-13399?focusedCommentId=16464279=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16464279
> And here:
> https://issues.apache.org/jira/browse/HDFS-13399?focusedCommentId=16498660=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16498660
> In the end it was agreed to move the logic of Server.setupResponse into 
> Server.doResponse directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12976) Introduce ObserverReadProxyProvider

2018-07-11 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-12976:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-12943
   Status: Resolved  (was: Patch Available)

I just committed this to the feature branch. Thank you [~csun].

> Introduce ObserverReadProxyProvider
> ---
>
> Key: HDFS-12976
> URL: https://issues.apache.org/jira/browse/HDFS-12976
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-12976-HDFS-12943.000.patch, 
> HDFS-12976-HDFS-12943.001.patch, HDFS-12976-HDFS-12943.002.patch, 
> HDFS-12976-HDFS-12943.003.patch, HDFS-12976-HDFS-12943.004.patch, 
> HDFS-12976-HDFS-12943.005.patch, HDFS-12976-HDFS-12943.006.patch, 
> HDFS-12976-HDFS-12943.007.patch, HDFS-12976-HDFS-12943.008.patch, 
> HDFS-12976.WIP.patch
>
>
> {{StandbyReadProxyProvider}} should implement the {{FailoverProxyProvider}} 
> interface and be able to submit read requests to the ANN and SBN(s).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13665) Move RPC response serialization into Server.doResponse

2018-07-11 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540836#comment-16540836
 ] 

Konstantin Shvachko commented on HDFS-13665:


Looks good. Will commit shortly.

> Move RPC response serialization into Server.doResponse
> --
>
> Key: HDFS-13665
> URL: https://issues.apache.org/jira/browse/HDFS-13665
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Major
> Attachments: HDFS-13665-HDFS-12943.000.patch, 
> HDFS-13665-HDFS-12943.001.patch, HDFS-13665-HDFS-12943.002.patch
>
>
> In HDFS-13399 we addressed a race condition in AlignmentContext processing 
> where the RPC response would assign a transactionId independently of the 
> transaction's own processing, resulting in a stateId response that was lower 
> than expected. However, this caused us to serialize the RpcResponse twice in 
> order to address the header field change.
> See here:
> https://issues.apache.org/jira/browse/HDFS-13399?focusedCommentId=16464279=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16464279
> And here:
> https://issues.apache.org/jira/browse/HDFS-13399?focusedCommentId=16498660=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16498660
> In the end it was agreed to move the logic of Server.setupResponse into 
> Server.doResponse directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13310) [PROVIDED Phase 2] The DatanodeProtocol should have DNA_BACKUP to backup blocks

2018-07-11 Thread Virajith Jalaparti (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-13310:
--
Status: Patch Available  (was: Open)

> [PROVIDED Phase 2] The DatanodeProtocol should have DNA_BACKUP to backup 
> blocks
> --
>
> Key: HDFS-13310
> URL: https://issues.apache.org/jira/browse/HDFS-13310
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Major
> Attachments: HDFS-13310-HDFS-12090.001.patch, 
> HDFS-13310-HDFS-12090.002.patch, HDFS-13310-HDFS-12090.003.patch, 
> HDFS-13310-HDFS-12090.004.patch, HDFS-13310-HDFS-12090.005.patch, 
> HDFS-13310-HDFS-12090.006.patch, HDFS-13310-HDFS-12090.007.patch
>
>
> As part of HDFS-12090, Datanodes should be able to receive DatanodeCommands 
> in the heartbeat response that instruct them to back up a block.
> This should take the form of two sub-commands: PUT_FILE (when the file is <=1 
> block in size) and MULTIPART_PUT_PART (when part of a Multipart Upload; see 
> HDFS-13186).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13310) [PROVIDED Phase 2] The DatanodeProtocol should have DNA_BACKUP to backup blocks

2018-07-11 Thread Virajith Jalaparti (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-13310:
--
Status: Open  (was: Patch Available)

> [PROVIDED Phase 2] The DatanodeProtocol should have DNA_BACKUP to backup 
> blocks
> --
>
> Key: HDFS-13310
> URL: https://issues.apache.org/jira/browse/HDFS-13310
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Major
> Attachments: HDFS-13310-HDFS-12090.001.patch, 
> HDFS-13310-HDFS-12090.002.patch, HDFS-13310-HDFS-12090.003.patch, 
> HDFS-13310-HDFS-12090.004.patch, HDFS-13310-HDFS-12090.005.patch, 
> HDFS-13310-HDFS-12090.006.patch, HDFS-13310-HDFS-12090.007.patch
>
>
> As part of HDFS-12090, Datanodes should be able to receive DatanodeCommands 
> in the heartbeat response that instruct them to back up a block.
> This should take the form of two sub-commands: PUT_FILE (when the file is <=1 
> block in size) and MULTIPART_PUT_PART (when part of a Multipart Upload; see 
> HDFS-13186).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDDS-234:
--
Status: Patch Available  (was: Open)

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-187) Command status publisher for datanode

2018-07-11 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDDS-187:
--
Status: Patch Available  (was: Open)

> Command status publisher for datanode
> -
>
> Key: HDDS-187
> URL: https://issues.apache.org/jira/browse/HDDS-187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-187.00.patch, HDDS-187.01.patch, HDDS-187.02.patch, 
> HDDS-187.03.patch, HDDS-187.04.patch, HDDS-187.05.patch, HDDS-187.06.patch, 
> HDDS-187.07.patch, HDDS-187.08.patch, HDDS-187.09.patch
>
>
> Currently SCM sends a set of commands to the DataNode. The DataNode executes 
> them via CommandHandler. This jira intends to create a command status 
> publisher which will return the status of these commands back to the SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-187) Command status publisher for datanode

2018-07-11 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-187:

Attachment: HDDS-187.09.patch

> Command status publisher for datanode
> -
>
> Key: HDDS-187
> URL: https://issues.apache.org/jira/browse/HDDS-187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-187.00.patch, HDDS-187.01.patch, HDDS-187.02.patch, 
> HDDS-187.03.patch, HDDS-187.04.patch, HDDS-187.05.patch, HDDS-187.06.patch, 
> HDDS-187.07.patch, HDDS-187.08.patch, HDDS-187.09.patch
>
>
> Currently SCM sends a set of commands to the DataNode. The DataNode executes 
> them via CommandHandler. This jira intends to create a command status 
> publisher which will return the status of these commands back to the SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12837) Intermittent failure TestReencryptionWithKMS#testReencryptionKMSDown

2018-07-11 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540781#comment-16540781
 ] 

Wei-Chiu Chuang commented on HDFS-12837:


I looked at this failed test before realizing it's being fixed here.

I would prefer using an AtomicInteger for decrementing the counter instead of 
a synchronized method. But since it's in the test code, I am okay with that.

 

+1 for the 03 patch, and let's work on a follow-up fix. I want to get this fix 
in soon to reduce the flakiness rate.
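
For illustration, a minimal sketch of the alternative mentioned above, assuming 
the test keeps a simple countdown counter (the class and names here are 
hypothetical, not from the patch):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a hypothetical test helper; AtomicInteger gives a thread-safe
// decrement without making the whole method synchronized.
class FailureInjector {
  private final AtomicInteger remainingFailures = new AtomicInteger(3);

  boolean shouldInjectFailure() {
    // decrementAndGet is atomic, so concurrent callers cannot race on the count
    return remainingFailures.decrementAndGet() >= 0;
  }
}
{code}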

> Intermittent failure TestReencryptionWithKMS#testReencryptionKMSDown
> 
>
> Key: HDFS-12837
> URL: https://issues.apache.org/jira/browse/HDFS-12837
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0-beta1
>Reporter: Surendra Singh Lilhore
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-12837.01.patch, HDFS-12837.02.patch, 
> HDFS-12837.03.patch, hadoop-hdfs.testrun.1.log, hadoop-hdfs.testrun.2.log, 
> hadoop-hdfs.testrun.3.log
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/22112/testReport/org.apache.hadoop.hdfs.server.namenode/TestReencryptionWithKMS/testReencryptionKMSDown/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-187) Command status publisher for datanode

2018-07-11 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540783#comment-16540783
 ] 

Ajay Kumar commented on HDDS-187:
-

[~anu] thanks for catching that. Patch v9 rebased with trunk.

> Command status publisher for datanode
> -
>
> Key: HDDS-187
> URL: https://issues.apache.org/jira/browse/HDDS-187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-187.00.patch, HDDS-187.01.patch, HDDS-187.02.patch, 
> HDDS-187.03.patch, HDDS-187.04.patch, HDDS-187.05.patch, HDDS-187.06.patch, 
> HDDS-187.07.patch, HDDS-187.08.patch
>
>
> Currently SCM sends a set of commands to the DataNode. The DataNode executes 
> them via CommandHandler. This jira intends to create a command status 
> publisher which will return the status of these commands back to the SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540761#comment-16540761
 ] 

Ajay Kumar commented on HDDS-234:
-

Patch v1 rebased with trunk.

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-234:

Attachment: HDDS-234.01.patch

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch, HDDS-234.01.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode

2018-07-11 Thread Íñigo Goiri (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540737#comment-16540737
 ] 

Íñigo Goiri commented on HDFS-13475:


That sounds good.

> RBF: Admin cannot enforce Router enter SafeMode
> ---
>
> Key: HDFS-13475
> URL: https://issues.apache.org/jira/browse/HDFS-13475
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13475.000.patch, HDFS-13475.001.patch
>
>
> To reproduce the issue: 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode enter
> Successfully enter safe mode.
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: true{code}
> And then, 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: false{code}
> From the code, it looks like the periodicInvoke triggers the leave.
> {code:java}
> public void periodicInvoke() {
> ..
>   // Always update to indicate our cache was updated
>   if (isCacheStale) {
> if (!rpcServer.isInSafeMode()) {
>   enter();
> }
>   } else if (rpcServer.isInSafeMode()) {
> // Cache recently updated, leave safe mode
> leave();
>   }
> }
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540735#comment-16540735
 ] 

Anu Engineer commented on HDDS-199:
---

[~elek] Thanks for the patch. Unfortunately this patch does not apply on the 
current trunk. Can you please rebase it? Thanks in advance.

> Implement ReplicationManager to replicate ClosedContainers
> --
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch, HDDS-199.005.patch, 
> HDDS-199.006.patch, HDDS-199.007.patch, HDDS-199.008.patch, HDDS-199.009.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific 
> conditions (the container is full, the node has failed) the container will be 
> closed and replicated in a different way. The replication of Open containers 
> is handled with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers where more replicas are missing), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating scatter-width), and send the Copy/Delete container command to the 
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the underreplicated/overreplicated events to the 
> priority queue until the confirmation of the copy/delete command arrives.
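
For illustration, a minimal sketch of the priority ordering described above, 
using a hypothetical ReplicationRequest event that carries the number of 
missing replicas:
{code:java}
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch only: a hypothetical event carrying how many replicas a closed
// container is missing; containers missing more replicas are handled first.
class ReplicationRequest {
  final long containerId;
  final int missingReplicas;

  ReplicationRequest(long containerId, int missingReplicas) {
    this.containerId = containerId;
    this.missingReplicas = missingReplicas;
  }
}

class ReplicationQueue {
  // Highest missing-replica count first, per the description above.
  private final PriorityQueue<ReplicationRequest> queue = new PriorityQueue<>(
      Comparator.comparingInt((ReplicationRequest r) -> r.missingReplicas)
          .reversed());

  void requeue(ReplicationRequest request) {
    queue.add(request);
  }

  ReplicationRequest next() {
    return queue.poll();
  }
}
{code}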



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-199) Implement ReplicationManager to replicate ClosedContainers

2018-07-11 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDDS-199:
--
Status: Open  (was: Patch Available)

> Implement ReplicationManager to replicate ClosedContainers
> --
>
> Key: HDDS-199
> URL: https://issues.apache.org/jira/browse/HDDS-199
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-199.001.patch, HDDS-199.002.patch, 
> HDDS-199.003.patch, HDDS-199.004.patch, HDDS-199.005.patch, 
> HDDS-199.006.patch, HDDS-199.007.patch, HDDS-199.008.patch, HDDS-199.009.patch
>
>
> HDDS/Ozone supports Open and Closed containers. Under specific 
> conditions (the container is full, the node has failed) the container will be 
> closed and replicated in a different way. The replication of Open containers 
> is handled with Ratis and the PipelineManager.
> The ReplicationManager should handle the replication of the ClosedContainers. 
> The replication information will be sent as an event 
> (UnderReplicated/OverReplicated). 
> The ReplicationManager will collect all of the events in a priority queue 
> (to replicate first the containers where more replicas are missing), calculate 
> the destination datanode (first with a very simple algorithm, later by 
> calculating scatter-width), and send the Copy/Delete container command to the 
> datanode (CommandQueue).
> A CopyCommandWatcher/DeleteCommandWatcher is also included to retry the 
> copy/delete in case of failure. This is an in-memory structure (based on 
> HDDS-195) which can requeue the underreplicated/overreplicated events to the 
> priority queue until the confirmation of the copy/delete command arrives.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-187) Command status publisher for datanode

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540731#comment-16540731
 ] 

Anu Engineer commented on HDDS-187:
---

[~ajayydv] Sorry, the patch does not apply on trunk. Can you please rebase 
this patch? Thanks in advance.

> Command status publisher for datanode
> -
>
> Key: HDDS-187
> URL: https://issues.apache.org/jira/browse/HDDS-187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-187.00.patch, HDDS-187.01.patch, HDDS-187.02.patch, 
> HDDS-187.03.patch, HDDS-187.04.patch, HDDS-187.05.patch, HDDS-187.06.patch, 
> HDDS-187.07.patch, HDDS-187.08.patch
>
>
> Currently SCM sends a set of commands to the DataNode. The DataNode executes 
> them via CommandHandler. This jira intends to create a command status 
> publisher which will return the status of these commands back to the SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-187) Command status publisher for datanode

2018-07-11 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDDS-187:
--
Status: Open  (was: Patch Available)

> Command status publisher for datanode
> -
>
> Key: HDDS-187
> URL: https://issues.apache.org/jira/browse/HDDS-187
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-187.00.patch, HDDS-187.01.patch, HDDS-187.02.patch, 
> HDDS-187.03.patch, HDDS-187.04.patch, HDDS-187.05.patch, HDDS-187.06.patch, 
> HDDS-187.07.patch, HDDS-187.08.patch
>
>
> Currently SCM sends a set of commands to the DataNode. The DataNode executes 
> them via CommandHandler. This jira intends to create a command status 
> publisher which will return the status of these commands back to the SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540728#comment-16540728
 ] 

genericqa commented on HDFS-13448:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 26m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
16s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
46s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 38s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}243m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13448 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931168/HDFS-13448.14.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  

[jira] [Commented] (HDDS-228) Add the ReplicaMaps to ContainerStateManager

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540725#comment-16540725
 ] 

Anu Engineer commented on HDDS-228:
---

Thank you for the patch. It looks good overall. Some very minor comments.
 * *ContainerStateMap.java*:
{code:java}
   throw new SCMException(
"No entry exist for containerId: " + containerID + " in replica map.",
ResultCodes.IO_EXCEPTION);
{code}
Replace ResultCodes.IO_EXCEPTION ---> FAILED_TO_FIND_CONTAINER

 * *getContainerReplicas*():
{code:java}
if (contReplicaMap.containsKey(containerID)) {
  return Collections
  .unmodifiableSet(contReplicaMap.get(containerID));
}
{code}
I think you need to lock here too, since you are locking in the add and remove.
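A minimal sketch of that suggestion, reusing the {{autoLock}} and 
{{contReplicaMap}} from the snippets above:
{code:java}
// Guard the read path with the same lock as add/remove.
try (AutoCloseableLock lock = autoLock.acquire()) {
  if (contReplicaMap.containsKey(containerID)) {
    return Collections.unmodifiableSet(contReplicaMap.get(containerID));
  }
}
{code}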

 * *addContainerReplica*():
 Please correct me if I am wrong; we seem to be locking and releasing too many 
times:
{code:java}
 for (DatanodeDetails dn : dnList) {
  Preconditions.checkNotNull(dn);

  // Take lock to avoid race condition around insertion.
  try (AutoCloseableLock lock = autoLock.acquire()) {
if (contReplicaMap.containsKey(containerID)) {
  contReplicaMap.get(containerID).add(dn);
} else {
  Set<DatanodeDetails> dnSet = new HashSet<>();
  dnSet.add(dn);
  contReplicaMap.put(containerID, dnSet);
}
  }
}
{code}
For each DN we are locking and releasing. Assuming that we are only going to 
add a handful of replicas at most (say 3, or fewer than 5), it might be 
cheaper to take the lock once before the for loop.
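A minimal sketch of the hoisted lock (same fields as the snippet above):
{code:java}
// Acquire the lock once around the whole loop instead of once per DN.
try (AutoCloseableLock lock = autoLock.acquire()) {
  for (DatanodeDetails dn : dnList) {
    Preconditions.checkNotNull(dn);
    contReplicaMap.computeIfAbsent(containerID, id -> new HashSet<>())
        .add(dn);
  }
}
{code}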

 * There is probably a latent bug in 
{{contReplicaMap.get(containerID).add(dn);}}: it is possible the user might 
add a DatanodeDetails which is already part of this set. That return value, 
which is false in that case, is ignored instead of being propagated back to 
the user. If we don't want to propagate it back, we should at least log that 
info.
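A sketch of the logging alternative ({{LOG}} assumed to be the class logger):
{code:java}
boolean added = contReplicaMap.get(containerID).add(dn);
if (!added) {
  // Surface the duplicate insert instead of silently dropping the result.
  LOG.info("Datanode {} is already a replica of container {}",
      dn, containerID);
}
{code}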

 * *ContainerStateMap.java:364*: Spurious Edit?

 * *testReplicaMap*():
 It might be a good idea to re-insert a node and add a test case, as well as 
define that behavior.
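A hypothetical shape for such a test (method and field names are assumed from 
the snippets above, not part of the patch):
{code:java}
@Test
public void testReAddReplicaIsIdempotent() throws SCMException {
  // Re-inserting the same datanode should leave a single replica entry.
  stateMap.addContainerReplica(containerID, dn);
  stateMap.addContainerReplica(containerID, dn);
  Assert.assertEquals(1, stateMap.getContainerReplicas(containerID).size());
}
{code}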

> Add the ReplicaMaps to ContainerStateManager
> 
>
> Key: HDDS-228
> URL: https://issues.apache.org/jira/browse/HDDS-228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Anu Engineer
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-228.00.patch, HDDS-228.01.patch, HDDS-228.02.patch, 
> HDDS-228.03.patch, HDDS-228.04.patch
>
>
> We need to maintain a list of data nodes in the SCM that tells us where a 
> container is located. This is created from the container reports. HDDS-175 
> refactored the class to make this separation easy, and this JIRA is a 
> follow-up that adds a hash table to track this information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13728) Disk Balancer should not fail if volume usage is greater than capacity

2018-07-11 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-13728:
-
Summary: Disk Balancer should not fail if volume usage is greater than 
capacity  (was: Disk Balaner should not fail if volume usage is greater than 
capacity)

> Disk Balancer should not fail if volume usage is greater than capacity
> --
>
> Key: HDFS-13728
> URL: https://issues.apache.org/jira/browse/HDFS-13728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Assignee: Gabor Bota
>Priority: Minor
>
> We have seen a couple of scenarios where the disk balancer fails because a 
> datanode reports more space used on a disk than its capacity, which should 
> not be possible.
> This is due to the check below in DiskBalancerVolume.java:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> While I agree that it should not be possible for a DN to report more usage on 
> a volume than its capacity, there seems to be some issue that causes this to 
> occur sometimes.
> In general, a full disk is what causes someone to want to run the Disk 
> Balancer, only to find that it fails with this error.
> There appears to be nothing you can do to force the Disk Balancer to run at 
> this point, but in the scenarios I saw, some data was removed from the disk 
> and usage dropped below the capacity, resolving the issue.
> Can we consider relaxing the above check and, if the usage is greater than 
> the capacity, just set the usage to the capacity so the calculations all 
> work OK?
> E.g. something like this:
> {code}
>public void setUsed(long dfsUsedSpace) {
> -Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> -this.used = dfsUsedSpace;
> +if (dfsUsedSpace > this.getCapacity()) {
> +  this.used = this.getCapacity();
> +} else {
> +  this.used = dfsUsedSpace;
> +}
>}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13728) Disk Balaner should not fail if volume usage is greater than capacity

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540714#comment-16540714
 ] 

Anu Engineer commented on HDFS-13728:
-

Yes, adding a test makes sense. Since you have done this much work, you 
probably have a patch ready. Please feel free to post it. I would be happy to 
review it.

> Disk Balaner should not fail if volume usage is greater than capacity
> -
>
> Key: HDFS-13728
> URL: https://issues.apache.org/jira/browse/HDFS-13728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Assignee: Gabor Bota
>Priority: Minor
>
> We have seen a couple of scenarios where the disk balancer fails because a 
> datanode reports more space used on a disk than its capacity, which should 
> not be possible.
> This is due to the check below in DiskBalancerVolume.java:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> While I agree that it should not be possible for a DN to report more usage on 
> a volume than its capacity, there seems to be some issue that causes this to 
> occur sometimes.
> In general, a full disk is what causes someone to want to run the Disk 
> Balancer, only to find that it fails with this error.
> There appears to be nothing you can do to force the Disk Balancer to run at 
> this point, but in the scenarios I saw, some data was removed from the disk 
> and usage dropped below the capacity, resolving the issue.
> Can we consider relaxing the above check and, if the usage is greater than 
> the capacity, just set the usage to the capacity so the calculations all 
> work OK?
> E.g. something like this:
> {code}
>public void setUsed(long dfsUsedSpace) {
> -Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> -this.used = dfsUsedSpace;
> +if (dfsUsedSpace > this.getCapacity()) {
> +  this.used = this.getCapacity();
> +} else {
> +  this.used = dfsUsedSpace;
> +}
>}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13728) Disk Balaner should not fail if volume usage is greater than capacity

2018-07-11 Thread Stephen O'Donnell (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540709#comment-16540709
 ] 

Stephen O'Donnell commented on HDFS-13728:
--

I see Gabor has grabbed this one, which is OK with me. It's probably worth 
adding a simple test for this change too. I think we could add one to 
org.apache.hadoop.hdfs.server.diskbalancer.TestDataModels?
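A hypothetical test along those lines, assuming the relaxed {{setUsed}} and 
the usual bean getters (a sketch, not the actual patch):
{code:java}
@Test
public void testUsedIsCappedAtCapacity() {
  DiskBalancerVolume volume = new DiskBalancerVolume();
  volume.setCapacity(100L);
  // An over-reporting datanode should be capped, not rejected.
  volume.setUsed(120L);
  Assert.assertEquals(100L, volume.getUsed());
}
{code}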

> Disk Balaner should not fail if volume usage is greater than capacity
> -
>
> Key: HDFS-13728
> URL: https://issues.apache.org/jira/browse/HDFS-13728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Assignee: Gabor Bota
>Priority: Minor
>
> We have seen a couple of scenarios where the disk balancer fails because a 
> datanode reports more space used on a disk than its capacity, which should 
> not be possible.
> This is due to the check below in DiskBalancerVolume.java:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> While I agree that it should not be possible for a DN to report more usage on 
> a volume than its capacity, there seems to be some issue that causes this to 
> occur sometimes.
> In general, a full disk is what causes someone to want to run the Disk 
> Balancer, only to find that it fails with this error.
> There appears to be nothing you can do to force the Disk Balancer to run at 
> this point, but in the scenarios I saw, some data was removed from the disk 
> and usage dropped below the capacity, resolving the issue.
> Can we consider relaxing the above check and, if the usage is greater than 
> the capacity, just set the usage to the capacity so the calculations all 
> work OK?
> E.g. something like this:
> {code}
>public void setUsed(long dfsUsedSpace) {
> -Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> -this.used = dfsUsedSpace;
> +if (dfsUsedSpace > this.getCapacity()) {
> +  this.used = this.getCapacity();
> +} else {
> +  this.used = dfsUsedSpace;
> +}
>}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13727) Log full stack trace if DiskBalancer exits with an unhandled exception

2018-07-11 Thread Stephen O'Donnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-13727:
-
Summary: Log full stack trace if DiskBalancer exits with an unhandled 
exception  (was: Log full stack trace if DiskBalancer exits with an unhandle 
exceptiopn)

> Log full stack trace if DiskBalancer exits with an unhandled exception
> --
>
> Key: HDFS-13727
> URL: https://issues.apache.org/jira/browse/HDFS-13727
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Assignee: Gabor Bota
>Priority: Minor
>
> In HDFS-13175 it was discovered that when a DN reports the usage on a volume 
> to be greater than the volume capacity, the disk balancer will fail with an 
> unhelpful error:
> {code}
> $ hdfs diskbalancer -report -top 5
> 18/06/11 10:19:43 INFO command.Command: Processing report command
> 18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
> 18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: 
> java.lang.IllegalArgumentException
> {code}
> In HDFS-13175, a change was made to include more details in the exception 
> message, so after the change the code is:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> There may however be other scenarios that cause the balancer to exit with an 
> unhandled exception, and it would be helpful if the tool logged the full 
> stack trace on error rather than just the exception name.
> In DiskBalancerCLI.java, the relevant code is:
> {code}
>   public static void main(String[] argv) throws Exception {
> DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
> int res = 0;
> try {
>   res = ToolRunner.run(shell, argv);
> } catch (Exception ex) {
>   LOG.error(ex.toString());
>   res = 1;
> }
> System.exit(res);
>   }
> {code}
> We should change the error logged in the exception block to log the full 
> stack trace, giving more information on all unhandled errors, e.g.:
> {code}
> LOG.error(ex.toString(), ex);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode

2018-07-11 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540682#comment-16540682
 ] 

Chao Sun commented on HDFS-13475:
-

Thanks [~elgoiri]. I think we still need two flags: one {{safeMode}} and 
another {{isSafeModeSetManually}}, but we can set them together in 
{{setManualSafeMode(true)}} and remove {{setSafeMode()}}.

One question related to the test failures though: after we move the logic to 
{{RouterSafemodeService}}, enter/leave/get safe mode will no longer be 
effective *if* the safe mode service is disabled. We also need to check the 
result of {{getSafemodeService}} in several places before we call any method 
on the service. Let me know if you are OK with this.
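For illustration, the two-flag idea might look roughly like this (field and 
method names are assumptions, not the actual patch):
{code:java}
private volatile boolean safeMode;
private volatile boolean isSafeModeSetManually;

/** Enter (or leave) safe mode at the administrator's request. */
public void setManualSafeMode(boolean manual) {
  this.safeMode = manual;
  this.isSafeModeSetManually = manual;
}

/** The periodic service should not toggle safe mode that was set manually. */
public boolean isSafeModeSetManually() {
  return isSafeModeSetManually;
}
{code}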

> RBF: Admin cannot enforce Router enter SafeMode
> ---
>
> Key: HDFS-13475
> URL: https://issues.apache.org/jira/browse/HDFS-13475
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13475.000.patch, HDFS-13475.001.patch
>
>
> To reproduce the issue: 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode enter
> Successfully enter safe mode.
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: true{code}
> And then, 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: false{code}
> From the code, it looks like the periodicInvoke triggers the leave.
> {code:java}
> public void periodicInvoke() {
> ..
>   // Always update to indicate our cache was updated
>   if (isCacheStale) {
> if (!rpcServer.isInSafeMode()) {
>   enter();
> }
>   } else if (rpcServer.isInSafeMode()) {
> // Cache recently updated, leave safe mode
> leave();
>   }
> }
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540655#comment-16540655
 ] 

Anu Engineer commented on HDDS-234:
---

[~ajayydv] Can you please rebase this patch? It does not apply any more.

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-234) Add SCM node report handler

2018-07-11 Thread Anu Engineer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDDS-234:
--
Status: Open  (was: Patch Available)

> Add SCM node report handler
> ---
>
> Key: HDDS-234
> URL: https://issues.apache.org/jira/browse/HDDS-234
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-234.00.patch
>
>
> This ticket is opened to add the SCM node report handler after the refactoring. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13728) Disk Balaner should not fail if volume usage is greater than capacity

2018-07-11 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota reassigned HDFS-13728:
-

Assignee: Gabor Bota

> Disk Balaner should not fail if volume usage is greater than capacity
> -
>
> Key: HDFS-13728
> URL: https://issues.apache.org/jira/browse/HDFS-13728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Assignee: Gabor Bota
>Priority: Minor
>
> We have seen a couple of scenarios where the disk balancer fails because a 
> datanode reports more space used on a disk than its capacity, which should 
> not be possible.
> This is due to the check below in DiskBalancerVolume.java:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> While I agree that it should not be possible for a DN to report more usage on 
> a volume than its capacity, there seems to be some issue that causes this to 
> occur sometimes.
> In general, a full disk is what causes someone to want to run the Disk 
> Balancer, only to find that it fails with this error.
> There appears to be nothing you can do to force the Disk Balancer to run at 
> this point, but in the scenarios I saw, some data was removed from the disk 
> and usage dropped below the capacity, resolving the issue.
> Can we consider relaxing the above check and, if the usage is greater than 
> the capacity, just set the usage to the capacity so the calculations all 
> work OK?
> E.g. something like this:
> {code}
>public void setUsed(long dfsUsedSpace) {
> -Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> -this.used = dfsUsedSpace;
> +if (dfsUsedSpace > this.getCapacity()) {
> +  this.used = this.getCapacity();
> +} else {
> +  this.used = dfsUsedSpace;
> +}
>}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-251) Integrate BlockDeletingService in KeyValueHandler

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540645#comment-16540645
 ] 

genericqa commented on HDDS-251:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
2s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  6m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
26m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
53s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
18s{color} | {color:red} integration-test in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 45m 
35s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 45m 35s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  2m  
1s{color} | {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
50s{color} | {color:green} common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m  5s{color} 
| {color:red} container-service in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 57s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
59s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ozone.container.common.TestDatanodeStateMachine |
|   | hadoop.ozone.container.common.report.TestReportPublisher |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-251 |
| JIRA Patch URL | 

[jira] [Commented] (HDDS-222) Remove hdfs command line from ozone distrubution.

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540644#comment-16540644
 ] 

Anu Engineer commented on HDDS-222:
---

Even after the revert, I am running into the same issue after this patch is 
applied.

Here is the command that I am using to build.

_mvn package -Pdist -Phdds -DskipTests -Dtar -DskipShade 
-Dmaven.javadoc.skip=true_

Here is the error message:
{noformat}
Acceptance.Basic.Ozone-Shell :: Test ozone shell CLI usage| FAIL |
Suite setup failed:
Keyword 'Is Daemon started' failed after retrying for 1 minute. The last error 
was: 'Attaching to basic_ozoneManager_1, basic_datanode_1, basic_scm_1
ozoneManager_1  | Waiting 15 seconds for SCM startup
ozoneManager_1  | Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/hdfs/DFSUtil
ozoneManager_1  |   at 
org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:303)
ozoneManager_1  | Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hdfs.DFSUtil
ozoneManager_1  |   at 
java.net.URLClassLoader.findClass(URLClassLoader.java:381)
ozoneManager_1  |   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
ozoneManager_1  |   at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
ozoneManager_1  |   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
ozoneManager_1  |   ... 1 more
scm_1   | Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/hdfs/DFSUtil
[ Message content over the limit has been removed. ]
datanode_1  | 2018-07-11 20:46:28 ERROR HddsDatanodeService:249 - Exception 
in HddsDatanodeService.
datanode_1  | java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/DFSUtil
datanode_1  |   at 
org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:234)
datanode_1  | Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hdfs.DFSUtil
datanode_1  |   at 
java.net.URLClassLoader.findClass(URLClassLoader.java:381)
datanode_1  |   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
datanode_1  |   at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
datanode_1  |   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
datanode_1  |   ... 1 more
datanode_1  | 2018-07-11 20:46:28 INFO  ExitUtil:210 - Exiting with status 
1: java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/DFSUtil' does not 
contain 'HTTP server of OZONEMANAGER is listening'

{noformat}

 

 

> Remove hdfs command line from ozone distrubution.
> -
>
> Key: HDDS-222
> URL: https://issues.apache.org/jira/browse/HDDS-222
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-222.001.patch
>
>
> As the ozone release artifact doesn't contain stable namenode/datanode code, 
> the hdfs command should be removed from the ozone artifact.
> ozone-dist-layout-stitching could also be simplified to copy only the 
> required jar files (we don't need to copy the namenode/datanode server-side 
> jars, just the common artifacts).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13727) Log full stack trace if DiskBalancer exits with an unhandle exceptiopn

2018-07-11 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota reassigned HDFS-13727:
-

Assignee: Gabor Bota

> Log full stack trace if DiskBalancer exits with an unhandle exceptiopn
> --
>
> Key: HDFS-13727
> URL: https://issues.apache.org/jira/browse/HDFS-13727
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Assignee: Gabor Bota
>Priority: Minor
>
> In HDFS-13175 it was discovered that when a DN reports the usage on a volume 
> to be greater than the volume capacity, the disk balancer will fail with an 
> unhelpful error:
> {code}
> $ hdfs diskbalancer -report -top 5
> 18/06/11 10:19:43 INFO command.Command: Processing report command
> 18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from 
> NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
> 18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
> 18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 
> 30mins, 0sec
> 18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: 
> java.lang.IllegalArgumentException
> {code}
> In HDFS-13175, a change was made to include more details in the exception 
> message, so after the change the code is:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> There may however be other scenarios that cause the balancer to exit with an 
> unhandled exception, and it would be helpful if the tool logged the full 
> stack trace on error rather than just the exception name.
> In DiskBalancerCLI.java, the relevant code is:
> {code}
>   public static void main(String[] argv) throws Exception {
> DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
> int res = 0;
> try {
>   res = ToolRunner.run(shell, argv);
> } catch (Exception ex) {
>   LOG.error(ex.toString());
>   res = 1;
> }
> System.exit(res);
>   }
> {code}
> We should change the error logged in the exception block to log the full 
> stack trace, giving more information on all unhandled errors, e.g.:
> {code}
> LOG.error(ex.toString(), ex);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13730) BlockReaderRemote.sendReadResult throws NPE

2018-07-11 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13730:
---
Environment: 
Hadoop 3.0.0, HBase 2.0.0 + HBASE-20403.

(hbase-site.xml) hbase.rs.prefetchblocksonopen=true

  was:Hadoop 3.0.0, HBase 2.0.0 + HBASE-20403.


> BlockReaderRemote.sendReadResult throws NPE
> ---
>
> Key: HDFS-13730
> URL: https://issues.apache.org/jira/browse/HDFS-13730
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0
> Environment: Hadoop 3.0.0, HBase 2.0.0 + HBASE-20403.
> (hbase-site.xml) hbase.rs.prefetchblocksonopen=true
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> Found the following exception thrown in an HBase RegionServer log (Hadoop 
> 3.0.0 + HBase 2.0.0. The hbase prefetch bug HBASE-20403 was fixed on this 
> cluster, but I am not sure if that's related at all):
> {noformat}
> 2018-07-11 11:10:44,462 WARN 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: Stream moved/closed or 
> prefetch 
> cancelled?path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180711003954/449fa9bf5a7483295493258b5af50abc/meta/e9de0683f8a9413a94183c752bea0ca5,
>  offset=216505135,
> end=2309991906
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.net.NioInetPeer.getRemoteAddressString(NioInetPeer.java:99)
> at 
> org.apache.hadoop.hdfs.net.EncryptedPeer.getRemoteAddressString(EncryptedPeer.java:105)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.sendReadResult(BlockReaderRemote.java:330)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:233)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:165)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:992)
> at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1348)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1312)
> at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:331)
> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$1.run(HFileReaderImpl.java:278)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){noformat}
> The relevant Hadoop code:
> {code:java|title=BlockReaderRemote#sendReadResult}
> void sendReadResult(Status statusCode) {
>   assert !sentStatusCode : "already sent status code to " + peer;
>   try {
> writeReadResult(peer.getOutputStream(), statusCode);
> sentStatusCode = true;
>   } catch (IOException e) {
> // It's ok not to be able to send this. But something is probably wrong.
> LOG.info("Could not send read status (" + statusCode + ") to datanode " +
> peer.getRemoteAddressString() + ": " + e.getMessage());
>   }
> }
> {code}
> So the NPE was thrown within an exception handler. A possible explanation 
> could be that the socket was closed so the client couldn't write, and 
> Socket#getRemoteSocketAddress() returns null when the socket is closed.
> Suggest checking for null and returning an empty string in 
> NioInetPeer.getRemoteAddressString.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13730) BlockReaderRemote.sendReadResult throws NPE

2018-07-11 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13730:
---
Description: 
Found the following exception thrown in an HBase RegionServer log (Hadoop 3.0.0 
+ HBase 2.0.0. The hbase prefetch bug HBASE-20403 was fixed on this cluster, 
but I am not sure if that's related at all):
{noformat}
2018-07-11 11:10:44,462 WARN org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: 
Stream moved/closed or prefetch 
cancelled?path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180711003954/449fa9bf5a7483295493258b5af50abc/meta/e9de0683f8a9413a94183c752bea0ca5,
 offset=216505135,
end=2309991906
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.net.NioInetPeer.getRemoteAddressString(NioInetPeer.java:99)
at 
org.apache.hadoop.hdfs.net.EncryptedPeer.getRemoteAddressString(EncryptedPeer.java:105)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.sendReadResult(BlockReaderRemote.java:330)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:233)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:165)
at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:992)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1348)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1312)
at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:331)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$1.run(HFileReaderImpl.java:278)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){noformat}
The relevant Hadoop code:
{code:java|title=BlockReaderRemote#sendReadResult}
void sendReadResult(Status statusCode) {
  assert !sentStatusCode : "already sent status code to " + peer;
  try {
writeReadResult(peer.getOutputStream(), statusCode);
sentStatusCode = true;
  } catch (IOException e) {
// It's ok not to be able to send this. But something is probably wrong.
LOG.info("Could not send read status (" + statusCode + ") to datanode " +
peer.getRemoteAddressString() + ": " + e.getMessage());
  }
}
{code}
So the NPE was thrown within an exception handler. A possible explanation could 
be that the socket was closed so the client couldn't write, and 
Socket#getRemoteSocketAddress() returns null when the socket is closed.

Suggest checking for null and returning an empty string in 
NioInetPeer.getRemoteAddressString.

  was:
Found the following exception thrown in an HBase RegionServer log (Hadoop 3.0.0 
+ HBase 2.0.0. The hbase prefetch bug HBASE-20403 was fixed on this cluster, 
but I am not sure if that's related at all):
{noformat}
2018-07-11 11:10:44,462 WARN org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: 
Stream moved/closed or prefetch 
cancelled?path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180711003954/449fa9bf5a7483295493258b5af50abc/meta/e9de0683f8a9413a94183c752bea0ca5,
 offset=216505135,
end=2309991906
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.net.NioInetPeer.getRemoteAddressString(NioInetPeer.java:99)
at 
org.apache.hadoop.hdfs.net.EncryptedPeer.getRemoteAddressString(EncryptedPeer.java:105)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.sendReadResult(BlockReaderRemote.java:330)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:233)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:165)
at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:992)
at 

[jira] [Created] (HDFS-13730) BlockReaderRemote.sendReadResult throws NPE

2018-07-11 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-13730:
--

 Summary: BlockReaderRemote.sendReadResult throws NPE
 Key: HDFS-13730
 URL: https://issues.apache.org/jira/browse/HDFS-13730
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
 Environment: Hadoop 3.0.0, HBase 2.0.0 + HBASE-20403.
Reporter: Wei-Chiu Chuang


Found the following exception thrown in an HBase RegionServer log (Hadoop 3.0.0 
+ HBase 2.0.0. The hbase prefetch bug HBASE-20403 was fixed on this cluster, 
but I am not sure if that's related at all):
{noformat}
2018-07-11 11:10:44,462 WARN org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: 
Stream moved/closed or prefetch 
cancelled?path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180711003954/449fa9bf5a7483295493258b5af50abc/meta/e9de0683f8a9413a94183c752bea0ca5,
 offset=216505135,
end=2309991906
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.net.NioInetPeer.getRemoteAddressString(NioInetPeer.java:99)
at 
org.apache.hadoop.hdfs.net.EncryptedPeer.getRemoteAddressString(EncryptedPeer.java:105)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.sendReadResult(BlockReaderRemote.java:330)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:233)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:165)
at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:992)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1348)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1312)
at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:331)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$1.run(HFileReaderImpl.java:278)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){noformat}
The relevant Hadoop code:
{code:java|title=BlockReaderRemote#sendReadResult}
void sendReadResult(Status statusCode) {
  assert !sentStatusCode : "already sent status code to " + peer;
  try {
writeReadResult(peer.getOutputStream(), statusCode);
sentStatusCode = true;
  } catch (IOException e) {
// It's ok not to be able to send this. But something is probably wrong.
LOG.info("Could not send read status (" + statusCode + ") to datanode " +
peer.getRemoteAddressString() + ": " + e.getMessage());
  }
}
{code}
So the NPE was thrown within an exception handler. A possible explanation could 
be that the socket was closed so the client couldn't write, and 
Socket#getRemoteSocketAddress() returns null when the socket is closed.

Suggest checking for null and returning an empty string in 
{{NioInetPeer.getRemoteAddressString}}.
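A sketch of that null check (assuming the peer's underlying {{socket}} field; 
the exact fix may differ):
{code:java}
@Override
public String getRemoteAddressString() {
  // getRemoteSocketAddress() returns null once the socket is closed.
  SocketAddress address = socket.getRemoteSocketAddress();
  return address == null ? "" : address.toString();
}
{code}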



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13730) BlockReaderRemote.sendReadResult throws NPE

2018-07-11 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13730:
---
Affects Version/s: 3.0.0

> BlockReaderRemote.sendReadResult throws NPE
> ---
>
> Key: HDFS-13730
> URL: https://issues.apache.org/jira/browse/HDFS-13730
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0
> Environment: Hadoop 3.0.0, HBase 2.0.0 + HBASE-20403.
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> Found the following exception thrown in an HBase RegionServer log (Hadoop 
> 3.0.0 + HBase 2.0.0. The hbase prefetch bug HBASE-20403 was fixed on this 
> cluster, but I am not sure if that's related at all):
> {noformat}
> 2018-07-11 11:10:44,462 WARN 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: Stream moved/closed or 
> prefetch 
> cancelled?path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180711003954/449fa9bf5a7483295493258b5af50abc/meta/e9de0683f8a9413a94183c752bea0ca5,
>  offset=216505135,
> end=2309991906
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.net.NioInetPeer.getRemoteAddressString(NioInetPeer.java:99)
> at 
> org.apache.hadoop.hdfs.net.EncryptedPeer.getRemoteAddressString(EncryptedPeer.java:105)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.sendReadResult(BlockReaderRemote.java:330)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:233)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:165)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:992)
> at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1348)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1312)
> at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:331)
> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$1.run(HFileReaderImpl.java:278)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){noformat}
> The relevant Hadoop code:
> {code:java|title=BlockReaderRemote#sendReadResult}
> void sendReadResult(Status statusCode) {
>   assert !sentStatusCode : "already sent status code to " + peer;
>   try {
> writeReadResult(peer.getOutputStream(), statusCode);
> sentStatusCode = true;
>   } catch (IOException e) {
> // It's ok not to be able to send this. But something is probably wrong.
> LOG.info("Could not send read status (" + statusCode + ") to datanode " +
> peer.getRemoteAddressString() + ": " + e.getMessage());
>   }
> }
> {code}
> So the NPE was thrown within an exception handler. A possible explanation 
> could be that the socket was closed so the client couldn't write, and 
> Socket#getRemoteSocketAddress() returns null when the socket is closed.
> Suggest checking for null and returning an empty string in 
> {{NioInetPeer.getRemoteAddressString}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-222) Remove hdfs command line from ozone distrubution.

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540524#comment-16540524
 ] 

Anu Engineer commented on HDDS-222:
---

Hold on, I am reverting HDFS-242. [~nandakumar131] FYI; I will recommit 
after I fix the issues.

 

> Remove hdfs command line from ozone distrubution.
> -
>
> Key: HDDS-222
> URL: https://issues.apache.org/jira/browse/HDDS-222
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-222.001.patch
>
>
> As the ozone release artifact doesn't contain stable namenode/datanode code, 
> the hdfs command should be removed from the ozone artifact.
> ozone-dist-layout-stitching could also be simplified to copy only the 
> required jar files (we don't need to copy the namenode/datanode server-side 
> jars, just the common artifacts).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13729) Fix broken links to RBF documentation

2018-07-11 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-13729:
-
Fix Version/s: 3.0.4
   3.1.1
   3.2.0

Thank you for your contribution, [~gabor.bota]! Committed this to trunk, 
branch-3.1, and branch-3.0. Would you rebase the patch for branch-2 and 
branch-2.9?

> Fix broken links to RBF documentation
> -
>
> Key: HDFS-13729
> URL: https://issues.apache.org/jira/browse/HDFS-13729
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: jwhitter
>Assignee: Gabor Bota
>Priority: Minor
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HADOOP-15589.001.patch, hadoop_broken_link.png
>
>
> A broken link on the page [http://hadoop.apache.org/docs/current/]
>  * HDFS
>  ** HDFS Router based federation. See the [user 
> documentation|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  for more details.
> The link for user documentation 
> [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  is not found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13729) Fix broken links to RBF documentation

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540515#comment-16540515
 ] 

genericqa commented on HDFS-13729:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-13729 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13729 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931184/HADOOP-15589.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24585/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fix broken links to RBF documentation
> -
>
> Key: HDFS-13729
> URL: https://issues.apache.org/jira/browse/HDFS-13729
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: jwhitter
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15589.001.patch, hadoop_broken_link.png
>
>
> A broken link on the page [http://hadoop.apache.org/docs/current/]
>  * HDFS
>  ** HDFS Router based federation. See the [user 
> documentation|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  for more details.
> The link for user documentation 
> [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  is not found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13729) Fix broken links to RBF documentation

2018-07-11 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-13729:
-
Summary: Fix broken links to RBF documentation  (was: Broken link within 
documentation - http://hadoop.apache.org/docs/current/)

> Fix broken links to RBF documentation
> -
>
> Key: HDFS-13729
> URL: https://issues.apache.org/jira/browse/HDFS-13729
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: jwhitter
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15589.001.patch, hadoop_broken_link.png
>
>
> A broken link on the page [http://hadoop.apache.org/docs/current/]
>  * HDFS
>  ** HDFS Router based federation. See the [user 
> documentation|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  for more details.
> The link for user documentation 
> [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  is not found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Moved] (HDFS-13729) Broken link within documentation - http://hadoop.apache.org/docs/current/

2018-07-11 Thread Akira Ajisaka (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka moved HADOOP-15589 to HDFS-13729:
---

Component/s: (was: documentation)
 documentation
Key: HDFS-13729  (was: HADOOP-15589)
Project: Hadoop HDFS  (was: Hadoop Common)

> Broken link within documentation - http://hadoop.apache.org/docs/current/
> -
>
> Key: HDFS-13729
> URL: https://issues.apache.org/jira/browse/HDFS-13729
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: jwhitter
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15589.001.patch, hadoop_broken_link.png
>
>
> A broken link on the page [http://hadoop.apache.org/docs/current/]
>  * HDFS
>  ** HDFS Router based federation. See the [user 
> documentation|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  for more details.
> The link for user documentation 
> [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSRouterFederation.html]
>  is not found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-246) Datanode should throw BlockNotCommittedException for uncommitted blocks to Ozone Client

2018-07-11 Thread Shashikant Banerjee (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540500#comment-16540500
 ] 

Shashikant Banerjee commented on HDDS-246:
--

patch v0 is dependent on HDDS-181 and HDDS-203.

> Datanode should throw BlockNotCommittedException for uncommitted blocks to 
> Ozone Client
> ---
>
> Key: HDDS-246
> URL: https://issues.apache.org/jira/browse/HDDS-246
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-246.00.patch
>
>
> As a part of closing a container on a datanode, all of its open keys (blocks) 
> will be committed. In the meantime, if the client calls getCommittedBlockLength 
> for an uncommitted block on the container, the leader will throw a 
> BlockNotCommitted exception back to the client. The client should retry 
> fetching the committed block length and update the OzoneManager with the length.
>  
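
For illustration only, a minimal retry sketch in the spirit of the description 
above. All names here (the BlockNotCommittedException class, the length-lookup 
interface, the backoff constants) are assumptions made for the sketch, not the 
actual Ozone client API:

{code:java}
import java.io.IOException;

// Hedged sketch, not the actual Ozone client code: the names below are
// assumptions used only to illustrate the retry behaviour described above.
public class CommittedLengthRetry {

  /** Assumed stand-in for the exception the datanode leader throws. */
  static class BlockNotCommittedException extends IOException {
    BlockNotCommittedException(String msg) { super(msg); }
  }

  /** Assumed stand-in for the datanode call returning the committed length. */
  interface BlockLengthSource {
    long getCommittedBlockLength(long blockId) throws IOException;
  }

  /** Retry until the block is committed; the caller then reports it to OM. */
  static long fetchWithRetry(BlockLengthSource dn, long blockId, int maxRetries)
      throws IOException, InterruptedException {
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return dn.getCommittedBlockLength(blockId);
      } catch (BlockNotCommittedException e) {
        // Container close still in progress: back off briefly and retry.
        Thread.sleep(100L * attempt);
      }
    }
    throw new IOException("Block " + blockId + " not committed after "
        + maxRetries + " retries");
  }
}
{code}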



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-246) Datanode should throw BlockNotCommittedException for uncommitted blocks to Ozone Client

2018-07-11 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-246:
-
Attachment: HDDS-246.00.patch

> Datanode should throw BlockNotCommittedException for uncommitted blocks to 
> Ozone Client
> ---
>
> Key: HDDS-246
> URL: https://issues.apache.org/jira/browse/HDDS-246
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-246.00.patch
>
>
> As a part of closing a container on a datanode, all of its open keys (blocks) 
> will be committed. In the meantime, if the client calls getCommittedBlockLength 
> for an uncommitted block on the container, the leader will throw a 
> BlockNotCommitted exception back to the client. The client should retry 
> fetching the committed block length and update the OzoneManager with the length.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-222) Remove hdfs command line from ozone distrubution.

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540484#comment-16540484
 ] 

Anu Engineer commented on HDDS-222:
---

I think I found the issue. While I committed this patch
{noformat}

commit a47ec5dac4a1cdfec788ce3296b4f610411911ea
Author: Anu Engineer 
Date:   Tue Jul 10 15:58:47 2018 -0700

    HDDS-242. Introduce NEW_NODE, STALE_NODE and DEAD_NODE event
    and corresponding event handlers in SCM.
    Contributed by Nanda Kumar.{noformat}
I accidentally committed {{hadoop-ozone/common/src/main/bin/ozone-config.sh}}. 
I was trying to amortize the cost of acceptance tests by batching patches.

 

Then it did not work and I reverted the tree, but somehow this file was not 
reset. So [~elek], can you please rebase this patch, probably just removing 
ozone-config.sh from the patch, and see if that works? I don't want to revert, 
since it might be easier to fix this mistake in this patch. Thanks for the help 
and sorry about the goof-up.

 

> Remove hdfs command line from ozone distrubution.
> -
>
> Key: HDDS-222
> URL: https://issues.apache.org/jira/browse/HDDS-222
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-222.001.patch
>
>
> As the ozone release artifact doesn't contain stable namenode/datanode code, 
> the hdfs command should be removed from the ozone artifact.
> ozone-dist-layout-stitching could also be simplified to copy only the 
> required jar files (we don't need to copy the namenode/datanode server side 
> jars, just the common artifacts).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-07-11 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540462#comment-16540462
 ] 

Daniel Templeton commented on HDFS-13448:
-

There was something squirrelly with that pre-commit run.  Retriggered; fingers 
crossed.

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.14.patch, 
> HDFS-13448.6.patch, HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is where you have, for example, a flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode 
> local to the Flume agent will always get the first block replica and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far as now the first block replica will always 
> be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.
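
For context, a minimal sketch of how a client opts into the existing placement 
hint through the public {{FileSystem#create}} overload that accepts 
{{CreateFlag}}s. The path, buffer size, replication, and block size below are 
arbitrary example values, and the new flag proposed by this issue is 
deliberately not shown, since its final name is decided by the patch:

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWriteExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // CREATE + NO_LOCAL_WRITE: skip the client-local DataNode for the first
    // replica. As described above, the local *rack* is still preferred,
    // which is the behaviour this issue proposes to relax.
    EnumSet<CreateFlag> flags =
        EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE);
    try (FSDataOutputStream out = fs.create(new Path("/tmp/example"),
        FsPermission.getFileDefault(), flags, 4096, (short) 3,
        128L * 1024 * 1024, null)) {
      out.writeBytes("flume-style write without local placement\n");
    }
  }
}
{code}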



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-222) Remove hdfs command line from ozone distrubution.

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540457#comment-16540457
 ] 

Anu Engineer commented on HDDS-222:
---

+1, thanks, it worked for me too. I cleaned the repo and did a rebuild 
with -Pdist. Thanks for the patch. I will commit it shortly.

 

> Remove hdfs command line from ozone distrubution.
> -
>
> Key: HDDS-222
> URL: https://issues.apache.org/jira/browse/HDDS-222
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
>  Labels: newbie
> Fix For: 0.2.1
>
> Attachments: HDDS-222.001.patch
>
>
> As the ozone release artifact doesn't contain stable namenode/datanode code, 
> the hdfs command should be removed from the ozone artifact.
> ozone-dist-layout-stitching could also be simplified to copy only the 
> required jar files (we don't need to copy the namenode/datanode server side 
> jars, just the common artifacts).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540456#comment-16540456
 ] 

genericqa commented on HDFS-13448:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 27m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
4s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
43s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m  2s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
51s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}243m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.client.impl.TestBlockReaderLocal |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13448 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931168/HDFS-13448.14.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux e29c7b96e11a 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-07-11 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540452#comment-16540452
 ] 

Daniel Templeton commented on HDFS-13448:
-

Close enough.  I think you could use a ';' at the end of that error string, but 
I can add that when I commit.  +1 from me.  [~daryn], wanna take another look 
before it goes in?  If I don't hear anything by Monday, I plan to commit it.

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.14.patch, 
> HDFS-13448.6.patch, HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is where you have, for example, a flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode 
> local to the Flume agent will always get the first block replica and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far as now the first block replica will always 
> be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-251) Integrate BlockDeletingService in KeyValueHandler

2018-07-11 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-251:
-
Attachment: HDDS-251.001.patch

> Integrate BlockDeletingService in KeyValueHandler
> -
>
> Key: HDDS-251
> URL: https://issues.apache.org/jira/browse/HDDS-251
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-251.001.patch
>
>
> This Jira aims to integrate BlockDeletingService in KeyValueHandler. It also 
> fixes the unit tests related to deleting blocks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-251) Integrate BlockDeletingService in KeyValueHandler

2018-07-11 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-251:
-
Status: Patch Available  (was: Open)

> Integrate BlockDeletingService in KeyValueHandler
> -
>
> Key: HDDS-251
> URL: https://issues.apache.org/jira/browse/HDDS-251
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-251.001.patch
>
>
> This Jira aims to integrate BlockDeletingService in KeyValueHandler. It also 
> fixes the unit tests related to deleting blocks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-249) Fail if multiple SCM IDs on the DataNode and add SCM ID check after version request

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540404#comment-16540404
 ] 

genericqa commented on HDDS-249:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
33s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
35s{color} | {color:red} hadoop-hdds/server-scm in trunk has 3 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
51s{color} | {color:red} hadoop-hdds/container-service generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
13s{color} | {color:green} container-service in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
18s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdds/container-service |
|  |  Possible null pointer dereference in 
org.apache.hadoop.ozone.container.common.utils.HddsVolumeUtil.checkVolume(HddsVolume,
 String, String, Logger) due to return value of called method  Dereferenced at 
HddsVolumeUtil.java:org.apache.hadoop.ozone.container.common.utils.HddsVolumeUtil.checkVolume(HddsVolume,
 String, String, Logger) due to return value of called method  Dereferenced at 
HddsVolumeUtil.java:[line 189] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDDS-249 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931189/HDDS-249.01.patch |
| Optional Tests |  asflicense  compile  javac  

[jira] [Created] (HDDS-251) Integrate BlockDeletingService in KeyValueHandler

2018-07-11 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-251:


 Summary: Integrate BlockDeletingService in KeyValueHandler
 Key: HDDS-251
 URL: https://issues.apache.org/jira/browse/HDDS-251
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain
Assignee: Lokesh Jain
 Fix For: 0.2.1


This Jira aims to integrate BlockDeletingService in KeyValueHandler. It also 
fixes the unit tests related to deleting blocks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13728) Disk Balancer should not fail if volume usage is greater than capacity

2018-07-11 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540397#comment-16540397
 ] 

Anu Engineer commented on HDFS-13728:
-

Let us do this change; however, there are two major points.
 * This precondition is correct. If it is being violated, we have a latent 
bug in the code.
 * I am fine with disabling the precondition, but I have two different ways in 
mind to do it:
 ** Convert the Precondition to a LOG.warn or LOG.error.
 ** Allow this operation to proceed on this datanode if and only if the *-force* 
flag is specified. This means that one of us is willing to hunt down this bug 
in the long run; for the short term, it is not on our radar.

[~sodonnell] Thank you for root-causing these issues and posting the 
suggestions. Since you have really done all the hard work, would you do the 
honors of posting a patch too? That is, just add a LOG.error message to the 
suggested change.
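
A minimal sketch of the first option, assuming the clamp-to-capacity behaviour 
suggested in the description below; this is illustrative, not the committed fix:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hedged sketch of "convert the Precondition to a LOG.warn": the field and
// method names mirror DiskBalancerVolume, but this is not the actual patch.
public class VolumeUsageSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(VolumeUsageSketch.class);

  private long capacity;
  private long used;

  public void setUsed(long dfsUsedSpace) {
    if (dfsUsedSpace > capacity) {
      // Latent-bug signal: a DN should never report more used than capacity.
      LOG.warn("Reported usage {} exceeds capacity {}; clamping to capacity"
          + " so the disk balancer can still run.", dfsUsedSpace, capacity);
      this.used = capacity;
    } else {
      this.used = dfsUsedSpace;
    }
  }
}
{code}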

> Disk Balancer should not fail if volume usage is greater than capacity
> -
>
> Key: HDFS-13728
> URL: https://issues.apache.org/jira/browse/HDFS-13728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Priority: Minor
>
> We have seen a couple of scenarios where the disk balancer fails because a 
> datanode reports more space used on a disk than its capacity, which should 
> not be possible.
> This is due to the check below in DiskBalancerVolume.java:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> While I agree that it should not be possible for a DN to report more usage on 
> a volume than its capacity, there seems to be some issue that causes this to 
> occur sometimes.
> In general, this full disk is what causes someone to want to run the Disk 
> Balancer, only to find it fails with the error.
> There appears to be nothing you can do to force the Disk Balancer to run at 
> this point, but in the scenarios I saw, some data was removed from the disk 
> and usage dropped below the capacity resolving the issue.
> Can we consider relaxing the above check, and if the usage is greater than 
> the capacity, just set the usage to the capacity so the calculations all work 
> ok?
> Eg something like this:
> {code}
>public void setUsed(long dfsUsedSpace) {
> -Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> -this.used = dfsUsedSpace;
> +if (dfsUsedSpace > this.getCapacity()) {
> +  this.used = this.getCapacity();
> +} else {
> +  this.used = dfsUsedSpace;
> +}
>}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13723) Occasional "Should be different group" error in TestRefreshUserMappings#testGroupMappingRefresh

2018-07-11 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540386#comment-16540386
 ] 

Wei-Chiu Chuang commented on HDFS-13723:


and branch-3.1, branch-3.0 as well.

> Occasional "Should be different group" error in 
> TestRefreshUserMappings#testGroupMappingRefresh
> ---
>
> Key: HDFS-13723
> URL: https://issues.apache.org/jira/browse/HDFS-13723
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, test
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13723.001.patch, HDFS-13723.002.patch, 
> HDFS-13723.003.patch
>
>
> On some occasions, the user-group mapping refresh timeout test assertion 
> would fail because the mapping didn't refresh in time, reporting "Should be 
> different group".
>  
> Trace:
> {code:java}
> java.lang.AssertionError: Should be different group 
> at 
> org.apache.hadoop.security.TestRefreshUserMappings.testGroupMappingRefresh(TestRefreshUserMappings.java:153)
> :
> :
> 2018-07-04 19:35:21,073 [BP-1412052829-172.26.17.254-1530758120647 
> heartbeating to localhost/127.0.0.1:39524] INFO datanode.DataNode 
> (BPOfferService.java:processCommandFromActive(759)) - Got finalize command 
> for block pool BP-1412052829-172.26.17.254-1530758120647
> Getting groups in MockUnixGroupsMapping
> 2018-07-04 19:35:21,090 [IPC Server handler 6 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 cmd=datanodeReport
> src=nulldst=nullperm=null   proto=rpc
> 2018-07-04 19:35:21,092 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:waitActive(2629)) - Cluster is active
> 2018-07-04 19:35:21,095 [IPC Server handler 7 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 cmd=datanodeReport
> src=nulldst=nullperm=null   proto=rpc
> 2018-07-04 19:35:21,096 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:waitActive(2629)) - Cluster is active
> first attempt:
> [jenkins11, jenkins12]
> second attempt, should be same:
> [jenkins11, jenkins12]
> 2018-07-04 19:35:21,101 [IPC Server handler 5 on 39524] INFO 
> namenode.NameNode (NameNodeRpcServer.java:refreshUserToGroupsMappings(1648)) 
> - Refreshing all user-to-groups mappings. Requested by user: jenkins
> 2018-07-04 19:35:21,101 [IPC Server handler 5 on 39524] INFO security.Groups 
> (Groups.java:refresh(401)) - clearing userToGroupsMap cache
> Refreshing groups in MockUnixGroupsMapping
> 2018-07-04 19:35:21,102 [IPC Server handler 5 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 
> cmd=refreshUserToGroupsMappings   src=nulldst=nullperm=null   
> proto=rpc
> Refresh user to groups mapping successful
> third attempt(after refresh command), should be different:
> Getting groups in MockUnixGroupsMapping
> [jenkins21, jenkins22]
> fourth attempt(after timeout), should be different:
> [jenkins21, jenkins22]
> Getting groups in MockUnixGroupsMapping
> 2018-07-04 19:35:22,204 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1965)) - Shutting down the Mini HDFS Cluster
> {code}
>  
> Solution:
> Increase the timeout slightly, and place debugging messages in the load() and 
> reload() methods of class GroupCacheLoader.
>  
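
As a sketch of one way to harden such timing-dependent assertions, poll until 
the mapping changes or a deadline passes instead of asserting after a single 
fixed sleep. The {{lookupGroups}} callable below is an assumed stand-in for the 
test's group-lookup RPC; this is not the attached patch:

{code:java}
import java.util.List;
import java.util.concurrent.Callable;

// Hedged sketch: poll for the refreshed group mapping instead of relying on
// one fixed sleep, so a slow refresh does not fail the assertion spuriously.
public final class PollingAssertSketch {
  static List<String> waitForDifferentGroups(
      Callable<List<String>> lookupGroups, List<String> before,
      long timeoutMillis) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      List<String> now = lookupGroups.call();
      if (!now.equals(before)) {
        return now;            // refresh observed; the assertion can pass
      }
      Thread.sleep(100);       // short poll interval, not one long sleep
    }
    throw new AssertionError("Should be different group (timed out after "
        + timeoutMillis + " ms)");
  }
}
{code}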



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13723) Occasional "Should be different group" error in TestRefreshUserMappings#testGroupMappingRefresh

2018-07-11 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13723:
---
Fix Version/s: 3.0.4
   3.1.1

> Occasional "Should be different group" error in 
> TestRefreshUserMappings#testGroupMappingRefresh
> ---
>
> Key: HDFS-13723
> URL: https://issues.apache.org/jira/browse/HDFS-13723
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, test
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13723.001.patch, HDFS-13723.002.patch, 
> HDFS-13723.003.patch
>
>
> On some occasions, the user-group mapping refresh timeout test assertion 
> would fail because the mapping didn't refresh in time, reporting "Should be 
> different group".
>  
> Trace:
> {code:java}
> java.lang.AssertionError: Should be different group 
> at 
> org.apache.hadoop.security.TestRefreshUserMappings.testGroupMappingRefresh(TestRefreshUserMappings.java:153)
> :
> :
> 2018-07-04 19:35:21,073 [BP-1412052829-172.26.17.254-1530758120647 
> heartbeating to localhost/127.0.0.1:39524] INFO datanode.DataNode 
> (BPOfferService.java:processCommandFromActive(759)) - Got finalize command 
> for block pool BP-1412052829-172.26.17.254-1530758120647
> Getting groups in MockUnixGroupsMapping
> 2018-07-04 19:35:21,090 [IPC Server handler 6 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 cmd=datanodeReport
> src=nulldst=nullperm=null   proto=rpc
> 2018-07-04 19:35:21,092 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:waitActive(2629)) - Cluster is active
> 2018-07-04 19:35:21,095 [IPC Server handler 7 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 cmd=datanodeReport
> src=nulldst=nullperm=null   proto=rpc
> 2018-07-04 19:35:21,096 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:waitActive(2629)) - Cluster is active
> first attempt:
> [jenkins11, jenkins12]
> second attempt, should be same:
> [jenkins11, jenkins12]
> 2018-07-04 19:35:21,101 [IPC Server handler 5 on 39524] INFO 
> namenode.NameNode (NameNodeRpcServer.java:refreshUserToGroupsMappings(1648)) 
> - Refreshing all user-to-groups mappings. Requested by user: jenkins
> 2018-07-04 19:35:21,101 [IPC Server handler 5 on 39524] INFO security.Groups 
> (Groups.java:refresh(401)) - clearing userToGroupsMap cache
> Refreshing groups in MockUnixGroupsMapping
> 2018-07-04 19:35:21,102 [IPC Server handler 5 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 
> cmd=refreshUserToGroupsMappings   src=nulldst=nullperm=null   
> proto=rpc
> Refresh user to groups mapping successful
> third attempt(after refresh command), should be different:
> Getting groups in MockUnixGroupsMapping
> [jenkins21, jenkins22]
> fourth attempt(after timeout), should be different:
> [jenkins21, jenkins22]
> Getting groups in MockUnixGroupsMapping
> 2018-07-04 19:35:22,204 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1965)) - Shutting down the Mini HDFS Cluster
> {code}
>  
> Solution:
> Increase the timeout slightly, and place debugging messages in the load() and 
> reload() methods of class GroupCacheLoader.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13723) Occasional "Should be different group" error in TestRefreshUserMappings#testGroupMappingRefresh

2018-07-11 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13723:
---
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Pushed the 003 patch to trunk (3.2.0 release). Thanks [~smeng] for reporting 
the issue and for the fix!

> Occasional "Should be different group" error in 
> TestRefreshUserMappings#testGroupMappingRefresh
> ---
>
> Key: HDFS-13723
> URL: https://issues.apache.org/jira/browse/HDFS-13723
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, test
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13723.001.patch, HDFS-13723.002.patch, 
> HDFS-13723.003.patch
>
>
> On some occasions, the user-group mapping refresh timeout test assertion 
> would fail because the mapping didn't refresh in time, reporting "Should be 
> different group".
>  
> Trace:
> {code:java}
> java.lang.AssertionError: Should be different group 
> at 
> org.apache.hadoop.security.TestRefreshUserMappings.testGroupMappingRefresh(TestRefreshUserMappings.java:153)
> :
> :
> 2018-07-04 19:35:21,073 [BP-1412052829-172.26.17.254-1530758120647 
> heartbeating to localhost/127.0.0.1:39524] INFO datanode.DataNode 
> (BPOfferService.java:processCommandFromActive(759)) - Got finalize command 
> for block pool BP-1412052829-172.26.17.254-1530758120647
> Getting groups in MockUnixGroupsMapping
> 2018-07-04 19:35:21,090 [IPC Server handler 6 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 cmd=datanodeReport
> src=nulldst=nullperm=null   proto=rpc
> 2018-07-04 19:35:21,092 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:waitActive(2629)) - Cluster is active
> 2018-07-04 19:35:21,095 [IPC Server handler 7 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 cmd=datanodeReport
> src=nulldst=nullperm=null   proto=rpc
> 2018-07-04 19:35:21,096 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:waitActive(2629)) - Cluster is active
> first attempt:
> [jenkins11, jenkins12]
> second attempt, should be same:
> [jenkins11, jenkins12]
> 2018-07-04 19:35:21,101 [IPC Server handler 5 on 39524] INFO 
> namenode.NameNode (NameNodeRpcServer.java:refreshUserToGroupsMappings(1648)) 
> - Refreshing all user-to-groups mappings. Requested by user: jenkins
> 2018-07-04 19:35:21,101 [IPC Server handler 5 on 39524] INFO security.Groups 
> (Groups.java:refresh(401)) - clearing userToGroupsMap cache
> Refreshing groups in MockUnixGroupsMapping
> 2018-07-04 19:35:21,102 [IPC Server handler 5 on 39524] INFO 
> FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7805)) - allowed=true   
>ugi=jenkins (auth:SIMPLE)   ip=/127.0.0.1 
> cmd=refreshUserToGroupsMappings   src=nulldst=nullperm=null   
> proto=rpc
> Refresh user to groups mapping successful
> third attempt(after refresh command), should be different:
> Getting groups in MockUnixGroupsMapping
> [jenkins21, jenkins22]
> fourth attempt(after timeout), should be different:
> [jenkins21, jenkins22]
> Getting groups in MockUnixGroupsMapping
> 2018-07-04 19:35:22,204 [main] INFO hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1965)) - Shutting down the Mini HDFS Cluster
> {code}
>  
> Solution:
> Increase the timeout slightly, and place debugging messages in the load() and 
> reload() methods of class GroupCacheLoader.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13728) Disk Balancer should not fail if volume usage is greater than capacity

2018-07-11 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540384#comment-16540384
 ] 

Arpit Agarwal commented on HDFS-13728:
--

+1 for this defensive change.

[~sodonnell], do you want to contribute your change as a patch?

> Disk Balancer should not fail if volume usage is greater than capacity
> -
>
> Key: HDFS-13728
> URL: https://issues.apache.org/jira/browse/HDFS-13728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.3
>Reporter: Stephen O'Donnell
>Priority: Minor
>
> We have seen a couple of scenarios where the disk balancer fails because a 
> datanode reports more space used on a disk than its capacity, which should 
> not be possible.
> This is due to the check below in DiskBalancerVolume.java:
> {code}
>   public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
> "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
> dfsUsedSpace, getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> While I agree that it should not be possible for a DN to report more usage on 
> a volume than its capacity, there seems to be some issue that causes this to 
> occur sometimes.
> In general, this full disk is what causes someone to want to run the Disk 
> Balancer, only to find it fails with the error.
> There appears to be nothing you can do to force the Disk Balancer to run at 
> this point, but in the scenarios I saw, some data was removed from the disk 
> and usage dropped below the capacity resolving the issue.
> Can we consider relaxing the above check, and if the usage is greater than 
> the capacity, just set the usage to the capacity so the calculations all work 
> ok?
> Eg something like this:
> {code}
>public void setUsed(long dfsUsedSpace) {
> -Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> -this.used = dfsUsedSpace;
> +if (dfsUsedSpace > this.getCapacity()) {
> +  this.used = this.getCapacity();
> +} else {
> +  this.used = dfsUsedSpace;
> +}
>}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-238) Add Node2Pipeline Map in SCM to track ratis/standalone pipelines.

2018-07-11 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540345#comment-16540345
 ] 

Xiaoyu Yao edited comment on HDDS-238 at 7/11/18 4:35 PM:
--

Thanks [~msingh] for working on this. The patch v2 looks good to me. I just 
have a few comments below:

 

ContainerStateMap.java

Line 147: can we wrap the logic around containerState->PipelineState in a 
helper function for better reuse and future changes?

 

PipelineManager.java

Line 139: should we update the node2PipelineMap in removePipeline() as well?

 

Node2PipelineMap.java

Line 109: this can be simplified with Java 8 computeIfAbsent, as below, without 
the helper function isKnownDatanode():

 
{code:java}
dn2PipelineMap.computeIfAbsent(dnId,
    k -> Collections.synchronizedSet(new HashSet<>())).add(pipeline);
{code}
 

Line 87: NIT: "pipeline name to open container mappings"

Line 122: should we return an immutable collection for getDn2PipelineMap?
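
To make the suggestions above concrete, a minimal sketch of such a reverse 
index with an immutable read view. The types are simplified (the real map keys 
on datanode UUIDs and Pipeline objects), so this is illustrative only:

{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch of a node-to-pipelines reverse index: each member datanode
// of a pipeline maps to that pipeline, so a node failure translates directly
// into the set of affected pipelines.
public class Node2PipelineSketch<P> {
  private final Map<UUID, Set<P>> dn2PipelineMap = new ConcurrentHashMap<>();

  public void addPipeline(UUID dnId, P pipeline) {
    // computeIfAbsent replaces the isKnownDatanode() helper, per the review.
    dn2PipelineMap.computeIfAbsent(dnId,
        k -> Collections.synchronizedSet(new HashSet<>())).add(pipeline);
  }

  public Set<P> getPipelines(UUID dnId) {
    // Immutable view, per the review suggestion above.
    return Collections.unmodifiableSet(
        dn2PipelineMap.getOrDefault(dnId, Collections.emptySet()));
  }
}
{code}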



> Add Node2Pipeline Map in SCM to track ratis/standalone pipelines.
> -
>
> Key: HDDS-238
> URL: https://issues.apache.org/jira/browse/HDDS-238
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-238.001.patch, HDDS-238.002.patch
>
>
> This jira proposes to add a Node2Pipeline map which can be used during 
> pipeline failure to identify the pipelines for a corresponding failed datanode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-238) Add Node2Pipeline Map in SCM to track ratis/standalone pipelines.

2018-07-11 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540345#comment-16540345
 ] 

Xiaoyu Yao commented on HDDS-238:
-

Thanks [~msingh] for working on this. The patch v2 looks good to me. I just 
have a few comments below:

 

ContainerStateMap.java

Line 147: can we wrap the logic around containerState->PipelineState in a 
helper function for better reuse and future changes?

 

PipelineManager.java

Line 139: should we update the node2PipelineMap in removePipeline() as well?

 

Node2PipelineMap.java

Line 109: this can be simplified with Java 8 computeIfAbsent, as below, without 
the helper function isKnownDatanode():

 

{code:java}
dn2PipelineMap.computeIfAbsent(dnId,
    k -> Collections.synchronizedSet(new HashSet<>())).add(pipeline);
{code}

 

Line 87: NIT: "pipeline name to open container mappings"

Line 122: should we return an immutable collection for getDn2PipelineMap?

> Add Node2Pipeline Map in SCM to track ratis/standalone pipelines.
> -
>
> Key: HDDS-238
> URL: https://issues.apache.org/jira/browse/HDDS-238
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-238.001.patch, HDDS-238.002.patch
>
>
> This jira proposes to add a Node2Pipeline map which can be used during 
> pipeline failure to identify the pipelines for a corresponding failed datanode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13728) Disk Balancer should not fail if volume usage is greater than capacity

2018-07-11 Thread Stephen O'Donnell (JIRA)
Stephen O'Donnell created HDFS-13728:


 Summary: Disk Balancer should not fail if volume usage is greater 
than capacity
 Key: HDFS-13728
 URL: https://issues.apache.org/jira/browse/HDFS-13728
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: diskbalancer
Affects Versions: 3.0.3
Reporter: Stephen O'Donnell


We have seen a couple of scenarios where the disk balancer fails because a 
datanode reports more space used on a disk than its capacity, which should not 
be possible.

This is due to the check below in DiskBalancerVolume.java:

{code}
  public void setUsed(long dfsUsedSpace) {
Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
"DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
dfsUsedSpace, getCapacity());
this.used = dfsUsedSpace;
  }
{code}

While I agree that it should not be possible for a DN to report more usage on a 
volume than its capacity, there seems to be some issue that causes this to 
occur sometimes.

In general, this full disk is what causes someone to want to run the Disk 
Balancer, only to find it fails with the error.

There appears to be nothing you can do to force the Disk Balancer to run at 
this point, but in the scenarios I saw, some data was removed from the disk and 
usage dropped below the capacity resolving the issue.

Can we consider relaxing the above check, and if the usage is greater than 
the capacity, just set the usage to the capacity so the calculations all work 
ok?

Eg something like this:

{code}
   public void setUsed(long dfsUsedSpace) {
-Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
-this.used = dfsUsedSpace;
+if (dfsUsedSpace > this.getCapacity()) {
+  this.used = this.getCapacity();
+} else {
+  this.used = dfsUsedSpace;
+}
   }
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-226) Client should update block length in OM while committing the key

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540340#comment-16540340
 ] 

genericqa commented on HDDS-226:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 31m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 29m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
27s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
37s{color} | {color:green} client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} ozone-manager in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 27s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m 46s{color} | 
{color:black} 

[jira] [Created] (HDFS-13727) Log full stack trace if DiskBalancer exits with an unhandled exception

2018-07-11 Thread Stephen O'Donnell (JIRA)
Stephen O'Donnell created HDFS-13727:


 Summary: Log full stack trace if DiskBalancer exits with an unhandled exception
 Key: HDFS-13727
 URL: https://issues.apache.org/jira/browse/HDFS-13727
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: diskbalancer
Affects Versions: 3.0.3
Reporter: Stephen O'Donnell


In HDFS-13175 it was discovered that when a DN reports the usage on a volume to 
be greater than the volume capacity, the disk balancer will fail with an 
unhelpful error:

{code}
$ hdfs diskbalancer -report -top 5

18/06/11 10:19:43 INFO command.Command: Processing report command
18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from 
NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 
30mins, 0sec
18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: 
java.lang.IllegalArgumentException
{code}

In HDFS-13175, a change was made to include more detail in the exception message, so after the change the code is:

{code}
  public void setUsed(long dfsUsedSpace) {
    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
        "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
        dfsUsedSpace, getCapacity());
    this.used = dfsUsedSpace;
  }
{code}
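
For context, Guava's Preconditions.checkArgument fills the %s placeholders with the supplied arguments when the check fails, so the IllegalArgumentException thrown after this change carries the offending values. A minimal, self-contained illustration (a demo, not part of the patch):

{code}
import com.google.common.base.Preconditions;

public class PreconditionsDemo {
  public static void main(String[] args) {
    long used = 1100L;
    long capacity = 1000L;
    // Fails because used >= capacity; the %s placeholders are replaced with
    // the actual values, making the exception message self-explanatory.
    Preconditions.checkArgument(used < capacity,
        "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
        used, capacity);
  }
}
{code}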

There may, however, be other scenarios that cause the balancer to exit with an unhandled exception, and it would be helpful if the tool logged the full stack trace on error rather than just the exception name.

In DiskBalancerCLI.java, the relevant code is:

{code}
  public static void main(String[] argv) throws Exception {
    DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
    int res = 0;
    try {
      res = ToolRunner.run(shell, argv);
    } catch (Exception ex) {
      LOG.error(ex.toString());
      res = 1;
    }
    System.exit(res);
  }
{code}

We should change the logging in the catch block to include the full stack trace, giving more information on all unhandled errors, e.g.:

{code}
LOG.error(ex.toString(), ex);
{code}
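
With that one-line change, the catch block would look roughly as follows (a sketch of the proposed fix, not committed code):

{code}
    } catch (Exception ex) {
      // The two-argument error overload logs the message and the full stack
      // trace; the original one-argument call logged only ex.toString().
      LOG.error(ex.toString(), ex);
      res = 1;
    }
{code}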



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-249) Fail if multiple SCM IDs on the DataNode and add SCM ID check after version request

2018-07-11 Thread Bharat Viswanadham (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540304#comment-16540304
 ] 

Bharat Viswanadham commented on HDDS-249:
-

Fixed findbugs and javadoc issues in patch v01.

> Fail if multiple SCM IDs on the DataNode and add SCM ID check after version 
> request
> ---
>
> Key: HDDS-249
> URL: https://issues.apache.org/jira/browse/HDDS-249
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-249.00.patch, HDDS-249.01.patch
>
>
> This Jira takes care of the following conditions (a sketch follows below):
>  # If multiple SCM directories exist on a datanode, fail that volume.
>  # Validate the SCM ID in the version response from the SCM.
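
A minimal sketch of the two checks, assuming hypothetical names (ScmIdCheck, checkScmDir) and a layout where each SCM gets its own directory under the volume root; the actual patch may differ:

{code}
import java.io.File;
import java.io.IOException;

// Hypothetical illustration only; names and layout are assumptions,
// not the actual HDDS-249 patch.
class ScmIdCheck {
  static void checkScmDir(File hddsVolumeRoot, String scmIdFromVersion)
      throws IOException {
    File[] scmDirs = hddsVolumeRoot.listFiles(File::isDirectory);
    if (scmDirs != null && scmDirs.length > 1) {
      // Condition 1: multiple SCM directories on the volume -> fail the volume.
      throw new IOException("Multiple SCM directories in " + hddsVolumeRoot);
    }
    if (scmDirs != null && scmDirs.length == 1
        && !scmDirs[0].getName().equals(scmIdFromVersion)) {
      // Condition 2: the SCM ID returned in the version response must match
      // the SCM directory already on disk.
      throw new IOException("SCM ID mismatch: expected " + scmIdFromVersion
          + ", found " + scmDirs[0].getName());
    }
  }
}
{code}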



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-249) Fail if multiple SCM IDs on the DataNode and add SCM ID check after version request

2018-07-11 Thread Bharat Viswanadham (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HDDS-249:

Attachment: HDDS-249.01.patch

> Fail if multiple SCM IDs on the DataNode and add SCM ID check after version 
> request
> ---
>
> Key: HDDS-249
> URL: https://issues.apache.org/jira/browse/HDDS-249
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-249.00.patch, HDDS-249.01.patch
>
>
> This Jira takes care of the following conditions:
>  # If multiple SCM directories exist on a datanode, fail that volume.
>  # Validate the SCM ID in the version response from the SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13726) RBF: Fix RBF configuration links

2018-07-11 Thread Takanobu Asanuma (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540203#comment-16540203
 ] 

Takanobu Asanuma commented on HDFS-13726:
-

Thanks for committing it, [~linyiqun]!

> RBF: Fix RBF configuration links
> 
>
> Key: HDFS-13726
> URL: https://issues.apache.org/jira/browse/HDFS-13726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13726.1.patch, HDFS-13726.2.patch
>
>
> The RBF configuration properties moved from hdfs-default.xml to hdfs-rbf-default.xml.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13726) RBF: Fix RBF configuration links

2018-07-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540176#comment-16540176
 ] 

Hudson commented on HDFS-13726:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14557 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14557/])
HDFS-13726. RBF: Fix RBF configuration links. Contributed by Takanobu (yqlin: 
rev 2ae13d41dcd4f49e6b4ebc099e5f8bb8280b9872)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md


> RBF: Fix RBF configuration links
> 
>
> Key: HDFS-13726
> URL: https://issues.apache.org/jira/browse/HDFS-13726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13726.1.patch, HDFS-13726.2.patch
>
>
> The RBF configuration properties moved from hdfs-default.xml to hdfs-rbf-default.xml.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13726) RBF: Fix RBF configuration links

2018-07-11 Thread Yiqun Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13726:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> RBF: Fix RBF configuration links
> 
>
> Key: HDFS-13726
> URL: https://issues.apache.org/jira/browse/HDFS-13726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13726.1.patch, HDFS-13726.2.patch
>
>
> The RBF configuration properties moved from hdfs-default.xml to hdfs-rbf-default.xml.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13726) RBF: Fix RBF configuration links

2018-07-11 Thread Yiqun Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-13726:
-
Affects Version/s: 3.1.0
   3.0.3
 Priority: Minor  (was: Major)
 Hadoop Flags: Reviewed
 Target Version/s: 3.1.1, 3.0.4
Fix Version/s: 3.0.4
   3.1.1
   3.2.0
   2.10.0

Committed this to trunk, branch-3.1, branch-3.0 and branch-2.

Thanks [~tasanuma0829] for the contribution!

> RBF: Fix RBF configuration links
> 
>
> Key: HDFS-13726
> URL: https://issues.apache.org/jira/browse/HDFS-13726
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13726.1.patch, HDFS-13726.2.patch
>
>
> The RBF configuration properties moved from hdfs-default.xml to hdfs-rbf-default.xml.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


