[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310317=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310317
 ]

ASF GitHub Bot logged work on HDDS-1982:


Author: ASF GitHub Bot
Created on: 11/Sep/19 05:46
Start Date: 11/Sep/19 05:46
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1344: HDDS-1982 Extend 
SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#issuecomment-530229574
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 92 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 15 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 24 | Maven dependency ordering for branch |
   | +1 | mvninstall | 649 | trunk passed |
   | +1 | compile | 405 | trunk passed |
   | +1 | checkstyle | 76 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 932 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 182 | trunk passed |
   | 0 | spotbugs | 499 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 750 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 32 | Maven dependency ordering for patch |
   | -1 | mvninstall | 299 | hadoop-ozone in the patch failed. |
   | -1 | compile | 247 | hadoop-ozone in the patch failed. |
   | -1 | cc | 247 | hadoop-ozone in the patch failed. |
   | -1 | javac | 247 | hadoop-ozone in the patch failed. |
   | -0 | checkstyle | 43 | hadoop-ozone: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 739 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 75 | hadoop-hdds generated 20 new + 16 unchanged - 0 fixed 
= 36 total (was 16) |
   | -1 | findbugs | 411 | hadoop-ozone in the patch failed. |
   ||| _ Other Tests _ |
   | +1 | unit | 339 | hadoop-hdds in the patch passed. |
   | -1 | unit | 466 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 40 | The patch does not generate ASF License warnings. |
   | | | 6574 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.2 Server=19.03.2 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1344 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit javadoc 
mvninstall shadedclient findbugs checkstyle |
   | uname | Linux 2edae08c6f80 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / dacc448 |
   | Default Java | 1.8.0_212 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | cc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | javac | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-compile-hadoop-ozone.txt
 |
   | checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/diff-checkstyle-hadoop-ozone.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/diff-javadoc-javadoc-hadoop-hdds.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-findbugs-hadoop-ozone.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/testReport/ |
   | Max. process+thread count | 1247 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common hadoop-hdds/server-scm hadoop-hdds/tools 
hadoop-ozone/integration-test U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/4/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[jira] [Commented] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-10 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927264#comment-16927264
 ] 

Akira Ajisaka commented on HDFS-14840:
--

LGTM, +1

> Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager
> -
>
> Key: HDFS-14840
> URL: https://issues.apache.org/jira/browse/HDFS-14840
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14840.1.patch
>
>
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40
> Instead of synchronizing the entire class, just synchronize the collection.
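For illustration only, here is a minimal sketch of that suggestion, assuming the class keeps a per-block-pool map internally (the simplified type names below are placeholders, not the actual Hadoop code): a ConcurrentHashMap guards the collection itself, so the methods no longer need a class-wide monitor.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only (not the actual patch): the concurrent map guards its own
// state, so per-block-pool lookups no longer serialize on one class lock.
class BlockPoolTokenSecretManagerSketch {

  // Hypothetical stand-in for the per-block-pool secret manager type.
  static class SecretManager {
  }

  private final Map<String, SecretManager> map = new ConcurrentHashMap<>();

  SecretManager get(String bpid) {
    SecretManager sm = map.get(bpid);
    if (sm == null) {
      throw new IllegalArgumentException("Block pool " + bpid + " is not found");
    }
    return sm;
  }

  void addBlockPool(String bpid, SecretManager sm) {
    map.put(bpid, sm);
  }
}
{code}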



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927248#comment-16927248
 ] 

Takanobu Asanuma edited comment on HDFS-14838 at 9/11/19 5:22 AM:
--

+1 for HDFS-14838.001.patch, pending Jenkins.


was (Author: tasanuma0829):
+1 for HDFS-14838-1.patch, pending Jenkins.

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: RBF, ui
> Attachments: HDFS-14838-1.patch, HDFS-14838.001.patch, 
> HDFS-14838.patch, router-ui.jpg
>
>
> Currently, the web UI of RBF shows the HTTP port in its heading.
> It should show the RPC port instead, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14838:
--
Attachment: HDFS-14838.001.patch

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: RBF, ui
> Attachments: HDFS-14838-1.patch, HDFS-14838.001.patch, 
> HDFS-14838.patch, router-ui.jpg
>
>
> Currently, the web UI of RBF shows the HTTP port in its heading.
> It should show the RPC port instead, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1868) Ozone pipelines should be marked as ready only after the leader election is complete

2019-09-10 Thread Siddharth Wagle (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927259#comment-16927259
 ] 

Siddharth Wagle commented on HDDS-1868:
---

Do you think it makes sense to add a callback like _notifyLeader(RoleInfoProto 
roleInfoProto)_ to the RaftServerRpc? At the moment, however, we don't seem to 
call into the state machine in _changeToLeader()_ in Ratis. That said, even a 
follower should also send a report. 
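
As a rough sketch of that suggestion only (this interface does not exist in Ratis or Ozone; the name and signatures are assumptions): a role-change callback that the datanode state machine could translate into a pipeline report to SCM.

{code:java}
// Hypothetical sketch -- not an existing Ratis/Ozone API.
// The Ratis server would invoke this on a role change, and the datanode
// state machine would turn the notification into a pipeline report to SCM.
public interface LeaderChangeListener {

  /** Invoked when this replica wins the leader election for a pipeline. */
  void notifyLeader(String pipelineId);

  /** Followers could report too, so SCM can confirm the whole pipeline. */
  void notifyFollower(String pipelineId, String leaderId);
}
{code}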

> Ozone pipelines should be marked as ready only after the leader election is 
> complete
> 
>
> Key: HDDS-1868
> URL: https://issues.apache.org/jira/browse/HDDS-1868
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-1868.01.patch, HDDS-1868.02.patch
>
>
> On restart, Ozone pipelines start in the allocated state and are moved into the 
> open state after all the pipeline datanodes have reported. However, this can 
> potentially lead to an issue where the pipeline is still not ready to accept any 
> incoming IO operations.
> Pipelines should be marked as ready only after the leader election is 
> complete and the leader is ready to accept incoming IO.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14768) In some cases, erasure blocks are corrupted when they are reconstructed.

2019-09-10 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927253#comment-16927253
 ] 

Surendra Singh Lilhore edited comment on HDFS-14768 at 9/11/19 4:56 AM:


Thanks [~gjhkael].
{quote}you need run the code below, and you need check the block index 6 that 
recontruct on local path like
{quote}
The UT should check for the block corruption; we should not check it manually. If you 
are facing difficulties writing the UT, I can help you.

Your fix idea LGTM, but some of your fix is already taken care of by HDFS-14699, so 
you need to wait for it and then rebase your patch.

 

 


was (Author: surendrasingh):
Thanks [~gjhkael].
{quote}you need run the code below, and you need check the block index 6 that 
recontruct on local path like
{quote}
UT should check the block corruption, we should not check it manually. If you 
are facing difficulties in writing UT, I can help you.

you fix idea LGTM, but you some fix is already taken care by HDFS-14699, so you 
need to wait for it and then need to rebase your patch.

 

 

> In some cases, erasure blocks are corrupted when they are reconstructed.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> The policy is RS-6-3-1024k and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's 
> pendingReplicationWithoutTargets so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockManager, liveBlockIndices is 
> [0,1,2,3,4,5,7,8] and the block counter is Live: 7, Decommission: 2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired 
> is 9 - 7 = 2. After the NameNode chooses two target datanodes, it will assign an 
> erasure-coding task to a target datanode.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
> 
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     // indices that are not live but still have data become reconstruction targets
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, which 
> are [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce it.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, 

[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corrupted when they are reconstructed.

2019-09-10 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927253#comment-16927253
 ] 

Surendra Singh Lilhore commented on HDFS-14768:
---

Thanks [~gjhkael].
{quote}you need run the code below, and you need check the block index 6 that 
recontruct on local path like
{quote}
The UT should check for the block corruption; we should not check it manually. If you 
are facing difficulties writing the UT, I can help you.

Your fix idea LGTM, but some of your fix is already taken care of by HDFS-14699, so 
you need to wait for it and then rebase your patch.
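
As a rough illustration of that suggestion (a fragment reusing names from the test quoted below, e.g. dfs, ecFile, and writeBytes; the exact assertion is an assumption, not the final UT): the test itself can detect a corrupted reconstructed block by comparing file checksums, without inspecting block files on the local path.

{code:java}
// Sketch of an in-test corruption check (assumption, not the final UT):
// capture the file checksum before reconstruction is triggered and compare
// it afterwards -- a corrupted reconstructed block makes the checksums
// differ, so no manual inspection of local block files is needed.
FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);

// ... decommission the nodes and wait for reconstruction to finish ...

FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
Assert.assertTrue("Checksum mismatches!",
    fileChecksum1.equals(fileChecksum2));
{code}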

 

 

> In some cases, erasure blocks are corrupted when they are reconstructed.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> The policy is RS-6-3-1024k and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's 
> pendingReplicationWithoutTargets so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockManager, liveBlockIndices is 
> [0,1,2,3,4,5,7,8] and the block counter is Live: 7, Decommission: 2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired 
> is 9 - 7 = 2. After the NameNode chooses two target datanodes, it will assign an 
> erasure-coding task to a target datanode.
> When a datanode gets the task, it builds targetIndices from liveBlockIndices 
> and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
> 
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     // indices that are not live but still have data become reconstruction targets
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, which 
> are [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can stably reproduce it.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List decommisionNodes = new ArrayList();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, 

[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-10 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927251#comment-16927251
 ] 

Surendra Singh Lilhore commented on HDFS-14699:
---

[~marvelrock], are you talking about HDFS-14768 or this jira?

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the internal blocks under test.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN. After the decommission completes, we have 13 
> internal blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> env. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927248#comment-16927248
 ] 

Takanobu Asanuma commented on HDFS-14838:
-

+1 for HDFS-14838-1.patch, pending Jenkins.

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: RBF, ui
> Attachments: HDFS-14838-1.patch, HDFS-14838.patch, router-ui.jpg
>
>
> Currently, the web UI of RBF shows the HTTP port in its heading.
> It should show the RPC port instead, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14838:
--
Attachment: HDFS-14838-1.patch

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: RBF, ui
> Attachments: HDFS-14838-1.patch, HDFS-14838.patch, router-ui.jpg
>
>
> Currently, the web UI of RBF shows the HTTP port in its heading.
> It should show the RPC port instead, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-10 Thread CR Hota (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927242#comment-16927242
 ] 

CR Hota commented on HDFS-14609:


[~zhangchen] Thanks for the ping and patch.

After cancellation of the token, we should try to renew it and expect an 
InvalidToken exception. How does the test validate the InvalidToken exception?

Am I missing something?

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch
>
>
> We worked on router-based federation security as part of HDFS-13532 and kept 
> it compatible with the way the NameNode works. However, with HADOOP-16314 and 
> HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests 
> to fail.
> Appropriate changes are needed in RBF, mainly fixing the broken tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310275=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310275
 ]

ASF GitHub Bot logged work on HDDS-2107:


Author: ASF GitHub Bot
Created on: 11/Sep/19 03:54
Start Date: 11/Sep/19 03:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1424: HDDS-2107. 
Datanodes should retry forever to connect to SCM in an…
URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530208630
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 41 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 67 | Maven dependency ordering for branch |
   | +1 | mvninstall | 589 | trunk passed |
   | +1 | compile | 381 | trunk passed |
   | +1 | checkstyle | 83 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 868 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 178 | trunk passed |
   | 0 | spotbugs | 417 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 615 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 41 | Maven dependency ordering for patch |
   | +1 | mvninstall | 536 | the patch passed |
   | +1 | compile | 387 | the patch passed |
   | +1 | javac | 387 | the patch passed |
   | +1 | checkstyle | 90 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 678 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 175 | the patch passed |
   | +1 | findbugs | 631 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 280 | hadoop-hdds in the patch failed. |
   | -1 | unit | 2824 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 55 | The patch does not generate ASF License warnings. |
   | | | 8715 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware
 |
   |   | hadoop.ozone.container.TestContainerReplication |
   |   | hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.client.rpc.TestContainerStateMachineFailures |
   |   | hadoop.ozone.client.rpc.Test2WayCommitInRatis |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.scm.TestContainerSmallFile |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStream |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.om.TestOzoneManagerHA |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1424 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 2f98f8163e51 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / f8f8598 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/testReport/ |
   | Max. process+thread count | 5408 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service hadoop-ozone/ozone-manager U: . 
|
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1424/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 

[jira] [Commented] (HDFS-14843) Double Synchronization in BlockReportLeaseManager

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927225#comment-16927225
 ] 

Hadoop QA commented on HDFS-14843:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 87m 
29s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 17s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14843 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980009/HDFS-14843.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 94a7316ce59a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f8f8598 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27838/testReport/ |
| Max. process+thread count | 4121 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27838/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Double Synchronization in BlockReportLeaseManager
> 

[jira] [Resolved] (HDDS-1571) Create an interface for pipeline placement policy to support network topologies

2019-09-10 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-1571.

Fix Version/s: 0.5.0
   Resolution: Fixed

> Create an interface for pipeline placement policy to support network 
> topologies
> ---
>
> Key: HDDS-1571
> URL: https://issues.apache.org/jira/browse/HDDS-1571
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Leverage the work done in HDDS-700 for pipeline creation for open containers.
> Create an interface that can provide different policy implementations for 
> pipeline creation. The default implementation should handle the case where no 
> topology information is configured.
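
Purely for illustration (the interface name and signature below are assumptions, not the committed Ozone API), the kind of pluggable policy interface described above could look roughly like this:

{code:java}
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of a pluggable pipeline placement policy.
// Implementations choose the datanodes for a new pipeline; a default
// implementation can fall back to random selection when no network
// topology information is configured.
public interface PipelinePlacementPolicySketch {

  /**
   * Choose datanodes for a new pipeline.
   *
   * @param excludedNodes nodes that must not be used
   * @param nodesRequired number of datanodes needed (e.g. 3 for a Ratis pipeline)
   * @return identifiers of the chosen datanodes
   */
  List<String> chooseDatanodes(List<String> excludedNodes, int nodesRequired)
      throws IOException;
}
{code}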



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable

2019-09-10 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14844:
---
Description: details for HDFS-14820

> Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream  
> configurable
> --
>
> Key: HDFS-14844
> URL: https://issues.apache.org/jira/browse/HDFS-14844
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
>
> details for HDFS-14820



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14844) Make buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream configurable

2019-09-10 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-14844:
--

 Summary: Make buffer of 
BlockReaderRemote#newBlockReader#BufferedOutputStream  configurable
 Key: HDFS-14844
 URL: https://issues.apache.org/jira/browse/HDFS-14844
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Lisheng Sun
Assignee: Lisheng Sun






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-10 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927208#comment-16927208
 ] 

Lisheng Sun commented on HDFS-14795:


Thanks [~elgoiri] for your suggestion. 

I fixed the checkstyle and uploaded the v009 patch.

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, 
> HDFS-14795.009.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes by adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.
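
A rough sketch of the idea only (the configuration key and wiring below are assumptions for illustration, not the proposed patch; local names are reused from the snippet above): build a DataTransferThrottler from a configured bandwidth and pass it to receiveBlock in place of the null argument.

{code:java}
// Sketch only -- the config key and local variable names are assumptions.
// Create a throttler from a configured bytes-per-second limit and hand it
// to receiveBlock instead of the null shown in the snippet above.
long writeBandwidth = conf.getLong(
    "dfs.datanode.data.write.bandwidthPerSec", 0);  // hypothetical key
DataTransferThrottler writeThrottler = writeBandwidth > 0
    ? new DataTransferThrottler(writeBandwidth) : null;

blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
    mirrorAddr, writeThrottler, targets, false);
{code}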



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927207#comment-16927207
 ] 

Hadoop QA commented on HDFS-14838:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
34m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14838 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980011/HDFS-14838.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux 31fbe6852f96 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 524b553 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 306 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27839/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: RBF, ui
> Attachments: HDFS-14838.patch, router-ui.jpg
>
>
> Currently, the web UI of RBF shows the HTTP port in its heading.
> It should show the RPC port instead, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14795) Add Throttler for writing block

2019-09-10 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14795:
---
Attachment: HDFS-14795.009.patch

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, 
> HDFS-14795.009.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes by adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14795) Add Throttler for writing block

2019-09-10 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14795:
---
Attachment: (was: HDFS-14795.009.patch)

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes by adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14795) Add Throttler for writing block

2019-09-10 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14795:
---
Attachment: HDFS-14795.009.patch

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, 
> HDFS-14795.009.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes by adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927195#comment-16927195
 ] 

Xieming Li commented on HDFS-14838:
---

[~tasanuma], Thank you for your review.

I will rework this issue and submit another patch soon.

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: RBF, ui
> Attachments: HDFS-14838.patch, router-ui.jpg
>
>
> Currently, the web UI of RBF shows the HTTP port in its heading.
> It should show the RPC port instead, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14842) ByteArrayManager Reduce Synchronization

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927194#comment-16927194
 ] 

Hadoop QA commented on HDFS-14842:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
55s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14842 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12980008/HDFS-14842.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 775d3032b349 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f8f8598 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27837/testReport/ |
| Max. process+thread count | 415 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27837/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> ByteArrayManager Reduce Synchronization

[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927192#comment-16927192
 ] 

Takanobu Asanuma commented on HDFS-14838:
-

Thanks for working on this issue and submitting the patch, [~risyomei].

Since we want to keep the same interface between Router and NameNode, instead 
of using {{RouterId}}, it would be better to fix 
{{RBFMetrics#getHostAndPort()}}.

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: RBF, ui
> Attachments: HDFS-14838.patch, router-ui.jpg
>
>
> Currently, the web UI of RBF shows the HTTP port in its heading.
> It should show the RPC port instead, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-10 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927177#comment-16927177
 ] 

HuangTao edited comment on HDFS-14699 at 9/11/19 2:08 AM:
--

{quote}3. then shutdown one DN which did not have the same block id as 1 
decommission block, now we have 12 internal block(11 live block and 1 
decommission block)
4. after wait for about 600s (before the heart beat come) commission the 
decommissioned DN again, now we have 12 internal block(11 live block and 1 
duplicate block)
{quote}

{code:java}
// src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:2314
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager#checkReplicaOnStorage
{code}
The code at the location above sets numReplicas.

{code:java}
  if (!bitSet.get(blockIndex)) {
bitSet.set(blockIndex);
  } else if (state == StoredReplicaState.LIVE) {
numReplicas.subtract(StoredReplicaState.LIVE, 1);
numReplicas.add(StoredReplicaState.REDUNDANT, 1);
  }
{code}
I think this block is used to correct numReplicas when some nodes are being 
decommissioned; it has nothing to do with the over-hard-limit case.

I think we should reconstruct the "1 decommission block" without the 
over-hard-limit srcNode, so I still have doubts about this fix.


was (Author: marvelrock):
{quote}3. then shutdown one DN which did not have the same block id as 1 
decommission block, now we have 12 internal block(11 live block and 1 
decommission block)
4. after wait for about 600s (before the heart beat come) commission the 
decommissioned DN again, now we have 12 internal block(11 live block and 1 
duplicate block)
{quote}
I think we should reconstruct the "1 decommission block" without using the 
over-hardlimit srcNode, so I still have doubts about this fix.

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the internal blocks under test.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 internal 
> blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN which did not hold the same block id as the 1 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> env. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14838:
--
Attachment: HDFS-14838.patch
Labels: RBF ui  (was: )
Status: Patch Available  (was: Open)

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
>  Labels: ui, RBF
> Attachments: HDFS-14838.patch, router-ui.jpg
>
>
> Currently the web UI of RBF shows the HTTP port in its heading.
> It should be changed to show the RPC port, as the web UIs of NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14811) RBF: TestRouterRpc#testErasureCoding is flaky

2019-09-10 Thread Chen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927187#comment-16927187
 ] 

Chen Zhang commented on HDFS-14811:
---

[~ayushtkn] [~elgoiri], any comments? Thanks

> RBF: TestRouterRpc#testErasureCoding is flaky
> -
>
> Key: HDFS-14811
> URL: https://issues.apache.org/jira/browse/HDFS-14811
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14811.001.patch, HDFS-14811.002.patch
>
>
> The failure reason:
> {code:java}
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(838)) - [
> Node /default-rack/127.0.0.1:53148 [
> ]
> Node /default-rack/127.0.0.1:53161 [
> ]
> Node /default-rack/127.0.0.1:53157 [
>   Datanode 127.0.0.1:53157 is not chosen since the node is too busy (load: 3 
> > 2.6665).
> Node /default-rack/127.0.0.1:53143 [
> ]
> Node /default-rack/127.0.0.1:53165 [
> ]
> 2019-09-01 18:19:20,940 [IPC Server handler 5 on default port 53140] INFO  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseRandom(846)) - Not enough replicas 
> was chosen. Reason: {NODE_TOO_BUSY=1}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) 
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> protocol.BlockStoragePolicy (BlockStoragePolicy.java:chooseStorageTypes(161)) 
> - Failed to place enough replicas: expected size is 1 but only 0 storage 
> types can be selected (replication=6, selected=[], unavailable=[DISK], 
> removed=[DISK], policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] WARN  
> blockmanagement.BlockPlacementPolicy 
> (BlockPlacementPolicyDefault.java:chooseTarget(449)) - Failed to place enough 
> replicas, still in need of 1 to reach 6 (unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) All 
> required storage types are unavailable:  unavailableStorages=[DISK], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> 2019-09-01 18:19:20,941 [IPC Server handler 5 on default port 53140] INFO  
> ipc.Server (Server.java:logException(2982)) - IPC Server handler 5 on default 
> port 53140, call Call#1270 Retry#0 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 127.0.0.1:53202
> java.io.IOException: File /testec/testfile2 could only be written to 5 of the 
> 6 required nodes for RS-6-3-1024k. There are 6 datanode(s) running and 6 
> node(s) are excluded in this operation.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2815)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:893)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
> 2019-09-01 18:19:20,942 [IPC Server handler 6 on default port 53197] INFO  
> ipc.Server (Server.java:logException(2975)) - IPC Server handler 6 on default 
> port 53197, call Call#1268 Retry#0 
> 

[jira] [Commented] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed

2019-09-10 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927184#comment-16927184
 ] 

angerszhu commented on HDFS-10943:
--

[~swingcong]

See my explain, https://issues.apache.org/jira/browse/HDFS-14437

 

[~hexiaoqiao] 

Seem the same reason. 

 

> rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
> --
>
> Key: HDFS-10943
> URL: https://issues.apache.org/jira/browse/HDFS-10943
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Major
>
> Per the following trace stack:
> {code}
> FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log 
> segment 10562075963, 10562174157 failed for required journal 
> (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, 
> 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid 
> 10562075963))
> java.io.IOException: FSEditStream has 49708 bytes still to be flushed and 
> cannot be closed.
> at 
> org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002)
> at 
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
> at 
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> 2016-09-23 21:40:59,618 WARN 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting 
> QuorumOutputStream starting at txid 10562075963
> {code}
> The exception is from  EditsDoubleBuffer
> {code}
>  public void close() throws IOException {
> Preconditions.checkNotNull(bufCurrent);
> Preconditions.checkNotNull(bufReady);
> int bufSize = bufCurrent.size();
> if (bufSize != 0) {
>   throw new IOException("FSEditStream has " + bufSize
>   + " bytes still to be flushed and cannot be closed.");
> }
> IOUtils.cleanup(null, bufCurrent, bufReady);
> bufCurrent = bufReady = null;
>   }
> {code}
> We can see that FSNamesystem.rollEditLog expects  
> EditsDoubleBuffer.bufCurrent to be empty.
> Edits are recorded via FSEditLog$logSync, which does:
> {code}
>* The data is double-buffered within each edit log implementation so that
>* in-memory writing can occur in parallel with the on-disk writing.
>*
>* Each sync occurs in three steps:
>*   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>*  flag.
>*   2. unsynchronized, it flushes the data to storage
>*   3. synchronized, it resets the flag and notifies anyone waiting on the
>*  sync.
>*
>* The lack of synchronization on step 2 allows other threads to continue
>* to write into the memory buffer while the sync is in progress.
>* Because this step is unsynchronized, actions that need to avoid
>* concurrency with sync() should be synchronized and also call
>* waitForSyncToFinish() before assuming they are running alone.
>*/
> {code}
> We can see that step 2 is 

[jira] [Commented] (HDFS-14609) RBF: Security should use common AuthenticationFilter

2019-09-10 Thread Chen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927185#comment-16927185
 ] 

Chen Zhang commented on HDFS-14609:
---

Ping [~crh] [~eyang]...
Could you help to take a look, thanks!

> RBF: Security should use common AuthenticationFilter
> 
>
> Key: HDFS-14609
> URL: https://issues.apache.org/jira/browse/HDFS-14609
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: CR Hota
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14609.001.patch, HDFS-14609.002.patch, 
> HDFS-14609.003.patch, HDFS-14609.004.patch, HDFS-14609.005.patch
>
>
> We worked on router-based federation security as part of HDFS-13532. We kept 
> it compatible with the way the NameNode works. However, with HADOOP-16314 and 
> HDFS-16354 in trunk, the auth filters seem to have been changed, causing tests 
> to fail.
> Changes are needed appropriately in RBF, mainly fixing broken tests.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14768) In some cases, erasure blocks are corruption when they are reconstruct.

2019-09-10 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927183#comment-16927183
 ] 

HuangTao commented on HDFS-14768:
-

[~zhaoyim] You can change the `incrementPendingReplicationWithoutTargets()` 
access modifier from default (package-private) to public.

> In some cases, erasure blocks are corruption  when they are reconstruct.
> 
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: HDFS-14768.000.patch
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's 
> pendingReplicationWithoutTargets so that it becomes larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockManager, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8], and the block counter is Live: 7, Decommission: 2. 
> In the method scheduleReconstruction of BlockManager, additionalReplRequired 
> is 9 - 7 = 2. After the NameNode chooses two target datanodes, it will assign 
> an erasure-coding reconstruction task to the target datanode.
> When the datanode gets the task, it will build targetIndices from 
> liveBlockIndices and the target length. The code is below.
> {code:java}
> // builds targetIndices from the live-block bitset and the requested targets
> targetIndices = new short[targets.length];
> 
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value 
> (see the stand-alone sketch after this description).
> The StripedReader always creates readers from the first 6 index blocks, i.e. 
> [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L 
> bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> // code placeholder
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // 
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   for (int i = 0; i < 100; i++) {
> datanodeDescriptor.incrementPendingReplicationWithoutTargets();
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   //assertNull(checkFile(dfs, ecFile, 9, decommisionNodes, numDNs));
>   // Ensure decommissioned datanode is not automatically shutdown
>   DFSClient client = getDfsClient(cluster.getNameNode(0), conf);
>   assertEquals("All datanodes must be alive", numDNs,
>   client.datanodeReport(DatanodeReportType.LIVE).length);
>   FileChecksum fileChecksum2 = dfs.getFileChecksum(ecFile, writeBytes);
>   Assert.assertTrue("Checksum mismatches!",
>   fileChecksum1.equals(fileChecksum2));
>   StripedFileTestUtil.checkData(dfs, ecFile, writeBytes, decommisionNodes,
>   null, blockGroupSize);
> }
> {code}
>  
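A stand-alone sketch of the core problem in the description above (an illustration, not Hadoop code): a freshly allocated short[] is zero-filled, so any targetIndices slot that is never assigned silently points at internal block index 0.

{code:java}
public class TargetIndicesDefaultSketch {
  public static void main(String[] args) {
    short[] targetIndices = new short[2]; // two reconstruction targets requested
    int m = 0;
    targetIndices[m++] = 6;               // only block index 6 is actually missing
    // targetIndices[1] was never written, so it still holds the default 0,
    // which makes the task look like it should also "rebuild" block index 0.
    System.out.println(targetIndices[0] + ", " + targetIndices[1]); // prints 6, 0
  }
}
{code}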



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Commented] (HDFS-10943) rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed

2019-09-10 Thread wangcong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927181#comment-16927181
 ] 

wangcong commented on HDFS-10943:
-

Sorry, [~hexiaoqiao], the version we use is 2.6.0-cdh5.10.0.

By looking at the logs added in HDFS-11292, we found the problem as follows:

If rollEditLog runs normally, the log shows:

2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
logSyncAll toSyncToTxId=5060982534 lastSyncedTxid=5060982511 
mostRecentTxid=5060982534
2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Done logSyncAll lastWrittenTxId=5060982534 lastSyncedTxid=5060982534 
mostRecentTxid=5060982534

toSyncToTxId in the first line is the txId of the EndLogSegmentOp, which is the 
last record of the edit log, and it equals lastWrittenTxId in the second line. 
This shows that after the EndLogSegmentOp nothing more was written to the double 
buffer.

But if rollEditLog runs abnormally, the log shows:

2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
logSyncAll toSyncToTxId=5061382825 lastSyncedTxid=5061371306 
mostRecentTxid=5061382825
2019-09-10 03:48:10,273 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Done logSyncAll lastWrittenTxId=5061382841 lastSyncedTxid=5061382840 
mostRecentTxid=5061382841

toSyncToTxId in the first line is not equal to lastWrittenTxId in the 
second line, which shows that after the EndLogSegmentOp another handler wrote 
edits to the double buffer.

In the second line, lastWrittenTxId is not equal to lastSyncedTxid, which shows 
that the current buffer of the double buffer is not empty.
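The check applied to these logs can be written as a small predicate; the helper below is hypothetical (not Hadoop code) and just restates the invariant, using the transaction ids from the quoted lines.

{code:java}
public class RollEditLogInvariant {
  // After the EndLogSegmentOp is written, no later transaction should appear,
  // and logSyncAll should have synced everything that was written.
  static boolean rollLooksConsistent(long toSyncToTxId,
                                     long lastWrittenTxId,
                                     long lastSyncedTxId) {
    return toSyncToTxId == lastWrittenTxId && lastWrittenTxId == lastSyncedTxId;
  }

  public static void main(String[] args) {
    // Values taken from the "Done logSyncAll" lines quoted above.
    System.out.println(rollLooksConsistent(5060982534L, 5060982534L, 5060982534L)); // true
    System.out.println(rollLooksConsistent(5061382825L, 5061382841L, 5061382840L)); // false
  }
}
{code}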

 

> rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
> --
>
> Key: HDFS-10943
> URL: https://issues.apache.org/jira/browse/HDFS-10943
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Major
>
> Per the following trace stack:
> {code}
> FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log 
> segment 10562075963, 10562174157 failed for required journal 
> (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, 
> 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid 
> 10562075963))
> java.io.IOException: FSEditStream has 49708 bytes still to be flushed and 
> cannot be closed.
> at 
> org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002)
> at 
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
> at 
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> 2016-09-23 21:40:59,618 WARN 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting 
> QuorumOutputStream starting at txid 10562075963
> {code}
> The exception is from  EditsDoubleBuffer
> {code}
>  public void close() throws IOException {
> Preconditions.checkNotNull(bufCurrent);
> Preconditions.checkNotNull(bufReady);
> int bufSize = bufCurrent.size();
> if 

[jira] [Comment Edited] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-10 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927177#comment-16927177
 ] 

HuangTao edited comment on HDFS-14699 at 9/11/19 1:40 AM:
--

{quote}3. then shutdown one DN which did not have the same block id as 1 
decommission block, now we have 12 internal block(11 live block and 1 
decommission block)
4. after wait for about 600s (before the heart beat come) commission the 
decommissioned DN again, now we have 12 internal block(11 live block and 1 
duplicate block)
{quote}
I think we should reconstruct the "1 decommission block" without using the 
over-hardlimit srcNode, so I still have doubts about this fix.


was (Author: marvelrock):
I have  a doubt about "after wait for about 600s (before the heart beat come) 
commission the decommissioned DN again, now we have 12 internal block(11 live 
block and 1 duplicate block)"

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the internal blocks under test.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 internal 
> blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN which did not hold the same block id as the 1 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> env. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14835) RBF: Secured Router should not run when it can't initialize DelegationTokenSecretManager

2019-09-10 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-14835:

Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged the PR to trunk. Thanks again for your reviews, [~crh] and [~elgoiri].

> RBF: Secured Router should not run when it can't initialize 
> DelegationTokenSecretManager
> 
>
> Key: HDFS-14835
> URL: https://issues.apache.org/jira/browse/HDFS-14835
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: RBF
> Fix For: 3.3.0
>
>
> Currently, even if a secured Router fails to create the 
> DelegationTokenSecretManager, it can start and keep running. Such a Router 
> can't handle requests with delegation tokens.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.federation.router.FederationUtil: Could 
> not instantiate: ZKDelegationTokenSecretManagerImpl
> ...
> INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 
> {noformat}
> In this case, I think the Router should not start.
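The description boils down to a fail-fast check at startup. A hedged sketch of that idea (not the change actually merged for HDFS-14835):

{code:java}
// Sketch only: if security is on and the delegation token secret manager
// could not be instantiated, refuse to start instead of limping along.
public class FailFastSketch {
  static void checkSecretManager(boolean securityEnabled, Object secretManager) {
    if (securityEnabled && secretManager == null) {
      throw new IllegalStateException(
          "Failed to create the delegation token secret manager; refusing to start");
    }
  }

  public static void main(String[] args) {
    checkSecretManager(true, new Object());  // ok, startup continues
    checkSecretManager(true, null);          // throws, so the Router would not start
  }
}
{code}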



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14835) RBF: Secured Router should not run when it can't initialize DelegationTokenSecretManager

2019-09-10 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927178#comment-16927178
 ] 

Hudson commented on HDFS-14835:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17272 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17272/])
HDFS-14835. RBF: Secured Router should not run when it can't initialize 
(github: rev 524b553a5f1c10bf41c723302cf42b592ffa1631)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/TestRouterSecurityManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/RouterSecurityManager.java


> RBF: Secured Router should not run when it can't initialize 
> DelegationTokenSecretManager
> 
>
> Key: HDFS-14835
> URL: https://issues.apache.org/jira/browse/HDFS-14835
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: RBF
>
> Currently, even if a secured Router fails to create the 
> DelegationTokenSecretManager, it can start and keep running. Such a Router 
> can't handle requests with delegation tokens.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.federation.router.FederationUtil: Could 
> not instantiate: ZKDelegationTokenSecretManagerImpl
> ...
> INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 
> {noformat}
> In this case, I think the Router should not start.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-10 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927177#comment-16927177
 ] 

HuangTao commented on HDFS-14699:
-

I have  a doubt about "after wait for about 600s (before the heart beat come) 
commission the decommissioned DN again, now we have 12 internal block(11 live 
block and 1 duplicate block)"

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hope they are helpful. (The following DNs hold 
> the internal blocks under test.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 internal 
> blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN which did not hold the same block id as the 1 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # Then EC does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> env. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310242=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310242
 ]

ASF GitHub Bot logged work on HDDS-2107:


Author: ASF GitHub Bot
Created on: 11/Sep/19 01:28
Start Date: 11/Sep/19 01:28
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on issue #1424: HDDS-2107. 
Datanodes should retry forever to connect to SCM in an…
URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530181099
 
 
   @xiaoyuyao @hanishakoneru @anuengineer @elek @bharatviswa504 Please review
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310242)
Time Spent: 0.5h  (was: 20m)

> Datanodes should retry forever to connect to SCM in an unsecure environment
> ---
>
> Key: HDDS-2107
> URL: https://issues.apache.org/jira/browse/HDDS-2107
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In an unsecure environment, the datanodes try up to 10 times after waiting for 
> 1000 milliseconds each time before throwing this error:
> {code:java}
> Unable to communicate to SCM server at scm:9861 for past 0 seconds.
> java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>   at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   ... 13 more
> {code}
> The datanodes should try forever to connect with SCM and not fail immediately 
> after 10 retries.



--
This message was sent by 

[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310240=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310240
 ]

ASF GitHub Bot logged work on HDDS-2107:


Author: ASF GitHub Bot
Created on: 11/Sep/19 01:28
Start Date: 11/Sep/19 01:28
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on issue #1424: HDDS-2107. 
Datanodes should retry forever to connect to SCM in an…
URL: https://github.com/apache/hadoop/pull/1424#issuecomment-530180967
 
 
   /label ozone
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310240)
Time Spent: 20m  (was: 10m)

> Datanodes should retry forever to connect to SCM in an unsecure environment
> ---
>
> Key: HDDS-2107
> URL: https://issues.apache.org/jira/browse/HDDS-2107
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In an unsecure environment, the datanodes try up to 10 times after waiting for 
> 1000 milliseconds each time before throwing this error:
> {code:java}
> Unable to communicate to SCM server at scm:9861 for past 0 seconds.
> java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>   at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   ... 13 more
> {code}
> The datanodes should try forever to connect with SCM and not fail immediately 
> after 10 retries.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2107:
-
Labels: pull-request-available  (was: )

> Datanodes should retry forever to connect to SCM in an unsecure environment
> ---
>
> Key: HDDS-2107
> URL: https://issues.apache.org/jira/browse/HDDS-2107
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>
> In an unsecure environment, the datanodes try up to 10 times after waiting for 
> 1000 milliseconds each time before throwing this error:
> {code:java}
> Unable to communicate to SCM server at scm:9861 for past 0 seconds.
> java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>   at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   ... 13 more
> {code}
> The datanodes should try forever to connect with SCM and not fail immediately 
> after 10 retries.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2107?focusedWorklogId=310239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310239
 ]

ASF GitHub Bot logged work on HDDS-2107:


Author: ASF GitHub Bot
Created on: 11/Sep/19 01:27
Start Date: 11/Sep/19 01:27
Worklog Time Spent: 10m 
  Work Description: vivekratnavel commented on pull request #1424: 
HDDS-2107. Datanodes should retry forever to connect to SCM in an…
URL: https://github.com/apache/hadoop/pull/1424
 
 
   … unsecure environment
   
In an unsecure environment, the datanodes try up to 10 times after waiting 
for 1000 milliseconds each time before throwing this error:
   
   ```Unable to communicate to SCM server at scm:9861 for past 0 seconds.
   java.net.ConnectException: Call From scm:9861 failed on connection 
exception: java.net.ConnectException: Connection refused;```
   
   This PR fixes that issue by having datanodes try forever to connect with SCM 
and not fail immediately after 10 retries.
   
   I have also increased timeouts on a unit test to improve its stability.
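   As a rough illustration of the requested behaviour (a sketch under assumed names, not the code in this PR), the retry loop stops counting attempts and keeps sleeping the same fixed interval:
   
   ```java
   import java.util.concurrent.Callable;
   
   public class RetryForeverSketch {
     // Keep calling until it succeeds; sleep a fixed 1000 ms between attempts,
     // matching the interval mentioned in the report, but with no attempt cap.
     static <T> T callForever(Callable<T> call) throws InterruptedException {
       while (true) {
         try {
           return call.call();
         } catch (Exception e) {          // e.g. java.net.ConnectException
           Thread.sleep(1000L);
         }
       }
     }
   
     public static void main(String[] args) throws InterruptedException {
       String version = callForever(() -> "scm-version"); // stand-in for the SCM RPC
       System.out.println(version);
     }
   }
   ```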
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310239)
Remaining Estimate: 0h
Time Spent: 10m

> Datanodes should retry forever to connect to SCM in an unsecure environment
> ---
>
> Key: HDDS-2107
> URL: https://issues.apache.org/jira/browse/HDDS-2107
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In an unsecure environment, the datanodes try up to 10 times after waiting for 
> 1000 milliseconds each time before throwing this error:
> {code:java}
> Unable to communicate to SCM server at scm:9861 for past 0 seconds.
> java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at 
> 

[jira] [Updated] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread Vivek Ratnavel Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vivek Ratnavel Subramanian updated HDDS-2107:
-
Description: 
In an unsecure environment, the datanodes try up to 10 times after waiting for 
1000 milliseconds each time before throwing this error:
{code:java}
Unable to communicate to SCM server at scm:9861 for past 0 seconds.
java.net.ConnectException: Call From scm/10.65.36.118 to scm:9861 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
at 
org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
at 
org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
at 
org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
... 13 more
{code}
The datanodes should try forever to connect with SCM and not fail immediately 
after 10 retries.

  was:
In an unsecure environment, the datanodes try up to 10 times after waiting for 
1000 milliseconds each time before throwing this error:
{code:java}
Unable to communicate to SCM server at 
jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds.
java.net.ConnectException: Call From 
jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to 
jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection 
exception: java.net.ConnectException: Connection refused; For more details see: 
 http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
at 

[jira] [Updated] (HDFS-14843) Double Synchronization in BlockReportLeaseManager

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14843:
--
Status: Patch Available  (was: Open)

> Double Synchronization in BlockReportLeaseManager
> -
>
> Key: HDFS-14843
> URL: https://issues.apache.org/jira/browse/HDFS-14843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14843.1.patch
>
>
> {code:java|title=BlockReportLeaseManager.java}
>   private synchronized long getNextId() {
> long id;
> do {
>   id = nextId++;
> } while (id == 0);
> return id;
>   }
> {code}
> This is a private method and is synchronized; however, it is only accessed 
> from an already-synchronized method. No need to double-synchronize.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14843) Double Synchronization in BlockReportLeaseManager

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14843:
--
Attachment: HDFS-14843.1.patch

> Double Synchronization in BlockReportLeaseManager
> -
>
> Key: HDFS-14843
> URL: https://issues.apache.org/jira/browse/HDFS-14843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14843.1.patch
>
>
> {code:java|title=BlockReportLeaseManager.java}
>   private synchronized long getNextId() {
> long id;
> do {
>   id = nextId++;
> } while (id == 0);
> return id;
>   }
> {code}
> This is a private method and is synchronized; however, it is only accessed 
> from an already-synchronized method. No need to double-synchronize.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14843) Double Synchronization in BlockReportLeaseManager

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HDFS-14843:
-

Assignee: David Mollitor

> Double Synchronization in BlockReportLeaseManager
> -
>
> Key: HDFS-14843
> URL: https://issues.apache.org/jira/browse/HDFS-14843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> {code:java|title=BlockReportLeaseManager.java}
>   private synchronized long getNextId() {
> long id;
> do {
>   id = nextId++;
> } while (id == 0);
> return id;
>   }
> {code}
> This is a private method and is synchronized; however, it is only accessed 
> from an already-synchronized method.  There is no need to double-synchronize.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14843) Double Synchronization in BlockReportLeaseManager

2019-09-10 Thread David Mollitor (Jira)
David Mollitor created HDFS-14843:
-

 Summary: Double Synchronization in BlockReportLeaseManager
 Key: HDFS-14843
 URL: https://issues.apache.org/jira/browse/HDFS-14843
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: David Mollitor


{code:java|title=BlockReportLeaseManager.java}
  private synchronized long getNextId() {
long id;
do {
  id = nextId++;
} while (id == 0);
return id;
  }
{code}

This is a private method and is synchronized; however, it is only accessed 
from an already-synchronized method.  There is no need to double-synchronize.

https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L183-L189

https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReportLeaseManager.java#L227
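To illustrate the point, here is a minimal, self-contained sketch (not the attached patch) of the pattern in question: the public entry point already holds the monitor, so the private helper gains nothing from being synchronized itself.

{code:java|title=Illustrative sketch only, not HDFS-14843.1.patch}
public class LeaseIdGenerator {

  private long nextId = 1;

  // Public entry point: the monitor on "this" is acquired here.
  public synchronized long requestLease() {
    return getNextId();
  }

  // Private helper: every caller is already synchronized on "this",
  // so a second "synchronized" on this method would add nothing.
  private long getNextId() {
    long id;
    do {
      id = nextId++;
    } while (id == 0);
    return id;
  }
}
{code}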



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14842) ByteArrayManager Reduce Synchronization

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14842:
--
Status: Patch Available  (was: Open)

> ByteArrayManager Reduce Synchronization
> ---
>
> Key: HDFS-14842
> URL: https://issues.apache.org/jira/browse/HDFS-14842
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14842.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14842) ByteArrayManager Reduce Synchronization

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14842:
--
Attachment: HDFS-14842.1.patch

> ByteArrayManager Reduce Synchronization
> ---
>
> Key: HDFS-14842
> URL: https://issues.apache.org/jira/browse/HDFS-14842
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14842.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927168#comment-16927168
 ] 

Xieming Li commented on HDFS-14838:
---

Hi [~elgoiri], thank you for the confirmation. I will submit a patch soon.

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
> Attachments: router-ui.jpg
>
>
> Currently the web UI of RBF is using the HTTP port in its heading.
> It should be changed to the RPC port, as the web UIs of the NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14842) ByteArrayManager Reduce Synchronization

2019-09-10 Thread David Mollitor (Jira)
David Mollitor created HDFS-14842:
-

 Summary: ByteArrayManager Reduce Synchronization
 Key: HDFS-14842
 URL: https://issues.apache.org/jira/browse/HDFS-14842
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927144#comment-16927144
 ] 

Hadoop QA commented on HDFS-14840:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 7 unchanged - 1 fixed = 7 total (was 8) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}199m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.0 Server=19.03.0 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14840 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979993/HDFS-14840.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0761209e5eb3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 10144a5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27835/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27835/testReport/ |
| Max. process+thread count | 2626 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927138#comment-16927138
 ] 

Hadoop QA commented on HDFS-14841:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 20m 
31s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 45s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 7 unchanged - 4 fixed = 9 total (was 11) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14841 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/1297/HDFS-14841.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 49c750562be7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 10144a5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27836/artifact/out/branch-mvninstall-root.txt
 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27836/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310172
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 23:20
Start Date: 10/Sep/19 23:20
Worklog Time Spent: 10m 
  Work Description: smengcl commented on issue #1360: HDDS-2007. Make ozone 
fs shell command work with OM HA service ids
URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530156703
 
 
   @bharatviswa504 The previous commit passed all acceptance and unit tests. The 
latest commit shouldn't cause those failures. I'll trigger a retest to check.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310172)
Time Spent: 5h 20m  (was: 5h 10m)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access 
> an Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if more than one service id 
> (ozone.om.service.ids) is configured in ozone-site.xml. This needs to be addressed 
> on the client side.
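As a rough, standalone illustration of the client-side fix described above (the helper and its logic are assumptions for illustration; only the config key ozone.om.service.ids comes from the description):

{code:java|title=Standalone sketch, not the actual patch}
import java.util.Arrays;

public class ServiceIdResolver {

  // Pick the configured OM service id that matches the URI authority,
  // instead of failing whenever ozone.om.service.ids holds more than one id.
  static String resolve(String uriAuthority, String configuredServiceIds) {
    return Arrays.stream(configuredServiceIds.split(","))
        .map(String::trim)
        .filter(id -> id.equals(uriAuthority))
        .findFirst()
        .orElseThrow(() -> new IllegalArgumentException(
            "Service id " + uriAuthority + " not found in ozone.om.service.ids"));
  }
}
{code}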



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310173
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 23:20
Start Date: 10/Sep/19 23:20
Worklog Time Spent: 10m 
  Work Description: smengcl commented on issue #1360: HDDS-2007. Make ozone 
fs shell command work with OM HA service ids
URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530156719
 
 
   /retest
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310173)
Time Spent: 5.5h  (was: 5h 20m)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that Ozone client can access 
> Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if there are more than one service ids 
> (ozone.om.service.ids) configured in ozone-site.xml. This need to be address 
> on client side.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2019-09-10 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927107#comment-16927107
 ] 

Siyao Meng commented on HDFS-14283:
---

[~leosun08] Thanks!

> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
>
> HDFS Caching offers performance benefits. However, currently the NameNode does 
> not treat cached replicas with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let the NameNode give higher priority to cached replicas. 
> Changing logic in the NameNode is always tricky, so that didn't get much 
> traction. Here I propose a different approach: let the client (DFSInputStream) 
> prefer cached replicas.
> A {{LocatedBlock}} object already contains cached replica locations, so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
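A tiny, standalone sketch of the selection idea; DFSInputStream and LocatedBlock internals are only referenced in comments, and the plain string type here stands in for a datanode address:

{code:java|title=Illustrative sketch only}
import java.util.List;
import java.util.Set;

public class CachedReplicaChooser {

  // Pick a replica location, preferring one the NameNode reports as cached in
  // memory. In DFSInputStream the cached locations would come from
  // LocatedBlock#getCachedLocations().
  static String chooseReplica(List<String> locations, Set<String> cachedLocations) {
    for (String dn : locations) {
      if (cachedLocations.contains(dn)) {
        return dn;                       // cached replica wins
      }
    }
    // Fall back to the first (or otherwise "best") uncached replica.
    return locations.isEmpty() ? null : locations.get(0);
  }
}
{code}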



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14837) Review of Block.java

2019-09-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927106#comment-16927106
 ] 

Íñigo Goiri commented on HDFS-14837:


Can we use EqualsBuilder and HashCodeBuilder?
I would also complete the javadocs instead of removing them.
For example, for getBlockName() I would put an example such as blk_X.
I always want to know which one is the one with the timestamp and which one is 
without; the javadoc would clarify that.
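For reference, a hedged sketch of what the suggested commons-lang3 builders could look like; the class and field here are purely illustrative stand-ins, not the Block.java patch:

{code:java|title=Illustrative sketch only}
import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class BlockLike {

  private long blockId;

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof BlockLike)) {
      return false;
    }
    // EqualsBuilder keeps multi-field comparisons readable and consistent.
    return new EqualsBuilder()
        .append(blockId, ((BlockLike) o).blockId)
        .isEquals();
  }

  @Override
  public int hashCode() {
    // HashCodeBuilder mirrors the fields used in equals().
    return new HashCodeBuilder().append(blockId).toHashCode();
  }
}
{code}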

> Review of Block.java
> 
>
> Key: HDFS-14837
> URL: https://issues.apache.org/jira/browse/HDFS-14837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14837.1.patch
>
>
> The {{Block}} class is such a core class in the project, I just wanted to 
> make sure it was super clean and documentation was correct.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927104#comment-16927104
 ] 

Íñigo Goiri commented on HDFS-14795:


Let's fix the checkstyle and make the new methods static.

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not use a throttler.
>  I think it is necessary to throttle block writes, adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.
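A minimal, standalone sketch of what per-write throttling amounts to; the real patch would presumably reuse HDFS's existing DataTransferThrottler rather than a toy class like this:

{code:java|title=Toy rate limiter for illustration only}
public class SimpleWriteThrottler {

  private final long bytesPerSecond;
  private final long startMillis = System.currentTimeMillis();
  private long totalBytes = 0;

  public SimpleWriteThrottler(long bytesPerSecond) {
    this.bytesPerSecond = bytesPerSecond;
  }

  // Sleep just long enough that the average write rate since construction
  // stays at or below bytesPerSecond; call once per buffer written.
  public synchronized void throttle(long numBytes) throws InterruptedException {
    totalBytes += numBytes;
    long expectedMillis = (totalBytes * 1000) / bytesPerSecond;
    long elapsedMillis = System.currentTimeMillis() - startMillis;
    if (expectedMillis > elapsedMillis) {
      Thread.sleep(expectedMillis - elapsedMillis);
    }
  }
}
{code}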



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-1873) Recon should store last successful run timestamp for each task

2019-09-10 Thread Siddharth Wagle (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle reassigned HDDS-1873:
-

Assignee: Shweta  (was: Aravindan Vijayan)

> Recon should store last successful run timestamp for each task
> --
>
> Key: HDDS-1873
> URL: https://issues.apache.org/jira/browse/HDDS-1873
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Shweta
>Priority: Major
>
> Recon should store the timestamp of the last Ozone Manager snapshot received, 
> along with the timestamp of the last successful run of each task.
> This is important to give users a sense of how recent the data they are 
> looking at is. We need this per task because some tasks might 
> fail to run or might take much longer to run than other tasks, and this 
> needs to be reflected in the UI for a better and more consistent user experience.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-6524) Choosing datanode retries times considering with block replica number

2019-09-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927103#comment-16927103
 ] 

Íñigo Goiri commented on HDFS-6524:
---

Is there any unit test we can add to cover the new case?
BTW, that if statement is getting a little unreadable.

> Choosing datanode  retries times considering with block replica number
> --
>
> Key: HDFS-6524
> URL: https://issues.apache.org/jira/browse/HDFS-6524
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha1
>Reporter: Liang Xie
>Assignee: Lisheng Sun
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6524.001.patch, HDFS-6524.002.patch, HDFS-6524.txt
>
>
> Currently chooseDataNode() retries based on the setting 
> dfsClientConf.maxBlockAcquireFailures, which by default is 3 
> (DFS_CLIENT_MAX_BLOCK_ACQUIRE_FAILURES_DEFAULT = 3). It would be better to 
> have another option tied to the block replication factor, e.g. a cluster with 
> only two block replicas, or a Reed-Solomon encoding setup with a single 
> replica. It helps to reduce long-tail latency.
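As a rough illustration of the proposal (the helper and its name are hypothetical, not code from the attached patches):

{code:java|title=Illustrative sketch only}
public final class RetryPolicyHelper {

  // With only one or two replicas (for example an erasure-coded or 2-replica
  // setup), retrying the default 3 times per failure mostly adds tail latency;
  // bound the retries by the number of replicas actually available.
  static int maxRetries(int maxBlockAcquireFailures, int replicaCount) {
    return Math.min(maxBlockAcquireFailures, Math.max(1, replicaCount));
  }
}
{code}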



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue

2019-09-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927101#comment-16927101
 ] 

Íñigo Goiri commented on HDFS-14839:


Yetus does not look very happy.

> Use Java Concurrent BlockingQueue instead of Internal BlockQueue
> 
>
> Key: HDFS-14839
> URL: https://issues.apache.org/jira/browse/HDFS-14839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14839.1.patch
>
>
> Replace...
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86
> With...
> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html
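A self-contained sketch of the replacement idea; note that the return value of offer() is checked, which is exactly what the FindBugs warnings in the Hadoop QA report in this thread complain about when it is ignored. The class and method names here are illustrative, not the attached patch.

{code:java|title=Illustrative sketch only}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PendingBlockQueue<T> {

  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

  // An unbounded LinkedBlockingQueue never rejects an offer, but checking the
  // return value keeps FindBugs satisfied and stays correct if a capacity is
  // ever introduced.
  public boolean add(T item) {
    return queue.offer(item);
  }

  // Drain up to maxCount items, mirroring a hand-rolled poll(n)-style API.
  public List<T> poll(int maxCount) {
    List<T> out = new ArrayList<>();
    queue.drainTo(out, maxCount);
    return out;
  }
}
{code}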



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927099#comment-16927099
 ] 

Hadoop QA commented on HDFS-14839:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 25 unchanged - 6 fixed = 25 total (was 31) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
43s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 3 new + 0 
unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}128m 39s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}202m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Exceptional return value of 
java.util.concurrent.BlockingQueue.offer(Object) ignored in 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeErasureCoded(ExtendedBlock,
 DatanodeDescriptor[], DatanodeStorageInfo[], byte[], ErasureCodingPolicy)  At 
DatanodeDescriptor.java:ignored in 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeErasureCoded(ExtendedBlock,
 DatanodeDescriptor[], DatanodeStorageInfo[], byte[], ErasureCodingPolicy)  At 
DatanodeDescriptor.java:[line 624] |
|  |  Exceptional return value of 
java.util.concurrent.BlockingQueue.offer(Object) ignored in 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeRecovered(BlockInfo)
  At DatanodeDescriptor.java:ignored in 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor.addBlockToBeRecovered(BlockInfo)
  At DatanodeDescriptor.java:[line 638] |
|  |  Exceptional return value of 

[jira] [Commented] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica

2019-09-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927098#comment-16927098
 ] 

Íñigo Goiri commented on HDFS-14841:


Do you have any performance results or a pointer?
It makes sense, but it's better to have them for completeness.

Can we avoid the parentheses around key? Not sure if the lambda makes them necessary.

> Remove Class-Level Synchronization in LocalReplica
> --
>
> Key: HDFS-14841
> URL: https://issues.apache.org/jira/browse/HDFS-14841
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14841.1.patch
>
>
> Use Java's Concurrent package instead.
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2019-09-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927089#comment-16927089
 ] 

Íñigo Goiri commented on HDFS-14090:


Right now, the test catches the exception and then calls fail(), 
which triggers an assertion error.
We can just let the exception surface without the fail().
Finally, when calling get() on the future, we can catch the exception and expose 
the actual cause.
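A small sketch of what that suggestion could look like in a test helper (illustrative, assuming a CompletableFuture-based call; names are not from the patch):

{code:java|title=Illustrative sketch only}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class FutureTestHelper {

  // Instead of catch + fail(), rethrow the underlying cause so the test report
  // shows the actual exception.
  static <T> T getAndUnwrap(CompletableFuture<T> future) throws Exception {
    try {
      return future.get();
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      throw (cause instanceof Exception) ? (Exception) cause : e;
    }
  }
}
{code}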

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, RBF_ Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact of clients connecting to healthy clusters vs unhealthy 
> clusters.
> For example, if there are 2 name nodes downstream and one of them is 
> heavily loaded with calls spiking RPC queue times, due to back pressure the 
> same will start reflecting on the router. As a result, clients 
> connecting to healthy/faster name nodes will also slow down, as the same RPC queue 
> is maintained for all calls at the router layer. Essentially the same IPC 
> thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single RPC queue for all calls. Let's discuss how we 
> can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify the 
> downstream name node and maintain a separate queue for each underlying name 
> node. Another, simpler way is to maintain some sort of rate limiter configured 
> for each name node and let routers drop/reject/send error requests after a 
> certain threshold. 
> This won’t be a simple change as the router’s ‘Server’ layer would need redesign 
> and implementation. Currently this layer is the same as the name node's.
> Opening this ticket to discuss, design and implement this feature.
>  
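As a standalone sketch of the simpler, statically configured option mentioned above (per-nameservice permits; every name here is illustrative and not from the attached patches):

{code:java|title=Illustrative sketch only}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class NameserviceLimiter {

  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

  NameserviceLimiter(Map<String, Integer> staticLimits) {
    staticLimits.forEach((ns, n) -> permits.put(ns, new Semaphore(n)));
  }

  // Reject (or divert) a call to an overloaded downstream nameservice instead
  // of letting it occupy the shared router handler pool.
  boolean tryAcquire(String nameservice) {
    Semaphore s = permits.get(nameservice);
    return s == null || s.tryAcquire();
  }

  void release(String nameservice) {
    Semaphore s = permits.get(nameservice);
    if (s != null) {
      s.release();
    }
  }
}
{code}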



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927084#comment-16927084
 ] 

Íñigo Goiri commented on HDFS-14838:


Thanks [~risyomei], it makes sense to make it consistent with the NN and the DN.
Do you mind submitting a patch?

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
> Attachments: router-ui.jpg
>
>
> Currently the web UI of RBF is using the HTTP port in its heading.
> It should be changed to the RPC port, as the web UIs of the NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14838) RBF: Display RPC (instead of HTTP) Port Number in RBF web UI

2019-09-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14838:
---
Summary: RBF: Display RPC (instead of HTTP) Port Number in RBF web UI  
(was: Display RPC (instead of HTTP) Port Number in RBF web UI.)

> RBF: Display RPC (instead of HTTP) Port Number in RBF web UI
> 
>
> Key: HDFS-14838
> URL: https://issues.apache.org/jira/browse/HDFS-14838
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.1.2
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Minor
> Attachments: router-ui.jpg
>
>
> Currently the web UI of RBF is using the HTTP port in its heading.
> It should be changed to the RPC port, as the web UIs of the NameNode and 
> DataNode do.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread Vivek Ratnavel Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2107 started by Vivek Ratnavel Subramanian.

> Datanodes should retry forever to connect to SCM in an unsecure environment
> ---
>
> Key: HDDS-2107
> URL: https://issues.apache.org/jira/browse/HDDS-2107
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>
> In an unsecure environment, the datanodes retry up to 10 times, waiting 
> 1000 milliseconds between attempts, before throwing this error:
> {code:java}
> Unable to communicate to SCM server at 
> jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds.
> java.net.ConnectException: Call From 
> jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to 
> jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection 
> exception: java.net.ConnectException: Connection refused; For more details 
> see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>   at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   ... 13 more
> {code}
> The datanodes should retry forever to connect to SCM instead of failing 
> after 10 retries.
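A minimal, standalone sketch of the desired behaviour; in the actual code this would more likely be expressed through Hadoop's retry policies (a retry-forever policy with a fixed sleep) rather than a hand-rolled loop like this:

{code:java|title=Illustrative sketch only}
import java.util.concurrent.Callable;

public final class RetryForever {

  // Keep retrying the call with a fixed sleep instead of giving up after a
  // fixed number of attempts.
  static <T> T callWithRetries(Callable<T> call, long sleepMillis)
      throws InterruptedException {
    while (true) {
      try {
        return call.call();
      } catch (Exception e) {
        // e.g. java.net.ConnectException while SCM is not up yet
        Thread.sleep(sleepMillis);
      }
    }
  }
}
{code}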



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2089) Add CLI createPipeline

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2089?focusedWorklogId=310155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310155
 ]

ASF GitHub Bot logged work on HDDS-2089:


Author: ASF GitHub Bot
Created on: 10/Sep/19 22:20
Start Date: 10/Sep/19 22:20
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #1418: HDDS-2089: 
Add createPipeline CLI.
URL: https://github.com/apache/hadoop/pull/1418#discussion_r322987593
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
 ##
 @@ -390,10 +390,9 @@ public void 
notifyObjectStageChange(StorageContainerLocationProtocolProtos
   public Pipeline createReplicationPipeline(HddsProtos.ReplicationType type,
   HddsProtos.ReplicationFactor factor, HddsProtos.NodePool nodePool)
   throws IOException {
-// TODO: will be addressed in future patch.
-// This is needed only for debugging purposes to make sure cluster is
-// working correctly.
-return null;
+AUDIT.logReadSuccess(
 
 Review comment:
   Should we log this as a write success for pipeline creation?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310155)
Time Spent: 0.5h  (was: 20m)

> Add CLI createPipeline
> --
>
> Key: HDDS-2089
> URL: https://issues.apache.org/jira/browse/HDDS-2089
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone CLI
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a SCMCLI to create pipeline for ozone.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310154&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310154
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 22:16
Start Date: 10/Sep/19 22:16
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on issue #1360: HDDS-2007. 
Make ozone fs shell command work with OM HA service ids 
URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530141859
 
 
   +1. Can you check if the failed acceptance tests are related?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310154)
Time Spent: 5h 10m  (was: 5h)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that the Ozone client can access 
> an Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if more than one service id 
> (ozone.om.service.ids) is configured in ozone-site.xml. This needs to be addressed 
> on the client side.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1879) Support multiple excluded scopes when choosing datanodes in NetworkTopology

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1879?focusedWorklogId=310153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310153
 ]

ASF GitHub Bot logged work on HDDS-1879:


Author: ASF GitHub Bot
Created on: 10/Sep/19 22:14
Start Date: 10/Sep/19 22:14
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on issue #1194: HDDS-1879.  Support 
multiple excluded scopes when choosing datanodes in NetworkTopology
URL: https://github.com/apache/hadoop/pull/1194#issuecomment-530141215
 
 
   Thanks @ChenSammi for updating the PR. The latest change LGTM. Can you fix 
the checkstyle issue and the unit test failure that seem to be related? +1 after that. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310153)
Time Spent: 3h 50m  (was: 3h 40m)

> Support multiple excluded scopes when choosing datanodes in NetworkTopology
> ---
>
> Key: HDDS-1879
> URL: https://issues.apache.org/jira/browse/HDDS-1879
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14841:
--
Attachment: HDFS-14841.1.patch

> Remove Class-Level Synchronization in LocalReplica
> --
>
> Key: HDFS-14841
> URL: https://issues.apache.org/jira/browse/HDFS-14841
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14841.1.patch
>
>
> Use Java's Concurrent package instead.
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14841:
--
Status: Patch Available  (was: Open)

> Remove Class-Level Synchronization in LocalReplica
> --
>
> Key: HDFS-14841
> URL: https://issues.apache.org/jira/browse/HDFS-14841
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14841.1.patch
>
>
> Use Java's Concurrent package instead.
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14841) Remove Class-Level Synchronization in LocalReplica

2019-09-10 Thread David Mollitor (Jira)
David Mollitor created HDFS-14841:
-

 Summary: Remove Class-Level Synchronization in LocalReplica
 Key: HDFS-14841
 URL: https://issues.apache.org/jira/browse/HDFS-14841
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


Use Java's Concurrent package instead.

https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/LocalReplica.java#L143
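A hedged sketch of the kind of change implied; the class and field names are illustrative, not the attached patch. The single-argument lambda is also why the parentheses around the key are optional, per the review comment elsewhere in this thread.

{code:java|title=Illustrative sketch only}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ReplicaRefCounter {

  private static final ConcurrentMap<String, AtomicInteger> COUNTS =
      new ConcurrentHashMap<>();

  // computeIfAbsent gives an atomic get-or-create without a class-level
  // synchronized block.
  static int increment(String blockFile) {
    return COUNTS.computeIfAbsent(blockFile, key -> new AtomicInteger())
        .incrementAndGet();
  }
}
{code}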



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2041) Don't depend on DFSUtil to check HTTP policy

2019-09-10 Thread Vivek Ratnavel Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2041 started by Vivek Ratnavel Subramanian.

> Don't depend on DFSUtil to check HTTP policy
> 
>
> Key: HDDS-2041
> URL: https://issues.apache.org/jira/browse/HDDS-2041
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: website
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>
> Currently, BaseHttpServer uses DFSUtil to get the HTTP policy. With this, when 
> the HTTP policy is set to HTTPS in hdfs-site.xml, Ozone HTTP servers try to come 
> up with HTTPS and fail if SSL certificates are not present in the required 
> location.
> Ozone web UIs should not depend on the HDFS config to determine the HTTP policy. 
> Instead, they should have their own config to determine the policy. 
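As a tiny sketch of the direction only; the key name ozone.http.policy is an assumption for illustration, not necessarily the committed config key, and the plain Map stands in for the real configuration object.

{code:java|title=Illustrative sketch only; config key is an assumption}
import java.util.Map;

public class HttpPolicyResolver {

  // Resolve the HTTP policy from an Ozone-specific key, falling back to
  // HTTP_ONLY instead of inheriting dfs.http.policy from hdfs-site.xml.
  static String resolvePolicy(Map<String, String> ozoneConf) {
    return ozoneConf.getOrDefault("ozone.http.policy", "HTTP_ONLY");
  }
}
{code}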



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread Vivek Ratnavel Subramanian (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927049#comment-16927049
 ] 

Vivek Ratnavel Subramanian commented on HDDS-2107:
--

cc [~xyao]

> Datanodes should retry forever to connect to SCM in an unsecure environment
> ---
>
> Key: HDDS-2107
> URL: https://issues.apache.org/jira/browse/HDDS-2107
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>
> In an unsecure environment, the datanodes retry up to 10 times, waiting 
> 1000 milliseconds between attempts, before throwing this error:
> {code:java}
> Unable to communicate to SCM server at 
> jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds.
> java.net.ConnectException: Call From 
> jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to 
> jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection 
> exception: java.net.ConnectException: Connection refused; For more details 
> see:  http://wiki.apache.org/hadoop/ConnectionRefused
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
>   at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
>   at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>   ... 13 more
> {code}
> The datanodes should keep retrying to connect to SCM indefinitely instead of 
> failing after 10 retries.
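
As a rough sketch of the desired behaviour (illustrative code only, not the actual 
VersionEndpointTask or datanode state machine), the handshake could simply loop 
until SCM responds, keeping the 1000 ms pause between attempts:

{code:java}
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Illustrative only: retry the SCM handshake indefinitely instead of giving up
// after 10 attempts. ScmClient is a stand-in for the real protocol proxy.
public final class RetryForeverSketch {
  interface ScmClient {
    String getVersion() throws IOException;
  }

  private RetryForeverSketch() { }

  public static String connectWithInfiniteRetry(ScmClient scm)
      throws InterruptedException {
    int attempt = 0;
    while (true) {
      try {
        return scm.getVersion();                 // succeeds once SCM is up
      } catch (IOException e) {
        attempt++;
        System.err.println("SCM not reachable (attempt " + attempt
            + "), retrying in 1000 ms: " + e.getMessage());
        TimeUnit.MILLISECONDS.sleep(1000);       // same wait as today, forever
      }
    }
  }
}
{code}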



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2107) Datanodes should retry forever to connect to SCM in an unsecure environment

2019-09-10 Thread Vivek Ratnavel Subramanian (Jira)
Vivek Ratnavel Subramanian created HDDS-2107:


 Summary: Datanodes should retry forever to connect to SCM in an 
unsecure environment
 Key: HDDS-2107
 URL: https://issues.apache.org/jira/browse/HDDS-2107
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.4.1
Reporter: Vivek Ratnavel Subramanian
Assignee: Vivek Ratnavel Subramanian


In an unsecure environment, the datanodes retry up to 10 times, waiting 1000 
milliseconds between attempts, before throwing this error:
{code:java}
Unable to communicate to SCM server at 
jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 for past 0 seconds.
java.net.ConnectException: Call From 
jmccarthy-ozone-unsecure2-4.vpc.cloudera.com/10.65.36.118 to 
jmccarthy-ozone-unsecure2-2.vpc.cloudera.com:9861 failed on connection 
exception: java.net.ConnectException: Connection refused; For more details see: 
 http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy33.getVersion(Unknown Source)
at 
org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:112)
at 
org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:70)
at 
org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:690)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:794)
at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
... 13 more
{code}
The datanodes should keep retrying to connect to SCM indefinitely instead of 
failing after 10 retries.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2106:
-
Labels: pull-request-available  (was: )

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>
> Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:
>  1. the hadoop artifacts are defined in the dependency management sections
>  2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process, it would be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions, as we only need to 
> be careful about the used dependencies, not the whole pom contents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-10 Thread Elek, Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-2106:
---
Status: Patch Available  (was: Open)

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:
>  1. the hadoop artifacts are defined in the dependency management sections
>  2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process, it would be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions, as we only need to 
> be careful about the used dependencies, not the whole pom contents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2106) Avoid usage of hadoop projects as parent of hdds/ozone

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2106?focusedWorklogId=310124=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310124
 ]

ASF GitHub Bot logged work on HDDS-2106:


Author: ASF GitHub Bot
Created on: 10/Sep/19 21:15
Start Date: 10/Sep/19 21:15
Worklog Time Spent: 10m 
  Work Description: elek commented on pull request #1423: HDDS-2106. Avoid 
usage of hadoop projects as parent of hdds/ozone
URL: https://github.com/apache/hadoop/pull/1423
 
 
   Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:
   
1. the hadoop artifacts are defined in the dependency management sections
2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
parent
   
   As we already have a slightly different assembly process, it would be more 
resilient to use a dedicated parent project instead of the hadoop one. With 
this approach it will be easier to upgrade the versions, as we only need to be 
careful about the used dependencies, not the whole pom contents.
   
   See: https://issues.apache.org/jira/browse/HDDS-2106
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310124)
Remaining Estimate: 0h
Time Spent: 10m

> Avoid usage of hadoop projects as parent of hdds/ozone
> --
>
> Key: HDDS-2106
> URL: https://issues.apache.org/jira/browse/HDDS-2106
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Ozone uses hadoop as a dependency. The dependency is defined on multiple levels:
>  1. the hadoop artifacts are defined in the dependency management sections
>  2. both hadoop-ozone and hadoop-hdds projects use "hadoop-project" as the 
> parent
> As we already have a slightly different assembly process, it would be more 
> resilient to use a dedicated parent project instead of the hadoop one. With 
> this approach it will be easier to upgrade the versions, as we only need to 
> be careful about the used dependencies, not the whole pom contents.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14840:
--
Attachment: HDFS-14840.1.patch

> Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager
> -
>
> Key: HDFS-14840
> URL: https://issues.apache.org/jira/browse/HDFS-14840
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14840.1.patch
>
>
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40
> Instead of synchronizing the entire class, just synchronize the collection.
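
A hedged sketch of the idea (placeholder types, not the actual 
BlockPoolTokenSecretManager fields): keep the per-block-pool map in a 
ConcurrentHashMap so callers no longer contend on a single class-wide lock.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch: synchronize (only) the collection via a concurrent map
// instead of marking every method of the manager as synchronized.
class BlockPoolSecretManagerSketch {
  private final ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();

  void addBlockPool(String bpid, Object secretManager) {
    map.put(bpid, secretManager);          // thread-safe without a class lock
  }

  Object get(String bpid) {
    Object mgr = map.get(bpid);
    if (mgr == null) {
      throw new IllegalArgumentException("Block pool " + bpid + " is not found");
    }
    return mgr;
  }
}
{code}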



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14840:
--
Status: Patch Available  (was: Open)

> Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager
> -
>
> Key: HDFS-14840
> URL: https://issues.apache.org/jira/browse/HDFS-14840
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14840.1.patch
>
>
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40
> Instead of synchronizing the entire class, just synchronize the collection.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14840) Use Java Concurrent Instead of Synchronization in BlockPoolTokenSecretManager

2019-09-10 Thread David Mollitor (Jira)
David Mollitor created HDFS-14840:
-

 Summary: Use Java Concurrent Instead of Synchronization in 
BlockPoolTokenSecretManager
 Key: HDFS-14840
 URL: https://issues.apache.org/jira/browse/HDFS-14840
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockPoolTokenSecretManager.java#L40

Instead of synchronizing the entire class, just synchronize the collection.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310114
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 20:56
Start Date: 10/Sep/19 20:56
Worklog Time Spent: 10m 
  Work Description: smengcl commented on issue #1360: HDDS-2007. Make ozone 
fs shell command work with OM HA service ids
URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530115882
 
 
   /retest
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310114)
Time Spent: 5h  (was: 4h 50m)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that the Ozone client can 
> access an Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if more than one service id 
> (ozone.om.service.ids) is configured in ozone-site.xml. This needs to be 
> addressed on the client side.
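
A sketch of the client-side selection this implies (illustrative only, not the 
actual OmUtils or adapter code): prefer a service id named in the URI, and fall 
back to the configured ids only when exactly one is defined, instead of crashing 
on ambiguity.

{code:java}
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch; class and method names are placeholders.
final class OmServiceIdResolverSketch {
  private OmServiceIdResolverSketch() { }

  static String resolve(String serviceIdFromUri, Configuration conf) {
    if (serviceIdFromUri != null && !serviceIdFromUri.isEmpty()) {
      return serviceIdFromUri;                     // explicit id wins
    }
    Collection<String> ids =
        conf.getTrimmedStringCollection("ozone.om.service.ids");
    if (ids.size() == 1) {
      return ids.iterator().next();                // unambiguous default
    }
    throw new IllegalArgumentException(
        "Cannot pick an OM service id: " + ids.size() + " ids configured");
  }
}
{code}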



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310112
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 20:55
Start Date: 10/Sep/19 20:55
Worklog Time Spent: 10m 
  Work Description: smengcl commented on pull request #1360: HDDS-2007. 
Make ozone fs shell command work with OM HA service ids 
URL: https://github.com/apache/hadoop/pull/1360#discussion_r322957816
 
 

 ##
 File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
 ##
 @@ -70,26 +71,46 @@
   private final UserGroupInformation ugi;
   private final Text delegationTokenService;
 
+  // TODO: Do we want this to be final?
+  private String omServiceId;
+
   public OMFailoverProxyProvider(OzoneConfiguration configuration,
-  UserGroupInformation ugi) throws IOException {
+  UserGroupInformation ugi, String omServiceId) throws IOException {
 this.conf = configuration;
 this.omVersion = RPC.getProtocolVersion(OzoneManagerProtocolPB.class);
 this.ugi = ugi;
-loadOMClientConfigs(conf);
+this.omServiceId = omServiceId;
+loadOMClientConfigs(conf, this.omServiceId);
 this.delegationTokenService = computeDelegationTokenService();
 
 currentProxyIndex = 0;
 currentProxyOMNodeId = omNodeIDList.get(currentProxyIndex);
   }
 
-  private void loadOMClientConfigs(Configuration config) throws IOException {
+  public OMFailoverProxyProvider(OzoneConfiguration configuration,
+  UserGroupInformation ugi) throws IOException {
+this(configuration, ugi, null);
+  }
+
+  private void loadOMClientConfigs(Configuration config, String omSvcId)
+  throws IOException {
 this.omProxies = new HashMap<>();
 this.omProxyInfos = new HashMap<>();
 this.omNodeIDList = new ArrayList<>();
 
-Collection omServiceIds = config.getTrimmedStringCollection(
-OZONE_OM_SERVICE_IDS_KEY);
+Collection omServiceIds;
+if (omSvcId == null) {
 
 Review comment:
   Filed https://issues.apache.org/jira/browse/HDDS-2104
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310112)
Time Spent: 4h 50m  (was: 4h 40m)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that the Ozone client can 
> access an Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if more than one service id 
> (ozone.om.service.ids) is configured in ozone-site.xml. This needs to be 
> addressed on the client side.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310099
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 20:51
Start Date: 10/Sep/19 20:51
Worklog Time Spent: 10m 
  Work Description: smengcl commented on pull request #1360: HDDS-2007. 
Make ozone fs shell command work with OM HA service ids 
URL: https://github.com/apache/hadoop/pull/1360#discussion_r322955933
 
 

 ##
 File path: 
hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java
 ##
 @@ -131,6 +142,13 @@ public void initialize(URI name, Configuration conf) 
throws IOException {
 // If port number is not specified, read it from config
 omPort = OmUtils.getOmRpcPort(conf);
   }
+} else if (OmUtils.isServiceIdsDefined(conf)) {
 
 Review comment:
   Makes sense. Thanks! Fixing this in an upcoming commit.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310099)
Time Spent: 4h 40m  (was: 4.5h)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that the Ozone client can 
> access an Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if more than one service id 
> (ozone.om.service.ids) is configured in ozone-site.xml. This needs to be 
> addressed on the client side.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310098
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 20:50
Start Date: 10/Sep/19 20:50
Worklog Time Spent: 10m 
  Work Description: smengcl commented on pull request #1360: HDDS-2007. 
Make ozone fs shell command work with OM HA service ids 
URL: https://github.com/apache/hadoop/pull/1360#discussion_r322955607
 
 

 ##
 File path: 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneClientFactory.java
 ##
 @@ -136,6 +136,31 @@ public static OzoneClient getRpcClient(String omHost, 
Integer omRpcPort,
 return getRpcClient(config);
   }
 
+  /**
+   * Returns an OzoneClient which will use RPC protocol.
+   *
+   * @param omServiceId
+   *Service ID of OzoneManager HA cluster.
+   *
+   * @param config
+   *Configuration to be used for OzoneClient creation
+   *
+   * @return OzoneClient
+   *
+   * @throws IOException
+   */
+  public static OzoneClient getRpcClient(String omServiceId,
 
 Review comment:
   Filed https://issues.apache.org/jira/browse/HDDS-2105
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310098)
Time Spent: 4.5h  (was: 4h 20m)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that the Ozone client can 
> access an Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if more than one service id 
> (ozone.om.service.ids) is configured in ozone-site.xml. This needs to be 
> addressed on the client side.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed

2019-09-10 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926881#comment-16926881
 ] 

Ayush Saxena commented on HDFS-14528:
-

At a cursory glance, the patch looks good.

Minor nit:
correct the javadoc 

{code:java}
+  /**
+   * Test that manual failover is successful with Observernode.
+   */
{code}
In the test you are not using an observer, but a standby...


> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.2.Patch, ZKFC_issue.patch
>
>
>  *In a cluster with more than one Standby namenode, manual failover throws an 
> exception in some cases.*
> *When trying to execute the failover command from active to standby* 
> *(_./hdfs haadmin -failover nn1 nn2_), the exception below is thrown:*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
>  Scenario 1: 
> Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> When trying to manually fail over from NN1 to NN2 while NN3 is down, the 
> exception is thrown.
> Scenario 2:
>  Namenodes - NN1 (Active), NN2 (Standby), NN3 (Standby)
> ZKFCs -              ZKFC1,            ZKFC2,            ZKFC3
> When trying to manually fail over from NN1 to NN3 while NN3's ZKFC (ZKFC3) is 
> down, the exception is thrown.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310067
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 10/Sep/19 19:54
Start Date: 10/Sep/19 19:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530093799
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 38 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 26 | Maven dependency ordering for branch |
   | +1 | mvninstall | 573 | trunk passed |
   | +1 | compile | 398 | trunk passed |
   | +1 | checkstyle | 78 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 874 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 178 | trunk passed |
   | 0 | spotbugs | 446 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 653 | trunk passed |
   | -0 | patch | 500 | Used diff version of patch file. Binary files and 
potentially other changes not applied. Please rebase and squash commits if 
necessary. |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 40 | Maven dependency ordering for patch |
   | +1 | mvninstall | 562 | the patch passed |
   | +1 | compile | 400 | the patch passed |
   | +1 | javac | 400 | the patch passed |
   | +1 | checkstyle | 86 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 686 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 192 | the patch passed |
   | +1 | findbugs | 789 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 314 | hadoop-hdds in the patch passed. |
   | -1 | unit | 3129 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 48 | The patch does not generate ASF License warnings. |
   | | | 9252 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.client.rpc.TestSecureOzoneRpcClient |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerHandler
 |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
   |   | hadoop.ozone.client.rpc.TestDeleteWithSlowFollower |
   |   | hadoop.ozone.TestMiniChaosOzoneCluster |
   |   | hadoop.ozone.om.TestOzoneManagerHA |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.client.rpc.TestCommitWatcher |
   |   | hadoop.ozone.client.rpc.TestContainerStateMachineFailures |
   |   | hadoop.ozone.client.rpc.TestOzoneAtRestEncryption |
   |   | hadoop.ozone.container.TestContainerReplication |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1163 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux f632dd8cccfe 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / dc9abd2 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/testReport/ |
   | Max. process+thread count | 4107 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test 
U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/9/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this 

[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310065=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310065
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 19:51
Start Date: 10/Sep/19 19:51
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #1360: 
HDDS-2007. Make ozone fs shell command work with OM HA service ids  
URL: https://github.com/apache/hadoop/pull/1360#discussion_r322930998
 
 

 ##
 File path: 
hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java
 ##
 @@ -131,6 +142,13 @@ public void initialize(URI name, Configuration conf) 
throws IOException {
 // If port number is not specified, read it from config
 omPort = OmUtils.getOmRpcPort(conf);
   }
+} else if (OmUtils.isServiceIdsDefined(conf)) {
 
 Review comment:
   
https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneFileSystem.java#L154
 
   This is the code line that creates BasicOzoneClientAdapterImpl. Below is the 
code where it checks whether the conf passed in is an instance of 
OzoneConfiguration and, if not, converts it to an OzoneConfiguration object.
   
   
https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozonefs/src/main/java/org/apache/hadoop/fs/ozone/BasicOzoneClientAdapterImpl.java#L112
   
   

 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310065)
Time Spent: 4h 20m  (was: 4h 10m)

> Make ozone fs shell command work with OM HA service ids
> ---
>
> Key: HDDS-2007
> URL: https://issues.apache.org/jira/browse/HDDS-2007
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Build an HDFS HA-like nameservice for OM HA so that the Ozone client can 
> access an Ozone HA cluster with ease.
> The majority of the work is already done in HDDS-972. But the problem is that 
> the client would crash if more than one service id 
> (ozone.om.service.ids) is configured in ozone-site.xml. This needs to be 
> addressed on the client side.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=310059=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310059
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 10/Sep/19 19:41
Start Date: 10/Sep/19 19:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530089343
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 99 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 76 | Maven dependency ordering for branch |
   | +1 | mvninstall | 678 | trunk passed |
   | +1 | compile | 375 | trunk passed |
   | +1 | checkstyle | 74 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 972 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 184 | trunk passed |
   | 0 | spotbugs | 440 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 653 | trunk passed |
   | -0 | patch | 483 | Used diff version of patch file. Binary files and 
potentially other changes not applied. Please rebase and squash commits if 
necessary. |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 30 | Maven dependency ordering for patch |
   | +1 | mvninstall | 546 | the patch passed |
   | +1 | compile | 377 | the patch passed |
   | +1 | javac | 377 | the patch passed |
   | +1 | checkstyle | 77 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 728 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 169 | the patch passed |
   | +1 | findbugs | 657 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 311 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2292 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 43 | The patch does not generate ASF License warnings. |
   | | | 8517 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.scm.node.TestQueryNode |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.container.TestContainerReplication |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStream |
   |   | hadoop.ozone.client.rpc.TestContainerStateMachine |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.0 Server=19.03.0 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1163 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 8d855eed86b8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / dc9abd2 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/testReport/ |
   | Max. process+thread count | 5342 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/container-service hadoop-ozone/integration-test 
U: . |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1163/8/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310059)
Time Spent: 3.5h  (was: 3h 20m)

> Datanodes takeSnapshot should delete previously created snapshots
> 

[jira] [Updated] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14839:
--
Attachment: HDFS-14839.1.patch

> Use Java Concurrent BlockingQueue instead of Internal BlockQueue
> 
>
> Key: HDFS-14839
> URL: https://issues.apache.org/jira/browse/HDFS-14839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14839.1.patch
>
>
> Replace...
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86
> With...
> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html
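
A rough sketch of the replacement (placeholder element type, not the real 
DatanodeDescriptor fields): LinkedBlockingQueue already provides the offer/poll 
semantics that the internal BlockQueue implements by hand.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch: replace a hand-rolled queue with a JDK BlockingQueue.
class ReplicationWorkQueueSketch {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

  void offer(Runnable task) {
    queue.offer(task);                   // unbounded queue, never rejects
  }

  /** Drain up to maxTasks items, mirroring a poll(int)-style API. */
  List<Runnable> poll(int maxTasks) {
    List<Runnable> batch = new ArrayList<>(maxTasks);
    queue.drainTo(batch, maxTasks);
    return batch;
  }
}
{code}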



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue

2019-09-10 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HDFS-14839:
--
Status: Patch Available  (was: Open)

> Use Java Concurrent BlockingQueue instead of Internal BlockQueue
> 
>
> Key: HDFS-14839
> URL: https://issues.apache.org/jira/browse/HDFS-14839
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14839.1.patch
>
>
> Replace...
> https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86
> With...
> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14839) Use Java Concurrent BlockingQueue instead of Internal BlockQueue

2019-09-10 Thread David Mollitor (Jira)
David Mollitor created HDFS-14839:
-

 Summary: Use Java Concurrent BlockingQueue instead of Internal 
BlockQueue
 Key: HDFS-14839
 URL: https://issues.apache.org/jira/browse/HDFS-14839
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


Replace...

https://github.com/apache/hadoop/blob/d8bac50e12d243ef8fd2c7e0ce5c9997131dee74/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L86

With...

https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html





--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310050=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310050
 ]

ASF GitHub Bot logged work on HDDS-1982:


Author: ASF GitHub Bot
Created on: 10/Sep/19 19:20
Start Date: 10/Sep/19 19:20
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1344: HDDS-1982 
Extend SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#discussion_r322918560
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/states/NodeStateMap.java
 ##
 @@ -309,4 +381,61 @@ private void checkIfNodeExist(UUID uuid) throws 
NodeNotFoundException {
   throw new NodeNotFoundException("Node UUID: " + uuid);
 }
   }
+
+  /**
+   * Create a list of datanodeInfo for all nodes matching the passed states.
+   * Passing null for one of the states acts like a wildcard for that state.
+   *
+   * @param opState
+   * @param health
+   * @return List of DatanodeInfo objects matching the passed state
+   */
+  private List filterNodes(
+  NodeOperationalState opState, NodeState health) {
+if (opState != null && health != null) {
 
 Review comment:
   Can we write lines 395-440 with one simple stream().filter()? Nothing wrong 
with the code itself, just a thought.
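
For illustration, a self-contained sketch of that stream().filter() shape, 
treating null as a wildcard for either dimension (the types below are 
placeholders, not the actual NodeStateMap classes):

{code:java}
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of the suggested single-pass filter.
class NodeFilterSketch {
  enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED }
  enum Health { HEALTHY, STALE, DEAD }

  static final class Node {
    final OpState opState;
    final Health health;
    Node(OpState opState, Health health) {
      this.opState = opState;
      this.health = health;
    }
  }

  static List<Node> filterNodes(Collection<Node> nodes,
      OpState opState, Health health) {
    return nodes.stream()
        .filter(n -> opState == null || n.opState == opState)
        .filter(n -> health == null || n.health == health)
        .collect(Collectors.toList());
  }
}
{code}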
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310050)
Time Spent: 4h 50m  (was: 4h 40m)

> Extend SCMNodeManager to support decommission and maintenance states
> 
>
> Key: HDDS-1982
> URL: https://issues.apache.org/jira/browse/HDDS-1982
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently, within SCM a node can have the following states:
> HEALTHY
> STALE
> DEAD
> DECOMMISSIONING
> DECOMMISSIONED
> The last 2 are not currently used.
> In order to support decommissioning and maintenance mode, we need to extend 
> the set of states a node can have to include decommission and maintenance 
> states.
> It is also important to note that a node decommissioning or entering 
> maintenance can also be HEALTHY, STALE or go DEAD.
> Therefore in this Jira I propose we should model a node state with two 
> different sets of values. The first is effectively the liveness of the 
> node, with the following states. This is largely what is in place now:
> HEALTHY
> STALE
> DEAD
> The second is the node operational state:
> IN_SERVICE
> DECOMMISSIONING
> DECOMMISSIONED
> ENTERING_MAINTENANCE
> IN_MAINTENANCE
> That means the overall number of states for a node is the cross-product 
> of the two lists above; however, it probably makes sense to keep the two 
> states separate internally.
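
To make the proposal concrete, a small sketch of the two-dimensional state (the 
enum values follow the lists above; the holder class itself is illustrative, not 
the actual SCM implementation):

{code:java}
// Illustrative sketch of keeping the two state dimensions separate.
public class NodeStatusSketch {
  enum NodeHealth { HEALTHY, STALE, DEAD }

  enum NodeOperationalState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  private final NodeHealth health;
  private final NodeOperationalState opState;

  NodeStatusSketch(NodeHealth health, NodeOperationalState opState) {
    this.health = health;       // liveness dimension
    this.opState = opState;     // admin/operational dimension
  }

  boolean isHealthyAndInService() {
    // Combine the two dimensions without enumerating the full 3 x 5
    // cross-product as a single flat enum.
    return health == NodeHealth.HEALTHY
        && opState == NodeOperationalState.IN_SERVICE;
  }
}
{code}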



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=310038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310038
 ]

ASF GitHub Bot logged work on HDDS-1982:


Author: ASF GitHub Bot
Created on: 10/Sep/19 19:02
Start Date: 10/Sep/19 19:02
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #1344: HDDS-1982 
Extend SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#discussion_r322911485
 
 

 ##
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java
 ##
 @@ -417,9 +451,12 @@ private SCMNodeStat getNodeStatInternal(DatanodeDetails 
datanodeDetails) {
 
   @Override
   public Map getNodeCount() {
+// TODO - This does not consider decom, maint etc.
 Map nodeCountMap = new HashMap();
 
 Review comment:
   Why not? It makes it easier for the caller to consume, 
in my opinion.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 310038)
Time Spent: 4h 40m  (was: 4.5h)

> Extend SCMNodeManager to support decommission and maintenance states
> 
>
> Key: HDDS-1982
> URL: https://issues.apache.org/jira/browse/HDDS-1982
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently, within SCM a node can have the following states:
> HEALTHY
> STALE
> DEAD
> DECOMMISSIONING
> DECOMMISSIONED
> The last 2 are not currently used.
> In order to support decommissioning and maintenance mode, we need to extend 
> the set of states a node can have to include decommission and maintenance 
> states.
> It is also important to note that a node decommissioning or entering 
> maintenance can also be HEALTHY, STALE or go DEAD.
> Therefore in this Jira I propose we should model a node state with two 
> different sets of values. The first is effectively the liveness of the 
> node, with the following states. This is largely what is in place now:
> HEALTHY
> STALE
> DEAD
> The second is the node operational state:
> IN_SERVICE
> DECOMMISSIONING
> DECOMMISSIONED
> ENTERING_MAINTENANCE
> IN_MAINTENANCE
> That means the overall number of states for a node is the cross-product 
> of the two lists above; however, it probably makes sense to keep the two 
> states separate internally.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-09-10 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926826#comment-16926826
 ] 

Surendra Singh Lilhore commented on HDFS-14699:
---

I checked this, and just adding ">=" will not help; {{StripedReader.java}} needs 
the source and liveIndices lists to be the same size. Let's handle this in HDFS-14768.

+1 for v5

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Following are our testing steps; hopefully they are helpful. (The following DNs 
> hold the internal blocks under test.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 
> internal blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat arrives), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # EC then does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> environment. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2007) Make ozone fs shell command work with OM HA service ids

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2007?focusedWorklogId=310012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-310012
 ]

ASF GitHub Bot logged work on HDDS-2007:


Author: ASF GitHub Bot
Created on: 10/Sep/19 18:21
Start Date: 10/Sep/19 18:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1360: HDDS-2007. Make 
ozone fs shell command work with OM HA service ids   
URL: https://github.com/apache/hadoop/pull/1360#issuecomment-530060194
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 5746 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 1 | No case conflicting files found. |
   | 0 | shelldocs | 0 | Shelldocs was not available. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 5 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 31 | Maven dependency ordering for branch |
   | +1 | mvninstall | 718 | trunk passed |
   | +1 | compile | 453 | trunk passed |
   | +1 | checkstyle | 94 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 936 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 197 | trunk passed |
   | 0 | spotbugs | 504 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 750 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 37 | Maven dependency ordering for patch |
   | +1 | mvninstall | 556 | the patch passed |
   | +1 | compile | 414 | the patch passed |
   | +1 | javac | 414 | the patch passed |
   | +1 | checkstyle | 105 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | shellcheck | 0 | There were no new shellcheck issues. |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 702 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 191 | the patch passed |
   | +1 | findbugs | 666 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 291 | hadoop-hdds in the patch passed. |
   | -1 | unit | 2517 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 66 | The patch does not generate ASF License warnings. |
   | | | 14783 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ozone.om.TestOzoneManagerHA |
   |   | hadoop.ozone.container.TestContainerReplication |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStream |
   |   | hadoop.ozone.om.snapshot.TestOzoneManagerSnapshotProvider |
   |   | hadoop.ozone.om.TestOMRatisSnapshots |
   |   | hadoop.hdds.scm.pipeline.TestRatisPipelineProvider |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1360 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle shellcheck shelldocs |
   | uname | Linux e46cf8c849e3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / dc9abd2 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/2/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/2/testReport/ |
   | Max. process+thread count | 5403 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/common hadoop-ozone/client 
hadoop-ozone/ozone-manager hadoop-ozone/dist hadoop-ozone/integration-test 
hadoop-ozone/ozone-recon hadoop-ozone/ozonefs U: hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1360/2/console |
   | versions | git=2.7.4 maven=3.3.9 shellcheck=0.4.6 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:

[jira] [Commented] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

2019-09-10 Thread CR Hota (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926816#comment-16926816
 ] 

CR Hota commented on HDFS-14090:


[~elgoiri] Thanks for the review.

Sorry, I couldn't understand the second point. The idea of the test is to make 
sure all threads finish execution and shut down gracefully, and then the metrics 
are analyzed based on whether fairness is enabled or disabled.

> RBF: Improved isolation for downstream name nodes. {Static}
> ---
>
> Key: HDFS-14090
> URL: https://issues.apache.org/jira/browse/HDFS-14090
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, RBF_ Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help minimize the impact of unhealthy clusters on clients connecting to 
> healthy clusters.
> For example - if there are 2 name nodes downstream, and one of them is 
> heavily loaded with calls spiking rpc queue times, due to back pressure the 
> same will start reflecting on the router. As a result, clients connecting to 
> healthy/faster name nodes will also slow down, since the same rpc queue is 
> maintained for all calls at the router layer. Essentially the same IPC 
> thread pool is used by the router to connect to all name nodes.
> Currently the router uses one single rpc queue for all calls. Let's discuss how 
> we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify the 
> downstream name node and maintain a separate queue for each underlying name 
> node. Another, simpler way is to maintain some sort of rate limiter configured 
> for each name node and let routers drop/reject/send error requests after a 
> certain threshold. 
> This won’t be a simple change, as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name node’s.
>  
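
As a rough, hypothetical illustration of the simpler rate-limiter option 
mentioned in the description (not the design in the attached patches, which 
rework the Router's RPC server), a per-nameservice permit check could look like 
the sketch below; the class and method names are invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/** Hypothetical per-nameservice throttle; not the HDFS-14090 implementation. */
public class DownstreamThrottle {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  private final int permitsPerNameservice;

  public DownstreamThrottle(int permitsPerNameservice) {
    this.permitsPerNameservice = permitsPerNameservice;
  }

  /** Returns true if the call may proceed to the given downstream nameservice. */
  public boolean tryAcquire(String nsId) {
    return permits
        .computeIfAbsent(nsId, id -> new Semaphore(permitsPerNameservice))
        .tryAcquire();
  }

  /** Must be called when the downstream call completes, success or failure. */
  public void release(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }
}
```

With something like this in front of the downstream call, requests to an 
overloaded name node fail fast instead of occupying the shared handler threads, 
so clients of healthy name nodes are less affected.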



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2032) Ozone client should retry writes in case of any ratis/stateMachine exceptions

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2032?focusedWorklogId=309986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309986
 ]

ASF GitHub Bot logged work on HDDS-2032:


Author: ASF GitHub Bot
Created on: 10/Sep/19 17:35
Start Date: 10/Sep/19 17:35
Worklog Time Spent: 10m 
  Work Description: mukul1987 commented on pull request #1420: HDDS-2032. 
Ozone client should retry writes in case of any ratis/stateMachine exceptions.
URL: https://github.com/apache/hadoop/pull/1420#discussion_r322872820
 
 

 ##
 File path: 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/KeyOutputStream.java
 ##
 @@ -290,11 +288,12 @@ private void handleException(BlockOutputStreamEntry 
streamEntry,
 if (!failedServers.isEmpty()) {
   excludeList.addDatanodes(failedServers);
 }
-if (closedContainerException) {
+
+// if the container needs to be excluded , add the container to the
+// exclusion list , otherwise add the pipeline to the exclusion list
+if (containerExclusionException) {
   excludeList.addConatinerId(ContainerID.valueof(containerId));
-} else if (retryFailure || t instanceof TimeoutException
-|| t instanceof GroupMismatchException
-|| t instanceof NotReplicatedException) {
+} else {
 
 Review comment:
   So apart from SCE, all exceptions are expected to be related to the pipeline 
?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309986)
Time Spent: 0.5h  (was: 20m)

> Ozone client should retry writes in case of any ratis/stateMachine exceptions
> -
>
> Key: HDDS-2032
> URL: https://issues.apache.org/jira/browse/HDDS-2032
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, the Ozone client retries writes on a different pipeline or container 
> only in case of some specific exceptions. But if it sees an exception such as 
> DISK_FULL, CONTAINER_UNHEALTHY or any corruption, it just aborts the write. 
> In general, every such exception on the client should be a retriable exception 
> in the ozone client, and for some specific exceptions it should take a more 
> specific action, like excluding certain containers or pipelines while retrying, 
> or informing SCM of a corrupt replica etc.
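
A self-contained toy model of that retry-with-exclusion idea follows. It is not 
the actual Ozone client code (the real classification lives in 
KeyOutputStream#handleException), and the exception used for the container case 
is only a placeholder.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeoutException;

/** Toy model of the exclusion decision; not the actual Ozone client classes. */
public class WriteRetrySketch {
  static class ExcludeList {
    final Set<Long> containers = new HashSet<>();
    final Set<String> pipelines = new HashSet<>();
  }

  /** True for failures tied to a specific container rather than the pipeline. */
  static boolean isContainerException(Throwable t) {
    return t instanceof IllegalStateException;   // placeholder for a container-level failure
  }

  static void classifyFailure(Throwable t, long containerId, String pipelineId,
                              ExcludeList excludeList) {
    if (isContainerException(t)) {
      excludeList.containers.add(containerId);   // retry on a different container
    } else {
      excludeList.pipelines.add(pipelineId);     // timeouts etc.: skip the pipeline
    }
  }

  public static void main(String[] args) {
    ExcludeList ex = new ExcludeList();
    classifyFailure(new TimeoutException("write timed out"), 7L, "pipeline-1", ex);
    System.out.println("excluded pipelines: " + ex.pipelines);
  }
}
```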



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2064) OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured incorrectly

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=309975=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309975
 ]

ASF GitHub Bot logged work on HDDS-2064:


Author: ASF GitHub Bot
Created on: 10/Sep/19 17:22
Start Date: 10/Sep/19 17:22
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1398: HDDS-2064. 
OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured 
incorrectly
URL: https://github.com/apache/hadoop/pull/1398#issuecomment-530037869
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 4861 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 33 | Maven dependency ordering for branch |
   | +1 | mvninstall | 620 | trunk passed |
   | +1 | compile | 408 | trunk passed |
   | +1 | checkstyle | 89 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 891 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 194 | trunk passed |
   | 0 | spotbugs | 515 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 751 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 31 | Maven dependency ordering for patch |
   | +1 | mvninstall | 615 | the patch passed |
   | +1 | compile | 424 | the patch passed |
   | +1 | javac | 424 | the patch passed |
   | +1 | checkstyle | 92 | the patch passed |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 681 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 189 | the patch passed |
   | +1 | findbugs | 692 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 300 | hadoop-hdds in the patch passed. |
   | -1 | unit | 199 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 48 | The patch does not generate ASF License warnings. |
   | | | 11309 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1398/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1398 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux e3eaa0d1930b 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / dc9abd2 |
   | Default Java | 1.8.0_222 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1398/1/artifact/out/patch-unit-hadoop-ozone.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1398/1/testReport/ |
   | Max. process+thread count | 1328 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/ozone-manager hadoop-ozone/integration-test U: 
hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1398/1/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.10.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309975)
Time Spent: 1h 20m  (was: 1h 10m)

> OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured 
> incorrectly
> 
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but 
> 
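
The description is truncated in this archive. As a generic, hypothetical sketch 
only (not the HDDS-2064 patch), the usual defence against this class of bug is 
to validate the configured service ids up front and fail with a clear 
configuration error rather than an NPE:

```java
import java.util.Arrays;
import java.util.List;

/** Generic config-validation sketch; not the actual OzoneManagerRatisServer fix. */
public class ServiceIdCheck {
  static String requireKnownServiceId(String configuredIds, String localId) {
    List<String> ids = Arrays.asList(configuredIds.split(","));
    if (localId == null || !ids.contains(localId)) {
      throw new IllegalArgumentException(
          "OM service id " + localId + " is not one of the configured ids " + ids);
    }
    return localId;
  }

  public static void main(String[] args) {
    System.out.println(requireKnownServiceId("id1,id2", "id1"));  // ok
    // requireKnownServiceId("id1,id2", null) would throw a clear error, not an NPE
  }
}
```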

[jira] [Work logged] (HDDS-1786) Datanodes takeSnapshot should delete previously created snapshots

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1786?focusedWorklogId=309973=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309973
 ]

ASF GitHub Bot logged work on HDDS-1786:


Author: ASF GitHub Bot
Created on: 10/Sep/19 17:19
Start Date: 10/Sep/19 17:19
Worklog Time Spent: 10m 
  Work Description: avijayanhwx commented on issue #1163: HDDS-1786 : 
Datanodes takeSnapshot should delete previously created s…
URL: https://github.com/apache/hadoop/pull/1163#issuecomment-530036546
 
 
   > Thanks @avijayanhwx for working on this. The changes look good.
   > I think it would be better to move all configs related to RaftServer under 
the RaftServerConfig group but that's beyond the scope of this jira.
   > 
   > I would prefer to have a test in Ozone as well to verify the snapshot 
retention behaviour of Ratis so that, in case there are changes made in Ratis 
related to this, we should be able to catch this here in ozone.
   > 
> A simple unit test where we can change the snapshot threshold to 1 entry and 
verify we have n snapshot files after n transactions in the raft log directory.
   
   Added unit test.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 309973)
Time Spent: 3h 20m  (was: 3h 10m)

> Datanodes takeSnapshot should delete previously created snapshots
> -
>
> Key: HDDS-1786
> URL: https://issues.apache.org/jira/browse/HDDS-1786
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Right now, after taking a new snapshot, the previous snapshot file is 
> left in the raft log directory. When a new snapshot is taken, the previous 
> snapshots should be deleted.
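
A minimal, hypothetical sketch of that retention behaviour, using java.nio in 
place of the real Ratis snapshot machinery; the file names and layout are 
illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SnapshotRetentionSketch {

  /** Toy model: each new snapshot replaces whatever snapshot files came before it. */
  static void takeSnapshot(Path raftLogDir, long index) throws IOException {
    try (DirectoryStream<Path> previous =
             Files.newDirectoryStream(raftLogDir, "snapshot.*")) {
      for (Path old : previous) {
        Files.delete(old);                       // purge earlier snapshot files
      }
    }
    Files.createFile(raftLogDir.resolve("snapshot." + index));
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("raft-log");
    for (long i = 1; i <= 3; i++) {
      takeSnapshot(dir, i);
    }
    try (DirectoryStream<Path> remaining = Files.newDirectoryStream(dir)) {
      // with retention in place, only snapshot.3 should be printed
      for (Path p : remaining) {
        System.out.println(p.getFileName());
      }
    }
  }
}
```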



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

2019-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=309954=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-309954
 ]

ASF GitHub Bot logged work on HDDS-1982:


Author: ASF GitHub Bot
Created on: 10/Sep/19 16:56
Start Date: 10/Sep/19 16:56
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1344: HDDS-1982 Extend 
SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#issuecomment-530027446
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 82 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 3 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 67 | Maven dependency ordering for branch |
   | +1 | mvninstall | 628 | trunk passed |
   | +1 | compile | 390 | trunk passed |
   | +1 | checkstyle | 75 | trunk passed |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 948 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 175 | trunk passed |
   | 0 | spotbugs | 459 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | +1 | findbugs | 682 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 31 | Maven dependency ordering for patch |
   | +1 | mvninstall | 576 | the patch passed |
   | +1 | compile | 388 | the patch passed |
   | +1 | cc | 388 | the patch passed |
   | +1 | javac | 388 | the patch passed |
   | -0 | checkstyle | 37 | hadoop-hdds: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 1 | The patch has no whitespace issues. |
   | +1 | shadedclient | 736 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | javadoc | 179 | the patch passed |
   | -1 | findbugs | 215 | hadoop-hdds generated 2 new + 0 unchanged - 0 fixed 
= 2 total (was 0) |
   ||| _ Other Tests _ |
   | -1 | unit | 298 | hadoop-hdds in the patch failed. |
   | -1 | unit | 3499 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 61 | The patch does not generate ASF License warnings. |
   | | | 9701 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | FindBugs | module:hadoop-hdds |
   |  |  Dead store to nodes in 
org.apache.hadoop.hdds.scm.node.NodeStateManager.getAllNodes()  At 
NodeStateManager.java:org.apache.hadoop.hdds.scm.node.NodeStateManager.getAllNodes()
  At NodeStateManager.java:[line 396] |
   |  |  
org.apache.hadoop.hdds.scm.node.states.NodeStateMap.getNodes(NodeStatus) does 
not release lock on all exception paths  At NodeStateMap.java:on all exception 
paths  At NodeStateMap.java:[line 156] |
   | Failed junit tests | hadoop.hdds.scm.block.TestBlockManager |
   |   | hadoop.ozone.container.TestContainerReplication |
   |   | hadoop.ozone.TestSecureOzoneCluster |
   |   | hadoop.ozone.scm.node.TestQueryNode |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClient |
   |   | hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis |
   |   | hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures |
   |   | hadoop.ozone.scm.TestContainerSmallFile |
   |   | hadoop.ozone.client.rpc.Test2WayCommitInRatis |
   |   | hadoop.ozone.TestMiniChaosOzoneCluster |
   |   | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
   |   | hadoop.ozone.scm.TestGetCommittedBlockLengthAndPutKey |
   |   | hadoop.ozone.client.rpc.TestDeleteWithSlowFollower |
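
The second FindBugs item above (NodeStateMap.getNodes not releasing its lock on 
all exception paths) is the classic lock-leak pattern. Shown generically below, 
not as the actual NodeStateMap code, the usual fix is to release the lock in a 
finally block:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Generic illustration of releasing a lock on every exit path. */
public class LockReleaseSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public int readGuarded(int[] data, int index) {
    Lock readLock = lock.readLock();
    readLock.lock();
    try {
      return data[index];          // may throw; the lock is still released below
    } finally {
      readLock.unlock();           // runs on both normal return and exception
    }
  }
}
```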
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=19.03.1 Server=19.03.1 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1344 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit javadoc 
mvninstall shadedclient findbugs checkstyle |
   | uname | Linux 7a7a07260082 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / dc9abd2 |
   | Default Java | 1.8.0_222 |
   | checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/diff-checkstyle-hadoop-hdds.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/new-findbugs-hadoop-hdds.html
 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1344/3/artifact/out/patch-unit-hadoop-hdds.txt
 |
   | unit | 

[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2019-09-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926779#comment-16926779
 ] 

Hadoop QA commented on HDFS-14795:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 662 unchanged - 0 fixed = 664 total (was 662) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 30s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979971/HDFS-14795.008.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 537abd0bc4c8 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dc9abd2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/27833/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit 
