[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961700#comment-16961700
 ] 

Hadoop QA commented on HDFS-14936:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
59s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 44s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
50s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}206m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.TestBlockStoragePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14936 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984215/HDFS-14936.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0980bc0c176c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 30ed24a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | 

[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-28 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961692#comment-16961692
 ] 

Surendra Singh Lilhore commented on HDFS-14768:
---

Changes LGTM.

[~ayushtkn], [~weichiu], can you review it once? I have added the UT, so my "+1"
will not count here :)

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.jpg, 
> guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, 
> zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], indices [3,4] are 
> being decommissioned, and the index-6 datanode's 
> pendingReplicationWithoutTargets is increased so that it exceeds 
> replicationStreamsHardLimit (we set 14). Then, after 
> BlockManager#chooseSourceDatanodes, the liveBlockIndices are 
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. 
> In BlockManager#scheduleReconstruction, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target datanodes, it assigns an 
> erasure-coding task to them.
> When the datanode gets the task, it builds targetIndices from 
> liveBlockIndices and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, while targetIndices[1] always stays 0, its initial 
> value.
> The StripedReader always creates readers from the first 6 index blocks, 
> i.e. [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] triggers the 
> ISA-L bug: the data of block index 6 is corrupted (all data is zero).
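> A minimal standalone sketch (hypothetical demo class, not the actual Hadoop 
> code) that reproduces the stale-initial-value problem described above: only 
> one missing index (6) is found, but targets.length is 2, so targetIndices[1] 
> keeps its default value 0.
> {code:java}
> import java.util.BitSet;
>
> public class TargetIndicesDemo {
>   public static void main(String[] args) {
>     int dataBlkNum = 6, parityBlkNum = 3;
>     BitSet live = new BitSet();
>     // liveBlockIndices = [0,1,2,3,4,5,7,8]; index 6 is the only missing one
>     for (int i : new int[]{0, 1, 2, 3, 4, 5, 7, 8}) {
>       live.set(i);
>     }
>     short[] targetIndices = new short[2]; // two targets chosen by the NN
>     int m = 0;
>     for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>       if (!live.get(i) && m < targetIndices.length) {
>         targetIndices[m++] = (short) i;
>       }
>     }
>     // Prints "6 0": the second slot is never filled, so internal block 0
>     // becomes a bogus reconstruction target, as described above.
>     System.out.println(targetIndices[0] + " " + targetIndices[1]);
>   }
> }
> {code}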
> I wrote a unit test that reproduces it reliably.
> {code:java}
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   

[jira] [Commented] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission

2019-10-28 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961689#comment-16961689
 ] 

Surendra Singh Lilhore commented on HDFS-14920:
---

[~ferhui], Thanks for the Jira. 

Nice catch. I will look into this after HDFS-14768, because both change the 
same part of the code.

> Erasure Coding: Decommission may hang If one or more datanodes are out of 
> service during decommission  
> ---
>
> Key: HDFS-14920
> URL: https://issues.apache.org/jira/browse/HDFS-14920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch, 
> HDFS-14920.003.patch
>
>
> Decommission tests hang in our clusters.
> We have seen messages like the following:
> {quote}
> 2019-10-22 15:58:51,514 TRACE 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: 
> blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 
> 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 
> 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current 
> datanode decommissioning: true, Is current datanode entering maintenance: 
> false
> 2019-10-22 15:58:51,514 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 
> 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate 
> to finish Decommission In Progress
> {quote}
> After digging into the source code and the cluster logs, I guess it happens 
> in the following steps:
> # The storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8; b0 is on 
> datanode dn0, b1 on datanode dn1, etc.
> # At the beginning dn0 is in decommission progress; b0 is replicated 
> successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are on nodes in decommission progress, and dn4 containing 
> b4 is out of service, so b4 needs to be reconstructed and an ErasureCodingWork 
> is created to do it; in the ErasureCodingWork, additionalReplRequired is 4.
> # Because hasAllInternalBlocks is false, it calls 
> ErasureCodingWork#addTaskToDatanode -> 
> DatanodeDescriptor#addBlockToBeErasureCoded, and sends a 
> BlockECReconstructionInfo task to the Datanode.
> # The DataNode cannot reconstruct the block because targets is 4, greater 
> than 3 (the parity number).
> The problem is the following, in BlockManager#scheduleReconstruction:
> {code}
>   // should reconstruct all the internal blocks before scheduling
>   // replication task for decommissioning node(s).
>   if (additionalReplRequired - numReplicas.decommissioning() -
>   numReplicas.liveEnteringMaintenanceReplicas() > 0) {
> additionalReplRequired = additionalReplRequired -
> numReplicas.decommissioning() -
> numReplicas.liveEnteringMaintenanceReplicas();
>   }
> {code}
> Reconstruction should run first, and replication for decommissioning 
> afterwards. Because numReplicas.decommissioning() is 4 while 
> additionalReplRequired is 4, that's wrong:
> numReplicas.decommissioning() should be 3, excluding the replica that still 
> has a live copy. Then additionalReplRequired would be 1 and reconstruction 
> would be scheduled as expected. After that, decommission goes on.
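> A small arithmetic sketch (plain Java, values taken from the scenario above, 
> names only borrowed from the snippet; not the actual patch) of the proposed 
> accounting:
> {code:java}
> public class ReplAccountingSketch {
>   public static void main(String[] args) {
>     // Scenario above: 9 expected, 5 live, 4 decommissioning, but one
>     // decommissioning replica (b0) already has a live copy elsewhere.
>     int additionalReplRequired = 9 - 5;   // 4
>     int decommissioning = 4 - 1;          // 3 after excluding the live replica
>     int liveEnteringMaintenance = 0;
>     if (additionalReplRequired - decommissioning - liveEnteringMaintenance > 0) {
>       additionalReplRequired -= decommissioning + liveEnteringMaintenance;
>     }
>     // Prints 1: a single EC reconstruction target is scheduled first,
>     // and replication for decommissioning can proceed afterwards.
>     System.out.println(additionalReplRequired);
>   }
> }
> {code}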
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961683#comment-16961683
 ] 

Hadoop QA commented on HDFS-14927:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m  
3s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984223/HDFS-14927.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a27ec54b98dd 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 30ed24a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28198/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28198/testReport/ |
| Max. process+thread count | 2769 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28198/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-28 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961682#comment-16961682
 ] 

Surendra Singh Lilhore commented on HDFS-14768:
---

[~ferhui], This issue is different from HDFS-14920. Here the problem is that a 
busy DN replica which already exists gets reconstructed again. If the replica 
already exists, it should not go for reconstruction.

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.jpg, 
> guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, 
> zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8], indices [3,4] are 
> being decommissioned, and the index-6 datanode's 
> pendingReplicationWithoutTargets is increased so that it exceeds 
> replicationStreamsHardLimit (we set 14). Then, after 
> BlockManager#chooseSourceDatanodes, the liveBlockIndices are 
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. 
> In BlockManager#scheduleReconstruction, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target datanodes, it assigns an 
> erasure-coding task to them.
> When the datanode gets the task, it builds targetIndices from 
> liveBlockIndices and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, while targetIndices[1] always stays 0, its initial 
> value.
> The StripedReader always creates readers from the first 6 index blocks, 
> i.e. [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] triggers the 
> ISA-L bug: the data of block index 6 is corrupted (all data is zero).
> I wrote a unit test that reproduces it reliably.
> {code:java}
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   

[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed

2019-10-28 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961680#comment-16961680
 ] 

hemanthboyina commented on HDFS-14922:
--

Can you review the patch, [~elgoiri] [~tasanuma]?

> On StartUp , Snapshot modification time got changed
> ---
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14922.001.patch
>
>
> Snapshot modification time got changed on namenode restart



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-28 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2355:

Issue Type: Bug  (was: Task)

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |   at 
> org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |   at 
> org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |   at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |   at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: 
> WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in 
> default WriteCommitted mode). If it is not due to corruption, the WAL must be 
> emptied before changing the WritePolicy.
> om_1    |   at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |   at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |   at 
> org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and the OM is terminated.
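> A hedged sketch (simplified; the real code lives in RDBBatchOperation and 
> RDBStore) of the batch-commit path that surfaces this RocksDBException as the 
> IOException above:
> {code:java}
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
> import org.rocksdb.WriteBatch;
> import org.rocksdb.WriteOptions;
>
> final class BatchCommit {
>   static void commit(RocksDB db, WriteBatch batch) throws java.io.IOException {
>     try (WriteOptions opts = new WriteOptions()) {
>       // Fails with the WritePrepared/WriteUnprepared txn tag error when the
>       // WAL contains records written under a different write policy.
>       db.write(opts, batch);
>     } catch (RocksDBException e) {
>       throw new java.io.IOException("Unable to write the batch.", e);
>     }
>   }
> }
> {code}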



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14824) [Dynamometer] Dynamometer in org.apache.hadoop.tools does not output the benchmark results.

2019-10-28 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961676#comment-16961676
 ] 

Takanobu Asanuma commented on HDFS-14824:
-

Hi [~xkrogen], this feature seems important. Does LinkedIn plan to merge it 
into Hadoop, or can I work on it?

> [Dynamometer] Dynamometer in org.apache.hadoop.tools does not output the 
> benchmark results.
> ---
>
> Key: HDFS-14824
> URL: https://issues.apache.org/jira/browse/HDFS-14824
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Soya Miyoshi
>Priority: Major
>
> According to the latest 
> [document|https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-dynamometer/Dynamometer.html
>  ], the benchmark results should be written to `Dauditreplay.output-path`. 
> However, the current org.apache.hadoop.tools hasn't merged [this pull 
> request|https://github.com/linkedin/dynamometer/pull/76 ], so it does not 
> output the benchmark results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode

2019-10-28 Thread Xiaoyu Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-1569:
-
Fix Version/s: HDDS-1564
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~timmylicheng] for the contribution and all for the reviews. I've 
merged the PR to feature branch. 

> Add ability to SCM for creating multiple pipelines with same datanode
> -
>
> Key: HDDS-1569
> URL: https://issues.apache.org/jira/browse/HDDS-1569
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: HDDS-1564
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> - Refactor _RatisPipelineProvider.create()_ to be able to create pipelines 
> with datanodes that are not a part of sufficient pipelines
> - Define soft and hard upper bounds for pipeline membership
> - Create SCMAllocationManager that can be leveraged to get a candidate set of 
> datanodes based on placement policies
> - Add the datanodes to internal data structures (a selection sketch follows 
> below)
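> A rough sketch (invented names; the real SCMAllocationManager may differ) of 
> the candidate-set idea: pick healthy datanodes whose current pipeline count 
> is still below the soft upper bound.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> class PipelineCandidateSelector {
>   static final int SOFT_PIPELINE_LIMIT = 2;  // assumed soft upper bound
>
>   static class NodeState {
>     final String uuid;
>     final int pipelineCount;
>     NodeState(String uuid, int pipelineCount) {
>       this.uuid = uuid;
>       this.pipelineCount = pipelineCount;
>     }
>   }
>
>   // Datanodes that are not yet part of "sufficient" pipelines.
>   static List<NodeState> candidates(List<NodeState> healthyNodes) {
>     List<NodeState> out = new ArrayList<>();
>     for (NodeState n : healthyNodes) {
>       if (n.pipelineCount < SOFT_PIPELINE_LIMIT) {
>         out.add(n);
>       }
>     }
>     return out;
>   }
> }
> {code}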



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1569) Add ability to SCM for creating multiple pipelines with same datanode

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1569?focusedWorklogId=335380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335380
 ]

ASF GitHub Bot logged work on HDDS-1569:


Author: ASF GitHub Bot
Created on: 29/Oct/19 04:46
Start Date: 29/Oct/19 04:46
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #28: HDDS-1569 
Support creating multiple pipelines with same datanode
URL: https://github.com/apache/hadoop-ozone/pull/28
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335380)
Time Spent: 8h 20m  (was: 8h 10m)

> Add ability to SCM for creating multiple pipelines with same datanode
> -
>
> Key: HDDS-1569
> URL: https://issues.apache.org/jira/browse/HDDS-1569
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Siddharth Wagle
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> - Refactor _RatisPipelineProvider.create()_ to be able to create pipelines 
> with datanodes that are not a part of sufficient pipelines
> - Define soft and hard upper bounds for pipeline membership
> - Create SCMAllocationManager that can be leveraged to get a candidate set of 
> datanodes based on placement policies
> - Add the datanodes to internal data structures



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961664#comment-16961664
 ] 

Hadoop QA commented on HDFS-13736:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 52s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 71 unchanged - 0 fixed = 72 total (was 71) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}122m  1s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}201m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-13736 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984213/HDFS-13736.007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 34bed49371dc 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 30ed24a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28196/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28196/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28196/testReport/ |
| Max. process+thread count | 

[jira] [Commented] (HDFS-8631) WebHDFS : Support setQuota

2019-10-28 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961660#comment-16961660
 ] 

Surendra Singh Lilhore commented on HDFS-8631:
--

Thanks, [~ste...@apache.org].

Created a new Jira, HDFS-14939, to address your comments.

> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch
>
>
> Users are able to do quota management from the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis container

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Summary: Fail to create Ratis container  (was: Fail to create Ratis 
pipeline )

> Fail to create Ratis container
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Critical
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> The logs below didn't reveal the true cause of the write failure. Will 
> improve them too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14939) Update documnet and UT tests for Quota API.

2019-10-28 Thread Surendra Singh Lilhore (Jira)
Surendra Singh Lilhore created HDFS-14939:
-

 Summary: Update documnet and UT tests for Quota API.
 Key: HDFS-14939
 URL: https://issues.apache.org/jira/browse/HDFS-14939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Reporter: Surendra Singh Lilhore


Refer to the comment in HDFS-8631.
{quote} # mention to me when you're going near this class as I can make 
suggestions in advance.
 # I now expect the documentation and the extra testing. Who is going to 
volunteer to do this?{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-28 Thread Leon Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Gao updated HDFS-14927:

Attachment: HDFS-14927.004.patch

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, 
> HDFS-14927.003.patch, HDFS-14927.004.patch
>
>
> It would be good to add some monitoring of the async caller thread pool that 
> handles fan-out RPC client requests, so we know its utilization and when to 
> bump up dfs.federation.router.client.thread-size.
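> A hedged sketch (assumed names, not the actual patch) of the kind of gauges 
> such metrics could expose for the async caller ThreadPoolExecutor:
> {code:java}
> import java.util.concurrent.ThreadPoolExecutor;
>
> class AsyncCallerPoolMetrics {
>   private final ThreadPoolExecutor pool;
>
>   AsyncCallerPoolMetrics(ThreadPoolExecutor pool) {
>     this.pool = pool;
>   }
>
>   int activeThreads()   { return pool.getActiveCount(); }        // busy callers
>   int poolSize()        { return pool.getPoolSize(); }           // current threads
>   int queuedCalls()     { return pool.getQueue().size(); }       // waiting RPCs
>   long completedCalls() { return pool.getCompletedTaskCount(); } // finished RPCs
> }
> {code}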



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9913) DistCp to add -useTrash to move deleted files to Trash

2019-10-28 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961639#comment-16961639
 ] 

Jinglun commented on HDFS-9913:
---

Great work! To be safe I don't want to delete anything directly, so 
'-useTrash' is very useful to me! [~shenyinjie] [~ste...@apache.org], what's 
the progress of this Jira? Are you still working on it?

> DistCp to add -useTrash to move deleted files to Trash
> --
>
> Key: HDFS-9913
> URL: https://issues.apache.org/jira/browse/HDFS-9913
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Konstantin Shaposhnikov
>Assignee: Shen Yinjie
>Priority: Major
> Attachments: HDFS-9913_1.patch, HDFS-9913_2.patch
>
>
> The documentation for the DistCp -delete option says 
> ([http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html]):
> | The deletion is done by FS Shell. So the trash will be used, if it is 
> enable.
> However, this no longer seems to be the case. The latest source code 
> (https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java)
>  uses `FileSystem.delete`, and the trash option seems not to be applied.
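> A hedged sketch of the behavior the reporter expects (assumed wiring, not the 
> actual CopyCommitter change): try the trash first and fall back to a direct 
> delete.
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.Trash;
>
> class TrashAwareDelete {
>   static boolean delete(FileSystem fs, Path path, Configuration conf)
>       throws IOException {
>     // moveToAppropriateTrash returns false when trash is disabled
>     // (fs.trash.interval == 0), so we fall back to a direct delete.
>     if (Trash.moveToAppropriateTrash(fs, path, conf)) {
>       return true;
>     }
>     return fs.delete(path, true);
>   }
> }
> {code}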



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Description: 
Error logs:
2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
org.rocksdb.RocksDBException Failed init RocksDB, db path : 
/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
 exception 
:/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
 does not exist (create_if_missing is false)

The logs below didn't reveal the true cause of the write failure. Will 
improve them too. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR

  was:
The logs below didn't reveal the true cause of the write failure. 

2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
CONTAINER_INTERNAL_ERROR
2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk : 
Trace ID:  : Message: ContainerID 402 creation failed : Result: 
CONTAINER_INTERNAL_ERROR


> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> The logs below didn't reveal the true cause of the write failure. Will 
> improve them too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Priority: Critical  (was: Major)

> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Critical
>
> Error logs:
> 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR  - 
> org.rocksdb.RocksDBException Failed init RocksDB, db path : 
> /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db,
>  exception 
> :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db:
>  does not exist (create_if_missing is false)
> The logs below didn't reveal the true cause of the write failure. Will 
> improve them too. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Summary: Fail to create Ratis pipeline   (was: Improve datanode write 
failure log)

> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> The logs below didn't reveal the true cause of the write failure. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2363) Fail to create Ratis pipeline

2019-10-28 Thread Sammi Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-2363:
-
Issue Type: Bug  (was: Improvement)

> Fail to create Ratis pipeline 
> --
>
> Key: HDDS-2363
> URL: https://issues.apache.org/jira/browse/HDDS-2363
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>
> The logs below didn't reveal the true cause of the write failure. 
> 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO   - Operation: 
> CreateContainer : Trace ID:  : Message: Container creation failed. : Result: 
> CONTAINER_INTERNAL_ERROR
> 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO   - Operation: WriteChunk 
> : Trace ID:  : Message: ContainerID 402 creation failed : Result: 
> CONTAINER_INTERNAL_ERROR



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-28 Thread Chen Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961616#comment-16961616
 ] 

Chen Zhang commented on HDFS-14730:
---

Thanks [~eyang]

> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to 
> deprecate it to avoid misuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention

2019-10-28 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961613#comment-16961613
 ] 

Tsz-wo Sze commented on HDDS-2331:
--

Created HDDS-2375 to refactor the code. After that, will implement the chunk 
buffer using a list of smaller buffers, which are allocated only if needed.
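A minimal sketch (hypothetical class, not the HDDS-2375 implementation) of a 
chunk buffer backed by a list of small buffers that are allocated only on 
demand, so a large chunk no longer pins its full size up front:
{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class IncrementalChunkBuffer {
  private final int increment;                       // e.g. 64 KB per slice
  private final List<ByteBuffer> buffers = new ArrayList<>();

  IncrementalChunkBuffer(int increment) {
    this.increment = increment;
  }

  void put(byte[] data) {
    int offset = 0;
    while (offset < data.length) {
      ByteBuffer last =
          buffers.isEmpty() ? null : buffers.get(buffers.size() - 1);
      if (last == null || !last.hasRemaining()) {
        last = ByteBuffer.allocate(increment);       // allocate only when needed
        buffers.add(last);
      }
      int n = Math.min(last.remaining(), data.length - offset);
      last.put(data, offset, n);
      offset += n;
    }
  }
}
{code}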

> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Shashikant Banerjee
>Priority: Critical
> Attachments: profiler.png
>
>
> The Freon random key generator exhausts the default heap after just a few 
> hundred 1MB keys. A heap dump on OOME reveals 150+ instances of 
> {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 
> --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 
> --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14935) Refactor DFSNetworkTopology#isNodeInScope

2019-10-28 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961610#comment-16961610
 ] 

Lisheng Sun commented on HDFS-14935:


Yeah, I think it is necessary to replace "/" with the defined constant 
NodeBase.PATH_SEPARATOR_STR; that makes the code more concise and clear. Thank 
you [~ayushtkn].

 

> Refactor DFSNetworkTopology#isNodeInScope
> -
>
> Key: HDFS-14935
> URL: https://issues.apache.org/jira/browse/HDFS-14935
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14935.001.patch, HDFS-14935.002.patch, 
> HDFS-14935.003.patch
>
>
> {code:java}
> private boolean isNodeInScope(Node node, String scope) {
>   if (!scope.endsWith("/")) {
> scope += "/";
>   }
>   String nodeLocation = node.getNetworkLocation() + "/";
>   return nodeLocation.startsWith(scope);
> }
> {code}
> NodeBase#normalize() is used to normalize the scope,
> so I refactored DFSNetworkTopology#isNodeInScope.
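> A possible shape of the refactor (a sketch only; the actual patch may 
> differ):
> {code:java}
> private boolean isNodeInScope(Node node, String scope) {
>   // normalize() strips any trailing separator, so append one explicitly
>   scope = NodeBase.normalize(scope) + NodeBase.PATH_SEPARATOR_STR;
>   String nodeLocation =
>       node.getNetworkLocation() + NodeBase.PATH_SEPARATOR_STR;
>   return nodeLocation.startsWith(scope);
> }
> {code}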
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-2356:
--

Assignee: Bharat Viswanadham

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone 
> on a path on VM0, reading data from the VM0 local disk and writing to the 
> mount path. The dataset has files of various sizes, from 0 bytes to GB-level, 
> and contains ~50,000 files. 
> The writing is slow (1GB in ~10 mins) and stops after around 4GB. Looking at 
> the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors 
> related to multipart upload. This error eventually causes the writing to 
> terminate and the OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush thread 
> OMDoubleBufferFlushThread encountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961607#comment-16961607
 ] 

Li Cheng commented on HDDS-2356:


[~bharat] The long printing logs happen only in your branch, though. The full log
is too large to attach here; think of the same logs repeated over and over,
adding up to multiple megabytes.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path
> on VM0, while reading data from the VM0 local disk and writing to the mount
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and
> contains ~50,000 files.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. When I
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing
> errors related to multipart upload. This error eventually causes the writing
> to terminate and the OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng edited comment on HDDS-2356 at 10/29/19 2:29 AM:
--

[~bharat] I'm using Python to write to an OS path, something like:
{code:python}
# dir and dest_dir are assumed to be set elsewhere in the script.
# Reads every file under dir line by line and rewrites it under dest_dir,
# flushing the accumulated output every 2000 lines.
import os

sub_files = os.listdir(dir)
num = 0
output = ""
for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)
{code}


was (Author: timmylicheng):
[~bharat] I'm using python to write to a OS path. sth like:
{code:java}
// code placeholder
 for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)
{code}

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path
> on VM0, while reading data from the VM0 local disk and writing to the mount
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and
> contains ~50,000 files.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. When I
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing
> errors related to multipart upload. This error eventually causes the writing
> to terminate and the OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961606#comment-16961606
 ] 

Lisheng Sun commented on HDFS-14936:


Thanks [~elgoiri] for your suggestion.

I added a UT for this patch.

Could you help review the 002 patch? Thank you.

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch, HDFS-14936.002.patch
>
>
> In the current code, the InnerNode implementations InnerNodeImpl and
> DFSTopologyNodeImpl both have getNumOfChildren(), so add getNumOfChildren()
> to the InnerNode interface and remove the unnecessary getNumOfChildren()
> from DFSTopologyNodeImpl.
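
A minimal sketch of the proposed shape (simplified types, not the actual Hadoop classes):

{code:java}
// The accessor moves up to the interface...
interface InnerNode {
  int getNumOfChildren();
}

// ...InnerNodeImpl keeps the single shared implementation...
class InnerNodeImpl implements InnerNode {
  protected final java.util.List<Object> children = new java.util.ArrayList<>();

  @Override
  public int getNumOfChildren() {
    return children.size();
  }
}

// ...and DFSTopologyNodeImpl no longer needs its own copy.
class DFSTopologyNodeImpl extends InnerNodeImpl {
}
{code}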



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2375) Refactor BlockOutputStream to allow flexible buffering

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2375?focusedWorklogId=335349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335349
 ]

ASF GitHub Bot logged work on HDDS-2375:


Author: ASF GitHub Bot
Created on: 29/Oct/19 02:28
Start Date: 29/Oct/19 02:28
Worklog Time Spent: 10m 
  Work Description: szetszwo commented on pull request #97: HDDS-2375. 
Refactor BlockOutputStream to allow flexible buffering.
URL: https://github.com/apache/hadoop-ozone/pull/97
 
 
   See https://issues.apache.org/jira/browse/HDDS-2375
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335349)
Remaining Estimate: 0h
Time Spent: 10m

> Refactor BlockOutputStream to allow flexible buffering
> --
>
> Key: HDDS-2375
> URL: https://issues.apache.org/jira/browse/HDDS-2375
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In HDDS-2331, we found that the Ozone client allocates a ByteBuffer with the
> chunk size (e.g. 16MB) to store data, regardless of the actual data size.  The
> ByteBuffer will create a byte[] with the chunk size.  When the ByteBuffer is
> wrapped into a ByteString, the byte[] remains in the ByteString.
> As a result, when the actual data size is small (e.g. 1MB), a lot of memory
> space (15MB) is wasted.
> In this JIRA, we refactor BlockOutputStream so that the buffering becomes
> more flexible.  In a later JIRA, we could implement the chunk buffer using a
> list of smaller buffers that are allocated only when needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2375) Refactor BlockOutputStream to allow flexible buffering

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2375:
-
Labels: pull-request-available  (was: )

> Refactor BlockOutputStream to allow flexible buffering
> --
>
> Key: HDDS-2375
> URL: https://issues.apache.org/jira/browse/HDDS-2375
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> In HDDS-2331, we found that the Ozone client allocates a ByteBuffer with the
> chunk size (e.g. 16MB) to store data, regardless of the actual data size.  The
> ByteBuffer will create a byte[] with the chunk size.  When the ByteBuffer is
> wrapped into a ByteString, the byte[] remains in the ByteString.
> As a result, when the actual data size is small (e.g. 1MB), a lot of memory
> space (15MB) is wasted.
> In this JIRA, we refactor BlockOutputStream so that the buffering becomes
> more flexible.  In a later JIRA, we could implement the chunk buffer using a
> list of smaller buffers that are allocated only when needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14936:
---
Attachment: HDFS-14936.002.patch

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch, HDFS-14936.002.patch
>
>
> In the current code, the InnerNode implementations InnerNodeImpl and
> DFSTopologyNodeImpl both have getNumOfChildren(), so add getNumOfChildren()
> to the InnerNode interface and remove the unnecessary getNumOfChildren()
> from DFSTopologyNodeImpl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng edited comment on HDDS-2356 at 10/29/19 2:25 AM:
--

[~bharat] I'm using Python to write to an OS path, something like:
{code:java}
// code placeholder
 for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)
{code}


was (Author: timmylicheng):
[~bharat] I'm using python to write to a OS path. sth like:
{code:java}
// code placeholder
{code}
for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path
> on VM0, while reading data from the VM0 local disk and writing to the mount
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and
> contains ~50,000 files.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. When I
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing
> errors related to multipart upload. This error eventually causes the writing
> to terminate and the OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng edited comment on HDDS-2356 at 10/29/19 2:24 AM:
--

[~bharat] I'm using Python to write to an OS path, something like:
{code:java}
// code placeholder
{code}
for sub_file in sub_files: sub_file_path = dir + sub_file with open(dest_dir + 
sub_file_path, "w") as fw: with open(sub_file_path, "r") as fr: line = 
fr.readline() while line: num += 1 output += line if (num >= 2000): 
fw.write(output) output = "" num = 0 line = fr.readline() fw.write(output)


was (Author: timmylicheng):
[~bharat] I'm using python to write to a OS path. sth like:

for sub_file in sub_files:
 sub_file_path = dir + sub_file
 with open(dest_dir + sub_file_path, "w") as fw:
 with open(sub_file_path, "r") as fr:
 line = fr.readline()
 while line:
 num += 1
 output += line
 if (num >= 2000):
 fw.write(output)
 output = ""
 num = 0
 line = fr.readline()
 fw.write(output)

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path
> on VM0, while reading data from the VM0 local disk and writing to the mount
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and
> contains ~50,000 files.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. When I
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing
> errors related to multipart upload. This error eventually causes the writing
> to terminate and the OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961604#comment-16961604
 ] 

Li Cheng commented on HDDS-2356:


[~bharat] I'm using Python to write to an OS path, something like:

for sub_file in sub_files:
    sub_file_path = dir + sub_file
    with open(dest_dir + sub_file_path, "w") as fw:
        with open(sub_file_path, "r") as fr:
            line = fr.readline()
            while line:
                num += 1
                output += line
                if (num >= 2000):
                    fw.write(output)
                    output = ""
                    num = 0
                line = fr.readline()
            fw.write(output)

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path
> on VM0, while reading data from the VM0 local disk and writing to the mount
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and
> contains ~50,000 files.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. When I
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing
> errors related to multipart upload. This error eventually causes the writing
> to terminate and the OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2375) Refactor BlockOutputStream to allow flexible buffering

2019-10-28 Thread Tsz-wo Sze (Jira)
Tsz-wo Sze created HDDS-2375:


 Summary: Refactor BlockOutputStream to allow flexible buffering
 Key: HDDS-2375
 URL: https://issues.apache.org/jira/browse/HDDS-2375
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Tsz-wo Sze
Assignee: Tsz-wo Sze


In HDDS-2331, we found that the Ozone client allocates a ByteBuffer with the chunk
size (e.g. 16MB) to store data, regardless of the actual data size.  The ByteBuffer
will create a byte[] with the chunk size.  When the ByteBuffer is wrapped into a
ByteString, the byte[] remains in the ByteString.

As a result, when the actual data size is small (e.g. 1MB), a lot of memory
space (15MB) is wasted.

In this JIRA, we refactor BlockOutputStream so that the buffering becomes more
flexible.  In a later JIRA, we could implement the chunk buffer using a list of
smaller buffers that are allocated only when needed.
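
A minimal sketch of the lazy-allocation idea, assuming a hypothetical IncrementalChunkBuffer class (the real refactoring may differ):

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Allocates small fixed-size slices on demand instead of one chunk-sized buffer. */
class IncrementalChunkBuffer {
  private final int sliceSize;                    // e.g. 1MB
  private final int chunkSize;                    // e.g. 16MB upper bound
  private final List<ByteBuffer> slices = new ArrayList<>();

  IncrementalChunkBuffer(int sliceSize, int chunkSize) {
    this.sliceSize = sliceSize;
    this.chunkSize = chunkSize;
  }

  /** Number of bytes buffered so far. */
  int position() {
    int p = 0;
    for (ByteBuffer b : slices) {
      p += b.position();
    }
    return p;
  }

  /** Appends data, allocating a new slice only when the last one is full. */
  void put(byte[] data, int off, int len) {
    if (position() + len > chunkSize) {
      throw new IllegalArgumentException("write exceeds chunk size");
    }
    int written = 0;
    while (written < len) {
      ByteBuffer last = slices.isEmpty() ? null : slices.get(slices.size() - 1);
      if (last == null || !last.hasRemaining()) {
        last = ByteBuffer.allocate(sliceSize);    // lazy allocation
        slices.add(last);
      }
      int n = Math.min(len - written, last.remaining());
      last.put(data, off + written, n);
      written += n;
    }
  }
}
{code}

With 1MB of actual data, this allocates a single 1MB slice rather than a full 16MB chunk buffer.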



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-10-28 Thread hu xiaodong (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961585#comment-16961585
 ] 

hu xiaodong commented on HDFS-13736:


Hello, [~hexiaoqiao],

I have submitted [^HDFS-13736.007.patch] as you said.

Thanks.

> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false
> --
>
> Key: HDFS-13736
> URL: https://issues.apache.org/jira/browse/HDFS-13736
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Major
> Attachments: HDFS-13736.001.patch, HDFS-13736.002.patch, 
> HDFS-13736.003.patch, HDFS-13736.004.patch, HDFS-13736.005.patch, 
> HDFS-13736.006.patch, HDFS-13736.007.patch
>
>
> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-10-28 Thread hu xiaodong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hu xiaodong updated HDFS-13736:
---
Attachment: HDFS-13736.007.patch

> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false
> --
>
> Key: HDFS-13736
> URL: https://issues.apache.org/jira/browse/HDFS-13736
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Major
> Attachments: HDFS-13736.001.patch, HDFS-13736.002.patch, 
> HDFS-13736.003.patch, HDFS-13736.004.patch, HDFS-13736.005.patch, 
> HDFS-13736.006.patch, HDFS-13736.007.patch
>
>
> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961582#comment-16961582
 ] 

Íñigo Goiri commented on HDFS-14927:


A couple of style comments:
* In TestRouterClientRejectOverload#373, instead of comparing the string, we
could parse the JSON and check the values. Actually, you do that at the end.
* We can use a lambda to define the Runnable (a minimal sketch follows below).
Actually, why do we do it asynchronously only to then wait? Could we just call
renewLease directly?
* Will this have timing issues if we rely on the executor to keep the threads
active for a short period? Maybe the parameter that keeps the threads alive
should be a long value.
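
For the second point, a minimal, self-contained illustration of the lambda form (the renewLease call here is just a stand-in, not the patch's actual code):

{code:java}
public class LambdaRunnableDemo {
  public static void main(String[] args) throws InterruptedException {
    // Anonymous-class form that the lambda replaces:
    Runnable verbose = new Runnable() {
      @Override
      public void run() {
        System.out.println("renewLease (anonymous class)");
      }
    };
    // Equivalent lambda form:
    Runnable concise = () -> System.out.println("renewLease (lambda)");

    Thread t = new Thread(concise);
    t.start();
    t.join();
    verbose.run();
  }
}
{code}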

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, 
> HDFS-14927.003.patch
>
>
> It is good to add some monitoring on the async caller thread pool to handle 
> fan-out RPC client requests, so we know the utilization and when to bump up 
> dfs.federation.router.client.thread-size
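
As a rough sketch of the kind of utilization numbers such metrics could expose, read straight off a ThreadPoolExecutor (the JSON shape and class name here are illustrative, not the patch's actual output):

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AsyncCallerPoolMetrics {
  private final ThreadPoolExecutor pool;

  public AsyncCallerPoolMetrics(ThreadPoolExecutor pool) {
    this.pool = pool;
  }

  /** Snapshot of pool utilization as a small JSON string. */
  public String asJson() {
    return String.format(
        "{\"active\":%d,\"poolSize\":%d,\"max\":%d,\"queued\":%d,\"completed\":%d}",
        pool.getActiveCount(), pool.getPoolSize(), pool.getMaximumPoolSize(),
        pool.getQueue().size(), pool.getCompletedTaskCount());
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 4, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    System.out.println(new AsyncCallerPoolMetrics(pool).asJson());
    pool.shutdown();
  }
}
{code}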



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2374) Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.

2019-10-28 Thread Anu Engineer (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDDS-2374.

Fix Version/s: 0.5.0
   Resolution: Fixed

Merged to master.

> Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.
> 
>
> Key: HDDS-2374
> URL: https://issues.apache.org/jira/browse/HDDS-2374
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See the title.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2374) Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2374?focusedWorklogId=335302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335302
 ]

ASF GitHub Bot logged work on HDDS-2374:


Author: ASF GitHub Bot
Created on: 29/Oct/19 00:30
Start Date: 29/Oct/19 00:30
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #96: HDDS-2374. 
Make Ozone Readme.txt point to the Ozone websites instead …
URL: https://github.com/apache/hadoop-ozone/pull/96
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335302)
Time Spent: 20m  (was: 10m)

> Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.
> 
>
> Key: HDDS-2374
> URL: https://issues.apache.org/jira/browse/HDDS-2374
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See the title.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961568#comment-16961568
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/29/19 12:10 AM:
-

{quote}[~bharat] In term of reproduction, I have a dataset which includes small 
files as well as big files and I'm using s3 gateway from ozone and mount ozone 
cluster to a local path by goofys. All the data are recursively written to the 
mount path, which essentially leads to ozone cluster. The ozone cluster is 
deployed on a 3-node VMs env and each VM has only 1 disk for ozone data 
writing. I think it's a pretty simple scenario to reproduce. The solely 
operation is writing to ozone cluster thru fuse. 
{quote}
 

I have tried a test that runs parallel MPU for a key, and it still passes.

 
{quote}All the data are recursively written to the mount path, which 
essentially leads to ozone cluster.
{quote}
 

Do you mean using cp to move the files to the mount path?

 

If possible, could you give some steps/exact commands to repro this? That would
help in debugging the issue. I have tried the mount on Docker, but after copying
a few large files, I get OutOfMemory from Docker.


was (Author: bharatviswa):
[~bharat] In term of reproduction, I have a dataset which includes small files 
as well as big files and I'm using s3 gateway from ozone and mount ozone 
cluster to a local path by goofys. All the data are recursively written to the 
mount path, which essentially leads to ozone cluster. The ozone cluster is 
deployed on a 3-node VMs env and each VM has only 1 disk for ozone data 
writing. I think it's a pretty simple scenario to reproduce. The solely 
operation is writing to ozone cluster thru fuse. 

 

I have tried with a test to run parallel MPU for a key, and it still passes. 

 

All the data are recursively written to the mount path, which essentially leads 
to ozone cluster.

 

Mean here using cp to move the files to mount path?. 

 

If possible, could you give some steps/commands to repro this, which will help 
in debug this issue. I have tried mount on docker, but after few large files 
cp, I get OutofMemory from docker. 

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path
> on VM0, while reading data from the VM0 local disk and writing to the mount
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and
> contains ~50,000 files.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. When I
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing
> errors related to multipart upload. This error eventually causes the writing
> to terminate and the OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961568#comment-16961568
 ] 

Bharat Viswanadham commented on HDDS-2356:
--

[~bharat] In term of reproduction, I have a dataset which includes small files 
as well as big files and I'm using s3 gateway from ozone and mount ozone 
cluster to a local path by goofys. All the data are recursively written to the 
mount path, which essentially leads to ozone cluster. The ozone cluster is 
deployed on a 3-node VMs env and each VM has only 1 disk for ozone data 
writing. I think it's a pretty simple scenario to reproduce. The solely 
operation is writing to ozone cluster thru fuse. 

 

I have tried a test that runs parallel MPU for a key, and it still passes.

 

All the data are recursively written to the mount path, which essentially leads 
to ozone cluster.

 

Do you mean using cp to move the files to the mount path?

 

If possible, could you give some steps/commands to repro this? That would help
in debugging the issue. I have tried the mount on Docker, but after copying a
few large files, I get OutOfMemory from Docker.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a path
> on VM0, while reading data from the VM0 local disk and writing to the mount
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and
> contains ~50,000 files.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. When I
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing
> errors related to multipart upload. This error eventually causes the writing
> to terminate and the OM to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2322) DoubleBuffer flush termination and OM shutdown's after that.

2019-10-28 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961566#comment-16961566
 ] 

Bharat Viswanadham commented on HDDS-2322:
--

[~timmylicheng] The new issue showing up now, MISMATCH_PART_LIST, is a different
error; let's track it as part of HDDS-2356. This PR/fix should resolve the
ConcurrentModificationException.
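
For context, a minimal, self-contained reproduction of the failure mode: one thread iterating a TreeMap while another mutates it typically throws ConcurrentModificationException, and taking a snapshot under the writers' lock avoids it. The names below are illustrative, not the actual OM code:

{code:java}
import java.util.TreeMap;
import java.util.concurrent.ConcurrentModificationException;

public class CmeDemo {
  public static void main(String[] args) throws InterruptedException {
    TreeMap<Integer, String> parts = new TreeMap<>();
    for (int i = 0; i < 1_000_000; i++) {
      parts.put(i, "p");
    }

    // Mutator, standing in for a concurrent commit-part request.
    Thread writer = new Thread(() -> {
      for (int i = 1_000_000; i < 2_000_000; i++) {
        parts.put(i, "p");
      }
    });
    writer.start();

    try {
      parts.forEach((k, v) -> { });   // serialization pass, like getProto()
      System.out.println("iteration survived (timing dependent)");
    } catch (ConcurrentModificationException e) {
      System.out.println("reader failed: " + e);
    }
    writer.join();

    // Fix sketch: snapshot under the same lock the writers would hold,
    // then iterate the snapshot safely outside the lock.
    TreeMap<Integer, String> snapshot;
    synchronized (parts) {
      snapshot = new TreeMap<>(parts);
    }
    snapshot.forEach((k, v) -> { });
  }
}
{code}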

 

 

> DoubleBuffer flush termination and OM shutdown's after that.
> 
>
> Key: HDDS-2322
> URL: https://issues.apache.org/jira/browse/HDDS-2322
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> om1_1       | 2019-10-18 00:34:45,317 [OMDoubleBufferFlushThread] ERROR      
> - Terminating with exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> om1_1       | java.util.ConcurrentModificationException
> om1_1       | at 
> java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1660)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> om1_1       | at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> om1_1       | at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
> om1_1       | at 
> org.apache.hadoop.ozone.om.helpers.OmKeyLocationInfoGroup.getProtobuf(OmKeyLocationInfoGroup.java:65)
> om1_1       | at 
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
> om1_1       | at 
> java.base/java.util.Collections$2.tryAdvance(Collections.java:4745)
> om1_1       | at 
> java.base/java.util.Collections$2.forEachRemaining(Collections.java:4753)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
> om1_1       | at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
> om1_1       | at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> om1_1       | at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
> om1_1       | at 
> org.apache.hadoop.ozone.om.helpers.OmKeyInfo.getProtobuf(OmKeyInfo.java:362)
> om1_1       | at 
> org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:37)
> om1_1       | at 
> org.apache.hadoop.ozone.om.codec.OmKeyInfoCodec.toPersistedFormat(OmKeyInfoCodec.java:31)
> om1_1       | at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
> om1_1       | at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
> om1_1       | at 
> org.apache.hadoop.ozone.om.response.key.OMKeyCreateResponse.addToDBBatch(OMKeyCreateResponse.java:58)
> om1_1       | at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:139)
> om1_1       | at 
> java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> om1_1       | at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:137)
> om1_1       | at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961555#comment-16961555
 ] 

Hudson commented on HDFS-14730:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17579 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17579/])
HDFS-14730.  Removed unused configuration dfs.web.authentication.filter. 
(eyang: rev 30ed24a42112b3225ab2486ed24bd6a5011a7a7f)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (delete) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSConfigKeys.java


> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to
> deprecate it to avoid misuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-28 Thread Eric Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated HDFS-14730:
-
Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this to trunk.  Thank you [~zhangchen].

> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to
> deprecate it to avoid misuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2345) Add a UT for newly added clone() in OmBucketInfo

2019-10-28 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2345.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> Add a UT for newly added clone() in OmBucketInfo
> 
>
> Key: HDDS-2345
> URL: https://issues.apache.org/jira/browse/HDDS-2345
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a UT for newly added clone() method in OMBucketInfo as part of HDDS-2333.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2345) Add a UT for newly added clone() in OmBucketInfo

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2345?focusedWorklogId=335248=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335248
 ]

ASF GitHub Bot logged work on HDDS-2345:


Author: ASF GitHub Bot
Created on: 28/Oct/19 22:40
Start Date: 28/Oct/19 22:40
Worklog Time Spent: 10m 
  Work Description: bharatviswa504 commented on pull request #92: 
HDDS-2345. Add a UT for newly added clone() in OmBucketInfo
URL: https://github.com/apache/hadoop-ozone/pull/92
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335248)
Time Spent: 20m  (was: 10m)

> Add a UT for newly added clone() in OmBucketInfo
> 
>
> Key: HDDS-2345
> URL: https://issues.apache.org/jira/browse/HDDS-2345
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add a UT for newly added clone() method in OMBucketInfo as part of HDDS-2333.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2374) Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2374:
-
Labels: pull-request-available  (was: )

> Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.
> 
>
> Key: HDDS-2374
> URL: https://issues.apache.org/jira/browse/HDDS-2374
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
>  Labels: pull-request-available
>
> See the title.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2374) Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2374?focusedWorklogId=335244=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335244
 ]

ASF GitHub Bot logged work on HDDS-2374:


Author: ASF GitHub Bot
Created on: 28/Oct/19 22:37
Start Date: 28/Oct/19 22:37
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #96: HDDS-2374. 
Make Ozone Readme.txt point to the Ozone websites instead …
URL: https://github.com/apache/hadoop-ozone/pull/96
 
 
   ## What changes were proposed in this pull request?
   Making the Readme.txt point to Apache Hadoop Ozone Webpage and Wiki.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2374
   
   ## How was this patch tested?
   
   Just eyeballing the change.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335244)
Remaining Estimate: 0h
Time Spent: 10m

> Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.
> 
>
> Key: HDDS-2374
> URL: https://issues.apache.org/jira/browse/HDDS-2374
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See the title.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2374) Make Ozone Readme.txt point to the Ozone websites instead of Hadoop.

2019-10-28 Thread Anu Engineer (Jira)
Anu Engineer created HDDS-2374:
--

 Summary: Make Ozone Readme.txt point to the Ozone websites instead 
of Hadoop.
 Key: HDDS-2374
 URL: https://issues.apache.org/jira/browse/HDDS-2374
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Anu Engineer
Assignee: Anu Engineer


See the title.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2041) Don't depend on DFSUtil to check HTTP policy

2019-10-28 Thread Jitendra Nath Pandey (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDDS-2041:
---
Priority: Blocker  (was: Major)

> Don't depend on DFSUtil to check HTTP policy
> 
>
> Key: HDDS-2041
> URL: https://issues.apache.org/jira/browse/HDDS-2041
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: website
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Xiaoyu Yao
>Priority: Blocker
>
> Currently, BaseHttpServer uses DFSUtil to get the HTTP policy. With this, when
> the HTTP policy is set to HTTPS in hdfs-site.xml, Ozone HTTP servers try to
> come up with HTTPS and fail if SSL certificates are not present in the
> required location.
> Ozone web UIs should not depend on the HDFS config to determine the HTTP
> policy. Instead, they should have their own config to determine the policy.
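
A minimal sketch of what a dedicated lookup could look like; the key name "ozone.http.policy" is hypothetical, not an existing config key:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.HttpConfig;

public final class OzoneHttpPolicy {
  /** Resolve the HTTP policy from an Ozone-specific key instead of DFSUtil,
   *  defaulting to HTTP_ONLY when the key is unset or invalid. */
  public static HttpConfig.Policy get(Configuration conf) {
    String value = conf.get("ozone.http.policy",
        HttpConfig.Policy.HTTP_ONLY.name());
    HttpConfig.Policy policy = HttpConfig.Policy.fromString(value);
    return policy == null ? HttpConfig.Policy.HTTP_ONLY : policy;
  }

  private OzoneHttpPolicy() { }
}
{code}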



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2041) Don't depend on DFSUtil to check HTTP policy

2019-10-28 Thread Jitendra Nath Pandey (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey reassigned HDDS-2041:
--

Assignee: Xiaoyu Yao  (was: Vivek Ratnavel Subramanian)

> Don't depend on DFSUtil to check HTTP policy
> 
>
> Key: HDDS-2041
> URL: https://issues.apache.org/jira/browse/HDDS-2041
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: website
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Xiaoyu Yao
>Priority: Major
>
> Currently, BaseHttpServer uses DFSUtil to get the HTTP policy. With this, when
> the HTTP policy is set to HTTPS in hdfs-site.xml, Ozone HTTP servers try to
> come up with HTTPS and fail if SSL certificates are not present in the
> required location.
> Ozone web UIs should not depend on the HDFS config to determine the HTTP
> policy. Instead, they should have their own config to determine the policy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2019-10-28 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961507#comment-16961507
 ] 

Tsz-wo Sze commented on HDFS-13671:
---

There may be some ways to improve the performance of FoldedTreeSet:
# Do not balance the tree in removeAndGet. Balance it in the next add
operation.
# Benchmark different values of Node.NODE_SIZE (hard-coded to 64) to find the
optimal value.
# Avoid using System.arraycopy. I guess that is where the slowness comes from.

Are there any existing benchmarks? I could test whether these ideas work.
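
In the absence of an existing benchmark, a minimal harness along these lines could compare implementations; TreeSet is used as a stand-in here, and FoldedTreeSet would need a small adapter around its own API (e.g. removeAndGet):

{code:java}
import java.util.Random;
import java.util.TreeSet;

public class RemoveBench {
  public static void main(String[] args) {
    final int n = 2_000_000;
    long[] keys = new Random(42).longs(n).toArray();

    TreeSet<Long> set = new TreeSet<>();   // swap in the set under test
    for (long k : keys) {
      set.add(k);
    }

    // Time the hot path from the stack trace: bulk removal.
    long t0 = System.nanoTime();
    for (long k : keys) {
      set.remove(k);
    }
    long t1 = System.nanoTime();
    System.out.printf("removed %d keys in %.1f ms%n", n, (t1 - t0) / 1e6);
  }
}
{code}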

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should 
> take more time. However, we now always see the NN hang during the remove-block 
> operation. 
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get 
> better performance in dealing with FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to balance tree nodes. When there are many 
> blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> with a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get rather 
> than Update operations. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-28 Thread Leon Gao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961476#comment-16961476
 ] 

Leon Gao commented on HDFS-14927:
-

Fix checkstyle ^

[~elgoiri] I think the failure is unrelated (the previous one passed). Could 
you take a look?

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, 
> HDFS-14927.003.patch
>
>
> It is good to add some monitoring on the async caller thread pool that handles 
> fan-out RPC client requests, so we know its utilization and when to bump up 
> dfs.federation.router.client.thread-size
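> A minimal sketch of the gauges this could expose (the wrapper class is 
> illustrative; the {{ThreadPoolExecutor}} getters are standard JDK):
> {code:java}
> import java.util.concurrent.ThreadPoolExecutor;
> 
> /** Illustrative gauges over the async caller pool. */
> public class AsyncCallerPoolMetrics {
>   private final ThreadPoolExecutor executor;
> 
>   public AsyncCallerPoolMetrics(ThreadPoolExecutor executor) {
>     this.executor = executor;
>   }
> 
>   public int getActiveThreads() { return executor.getActiveCount(); }
>   public int getPoolSize() { return executor.getPoolSize(); }
>   public int getQueuedCalls() { return executor.getQueue().size(); }
>   public long getCompletedCalls() { return executor.getCompletedTaskCount(); }
> }
> {code}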



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961459#comment-16961459
 ] 

Hadoop QA commented on HDFS-14927:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m  6s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 61m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.federation.metrics.TestMetricsBase |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984202/HDFS-14927.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a8f271a156b4 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9ef6ed9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28195/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28195/testReport/ |
| Max. process+thread count | 2778 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28195/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-28 Thread Leon Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leon Gao updated HDFS-14927:

Attachment: HDFS-14927.003.patch

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, 
> HDFS-14927.003.patch
>
>
> It is good to add some monitoring on the async caller thread pool that handles 
> fan-out RPC client requests, so we know its utilization and when to bump up 
> dfs.federation.router.client.thread-size



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13541) NameNode Port based selective encryption

2019-10-28 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13541:
--
Release Note: 
This feature allows HDFS to selectively enforce encryption for both RPC 
(NameNode) and data transfer (DataNode). With this feature enabled, NameNode 
can listen on multiple ports, and different ports can have different security 
configurations. Depending on which NameNode port clients connect to, the RPC 
calls and the subsequent data transfer will enforce the security configuration 
corresponding to this NameNode port. This can help when there is a requirement 
to enforce different security policies depending on the location the clients 
are connecting from.

This can be enabled by setting the `hadoop.security.saslproperties.resolver.class` 
configuration to `org.apache.hadoop.security.IngressPortBasedResolver`, adding 
the additional NameNode auxiliary ports by setting 
`dfs.namenode.rpc-address.auxiliary-ports`, and configuring the security of the 
individual ports via `ingress.port.sasl.configured.ports`.
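
For illustration, a minimal client-side sketch of the configuration described 
above (the port number and the per-port property shape are assumptions, not 
part of this release note):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class PortBasedEncryptionConfigExample {
  public static Configuration example() {
    Configuration conf = new Configuration();
    conf.set("hadoop.security.saslproperties.resolver.class",
        "org.apache.hadoop.security.IngressPortBasedResolver");
    // Example auxiliary port; the NameNode will also listen on 8021.
    conf.set("dfs.namenode.rpc-address.auxiliary-ports", "8021");
    // Ports whose SASL properties are individually configured.
    conf.set("ingress.port.sasl.configured.ports", "8021");
    // Assumed per-port property shape: force encryption ("privacy") on 8021.
    conf.set("ingress.port.sasl.prop.8021", "privacy");
    return conf;
  }
}
{code}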

> NameNode Port based selective encryption
> 
>
> Key: HDFS-13541
> URL: https://issues.apache.org/jira/browse/HDFS-13541
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: release-blocker
> Attachments: HDFS-13541-branch-2.001.patch, 
> HDFS-13541-branch-2.002.patch, HDFS-13541-branch-2.003.patch, 
> HDFS-13541-branch-3.1.001.patch, HDFS-13541-branch-3.1.002.patch, 
> HDFS-13541-branch-3.2.001.patch, HDFS-13541-branch-3.2.002.patch, NameNode 
> Port based selective encryption-v1.pdf
>
>
> Here at LinkedIn, one issue we face is that we need to enforce different 
> security requirement based on the location of client and the cluster. 
> Specifically, for clients from outside of the data center, it is required by 
> regulation that all traffic must be encrypted. But for clients within the 
> same data center, unencrypted connections are more desired to avoid the high 
> encryption overhead. 
> HADOOP-10221 introduced pluggable SASL resolver, based on which HADOOP-10335 
> introduced WhitelistBasedResolver which solves the same problem. However we 
> found it difficult to fit into our environment for several reasons. In this 
> JIRA, on top of pluggable SASL resolver, *we propose a different approach of 
> running RPC two ports on NameNode, and the two ports will be enforcing 
> encrypted and unencrypted connections respectively, and the following 
> DataNode access will simply follow the same behaviour of 
> encryption/unencryption*. Then by blocking unencrypted port on datacenter 
> firewall, we can completely block unencrypted external access.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14403) Cost-Based RPC FairCallQueue

2019-10-28 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14403:
---
Release Note: 
This adds an extension to the IPC FairCallQueue which allows for the 
consideration of the *cost* of a user's operations when deciding how they 
should be prioritized, as opposed to the number of operations. This can be 
helpful for protecting the NameNode from clients which submit very expensive 
operations (e.g. large listStatus operations or recursive getContentSummary 
operations).

This can be enabled by setting the `ipc.<port>.cost-provider.impl` configuration 
to `org.apache.hadoop.ipc.WeightedTimeCostProvider`.
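
For example, to weigh users by processing time on the 8020 listener (a sketch; 
the FairCallQueue and scheduler must also be enabled on that port for the cost 
provider to take effect):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class CostBasedFcqExample {
  public static Configuration example() {
    Configuration conf = new Configuration();
    // Use the FairCallQueue and the decaying scheduler on port 8020.
    conf.set("ipc.8020.callqueue.impl", "org.apache.hadoop.ipc.FairCallQueue");
    conf.set("ipc.8020.scheduler.impl", "org.apache.hadoop.ipc.DecayRpcScheduler");
    // Weigh users by processing time instead of raw call count.
    conf.set("ipc.8020.cost-provider.impl",
        "org.apache.hadoop.ipc.WeightedTimeCostProvider");
    return conf;
  }
}
{code}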

> Cost-Based RPC FairCallQueue
> 
>
> Key: HDFS-14403
> URL: https://issues.apache.org/jira/browse/HDFS-14403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc, namenode
>Reporter: Erik Krogen
>Assignee: Christopher Gregorian
>Priority: Major
>  Labels: qos, rpc
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: CostBasedFairCallQueueDesign_v0.pdf, 
> HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, 
> HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, 
> HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, 
> HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, 
> HDFS-14403.012.patch, HDFS-14403.013.patch, HDFS-14403.branch-2.8.patch
>
>
> HADOOP-15016 initially described extensions to the Hadoop FairCallQueue 
> encompassing both cost-based analysis of incoming RPCs, as well as support 
> for reservations of RPC capacity for system/platform users. This JIRA intends 
> to track the former, as HADOOP-15016 was repurposed to more specifically 
> focus on the reservation portion of the work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-28 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   
> yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   
> yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   
> yptio
>   nzon
>   e3
> {noformat}
> The command output ends up looking really ugly like this when the path is 
> long. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14931) hdfs crypto commands limit column width

2019-10-28 Thread Eric Badger (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HDFS-14931:
---
Fix Version/s: 3.2.2
   3.1.4
   3.3.0
   3.0.4

Thanks for the review, [~weichiu]! I committed this to trunk, branch-3.2, 
branch-3.1, and branch-3.0

> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   
> yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   
> yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   
> yptio
>   nzon
>   e3
> {noformat}
> The command output ends up looking really ugly like this when the path is 
> long. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14912) Set dfs.image.string-tables.expanded default to false in branch-2.7

2019-10-28 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14912:
---
Fix Version/s: 2.7.8
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

We're likely never going to release 2.7.8, but for anyone who's still on 2.7.x 
and wants to patch manually, this is good to have.

> Set dfs.image.string-tables.expanded default to false in branch-2.7
> ---
>
> Key: HDFS-14912
> URL: https://issues.apache.org/jira/browse/HDFS-14912
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 2.7.8
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 2.7.8
>
> Attachments: HDFS-14912.001.branch-2.7.patch
>
>
> In the branch-2.7 patch for CVE-2018-11768 HDFS FSImage Corruption, 
> dfs.image.string-tables.expanded is set to true by default: 
> https://github.com/apache/hadoop/commit/109d44604ca843212bdf22b50e86a5a41e1d21da#diff-36b19e9d8816002ed9dff8580055d3fbR627
> This is different from all other branches, which set it to false by default.
> For instance, branch-2.8: 
> https://github.com/apache/hadoop/commit/f697f3c4fc0067bb82494e445900d86942685b09#diff-36b19e9d8816002ed9dff8580055d3fbR629
> Goal: Flip the dfs.image.string-tables.expanded default in branch-2.7 to 
> false to make it consistent with other branches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14730) Remove unused configuration dfs.web.authentication.filter

2019-10-28 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961343#comment-16961343
 ] 

Eric Yang commented on HDFS-14730:
--

+1 for patch 002.  Will commit to trunk if no objections.

[~zhangchen] Thank you for the patch.


> Remove unused configuration dfs.web.authentication.filter 
> --
>
> Key: HDFS-14730
> URL: https://issues.apache.org/jira/browse/HDFS-14730
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14730.001.patch, HDFS-14730.002.patch
>
>
> After HADOOP-16314, this configuration is not used anywhere, so I propose to 
> deprecate it to avoid misuse.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14403) Cost-Based RPC FairCallQueue

2019-10-28 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14403:
---
Fix Version/s: 2.10.0

> Cost-Based RPC FairCallQueue
> 
>
> Key: HDFS-14403
> URL: https://issues.apache.org/jira/browse/HDFS-14403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc, namenode
>Reporter: Erik Krogen
>Assignee: Christopher Gregorian
>Priority: Major
>  Labels: qos, rpc
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: CostBasedFairCallQueueDesign_v0.pdf, 
> HDFS-14403.001.patch, HDFS-14403.002.patch, HDFS-14403.003.patch, 
> HDFS-14403.004.patch, HDFS-14403.005.patch, HDFS-14403.006.combined.patch, 
> HDFS-14403.006.patch, HDFS-14403.007.patch, HDFS-14403.008.patch, 
> HDFS-14403.009.patch, HDFS-14403.010.patch, HDFS-14403.011.patch, 
> HDFS-14403.012.patch, HDFS-14403.013.patch, HDFS-14403.branch-2.8.patch
>
>
> HADOOP-15016 initially described extensions to the Hadoop FairCallQueue 
> encompassing both cost-based analysis of incoming RPCs, as well as support 
> for reservations of RPC capacity for system/platform users. This JIRA intends 
> to track the former, as HADOOP-15016 was repurposed to more specifically 
> focus on the reservation portion of the work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2373) Move isUseRatis getFactor and getType from XCeiverClientManager

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2373?focusedWorklogId=335081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335081
 ]

ASF GitHub Bot logged work on HDDS-2373:


Author: ASF GitHub Bot
Created on: 28/Oct/19 18:11
Start Date: 28/Oct/19 18:11
Worklog Time Spent: 10m 
  Work Description: fapifta commented on pull request #95: HDDS-2373 Move 
isUseRatis getFactor and getType from XCeiverClientManager
URL: https://github.com/apache/hadoop-ozone/pull/95
 
 
   ## What changes were proposed in this pull request?
   
   The PR aims to remove the isUseRatis(), getType(), and getFactor() methods 
from the XCeiverClientManager class, as the return values of these methods 
depend on a single configuration value 
(ScmConfigKeys.DFS_CONTAINER_RATIS_ENABLED_KEY).
   
   The proposed solution moves the setup of the ContainerOperationClient to its 
constructor; with that, the ContainerOperationClient class will be responsible 
for setting up its internally used SCMClient and XCeiverClientManager based on 
the configuration.
   It also becomes responsible for setting up the container size limit based on 
the configuration, which is no longer set via a static method.
   
   To deal with the change, and to provide an easy way to get the values in 
JUnit tests, SCMTestUtil gets two new public static utility methods to get the 
ReplicationType and ReplicationFactor based on the configuration. All accesses 
of the old methods on XCeiverClientManager are mapped to the new static utility 
methods.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2373 - Move isUseRatis getFactor 
and getType from XCeiverClientManager
   
   ## How was this patch tested?
   
   As this is a refactoring without changing any outer logic, no new JUnit 
tests are needed, but all existing tests have to pass as before; a minimal 
sketch of the new helpers follows.
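   
   A minimal sketch of the proposed SCMTestUtil helpers (names and defaults 
assumed from this description, not the final patch):
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
   import org.apache.hadoop.hdds.scm.ScmConfigKeys;

   public final class ScmTestUtilSketch {
     public static HddsProtos.ReplicationType getReplicationType(Configuration conf) {
       return isRatisEnabled(conf)
           ? HddsProtos.ReplicationType.RATIS
           : HddsProtos.ReplicationType.STAND_ALONE;
     }

     public static HddsProtos.ReplicationFactor getReplicationFactor(Configuration conf) {
       return isRatisEnabled(conf)
           ? HddsProtos.ReplicationFactor.THREE
           : HddsProtos.ReplicationFactor.ONE;
     }

     private static boolean isRatisEnabled(Configuration conf) {
       return conf.getBoolean(ScmConfigKeys.DFS_CONTAINER_RATIS_ENABLED_KEY,
           ScmConfigKeys.DFS_CONTAINER_RATIS_ENABLED_DEFAULT);
     }
   }
   ```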
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335081)
Remaining Estimate: 0h
Time Spent: 10m

> Move isUseRatis getFactor and getType from XCeiverClientManager
> ---
>
> Key: HDDS-2373
> URL: https://issues.apache.org/jira/browse/HDDS-2373
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The given methods in XCeiverClientManager work based on the configuration 
> supplied in the constructor of the XCeiverClientManager class.
> The only real code usage of this is in ContainerOperationsClient.
> Refactor the ContainerOperationsClient constructor to work based on the 
> configuration, then move these values there directly and set them in the 
> constructor. Clean up all test references to the methods, and remove the 
> methods from the XCeiverClientManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2373) Move isUseRatis getFactor and getType from XCeiverClientManager

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2373:
-
Labels: pull-request-available  (was: )

> Move isUseRatis getFactor and getType from XCeiverClientManager
> ---
>
> Key: HDDS-2373
> URL: https://issues.apache.org/jira/browse/HDDS-2373
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>
> The given methods in XCeiverClientManager work based on the configuration 
> supplied in the constructor of the XCeiverClientManager class.
> The only real code usage of this is in ContainerOperationsClient.
> Refactor the ContainerOperationsClient constructor to work based on the 
> configuration, then move these values there directly and set them in the 
> constructor. Clean up all test references to the methods, and remove the 
> methods from the XCeiverClientManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2285) GetBlock and ReadChunk commands should be sent to the same datanode

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2285?focusedWorklogId=335080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335080
 ]

ASF GitHub Bot logged work on HDDS-2285:


Author: ASF GitHub Bot
Created on: 28/Oct/19 18:10
Start Date: 28/Oct/19 18:10
Worklog Time Spent: 10m 
  Work Description: hanishakoneru commented on pull request #40: HDDS-2285. 
GetBlock and ReadChunk command from the client should be s…
URL: https://github.com/apache/hadoop-ozone/pull/40
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335080)
Time Spent: 20m  (was: 10m)

> GetBlock and ReadChunk commands should be sent to the same datanode
> ---
>
> Key: HDDS-2285
> URL: https://issues.apache.org/jira/browse/HDDS-2285
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It can be observed that the GetBlock and ReadChunk commands are sent to 2 
> different datanodes. They should be sent to the same datanode to re-use the 
> connection.
> {code}
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command GetBlock to 
> datanode 172.26.32.224
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command ReadChunk to 
> datanode 172.26.32.231
> {code}
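> A caller-level sketch of the intent (the {{sendToDatanode}} helper is 
> hypothetical; the real client API may differ):
> {code:java}
> // Pin both commands to one replica so the gRPC connection is reused.
> DatanodeDetails node = pipeline.getFirstNode();
> ContainerProtos.ContainerCommandResponseProto blockReply =
>     sendToDatanode(node, getBlockRequest);
> ContainerProtos.ContainerCommandResponseProto chunkReply =
>     sendToDatanode(node, readChunkRequest);
> {code}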



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2285) GetBlock and ReadChunk commands should be sent to the same datanode

2019-10-28 Thread Hanisha Koneru (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru resolved HDDS-2285.
--
Fix Version/s: 0.5.0
   Resolution: Fixed

> GetBlock and ReadChunk commands should be sent to the same datanode
> ---
>
> Key: HDDS-2285
> URL: https://issues.apache.org/jira/browse/HDDS-2285
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It can be observed that the GetBlock and ReadChunk commands are sent to 2 
> different datanodes. They should be sent to the same datanode to re-use the 
> connection.
> {code}
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command GetBlock to 
> datanode 172.26.32.224
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command ReadChunk to 
> datanode 172.26.32.231
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2285) GetBlock and ReadChunk commands should be sent to the same datanode

2019-10-28 Thread Hanisha Koneru (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDDS-2285:
-
Summary: GetBlock and ReadChunk commands should be sent to the same 
datanode  (was: GetBlock and ReadChunk command from the client should be sent 
to the same datanode to re-use the same connection)

> GetBlock and ReadChunk commands should be sent to the same datanode
> ---
>
> Key: HDDS-2285
> URL: https://issues.apache.org/jira/browse/HDDS-2285
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It can be observed that the GetBlock and ReadChunk commands are sent to 2 
> different datanodes. They should be sent to the same datanode to re-use the 
> connection.
> {code}
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command GetBlock to 
> datanode 172.26.32.224
> 19/10/10 00:43:42 INFO scm.XceiverClientGrpc: Send command ReadChunk to 
> datanode 172.26.32.231
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2373) Move isUseRatis getFactor and getType from XCeiverClientManager

2019-10-28 Thread Istvan Fajth (Jira)
Istvan Fajth created HDDS-2373:
--

 Summary: Move isUseRatis getFactor and getType from 
XCeiverClientManager
 Key: HDDS-2373
 URL: https://issues.apache.org/jira/browse/HDDS-2373
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Istvan Fajth


The given methods in XCeiverClientManager work based on the configuration 
supplied in the constructor of the XCeiverClientManager class.

The only real code usage of this is in ContainerOperationsClient.

Refactor the ContainerOperationsClient constructor to work based on the 
configuration, then move these values there directly and set them in the 
constructor. Clean up all test references to the methods, and remove the 
methods from the XCeiverClientManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2373) Move isUseRatis getFactor and getType from XCeiverClientManager

2019-10-28 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth reassigned HDDS-2373:
--

Assignee: Istvan Fajth

> Move isUseRatis getFactor and getType from XCeiverClientManager
> ---
>
> Key: HDDS-2373
> URL: https://issues.apache.org/jira/browse/HDDS-2373
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>
> The given methods in XCeiverClientManager work based on the configuration 
> supplied in the constructor of the XCeiverClientManager class.
> The only real code usage of this is in ContainerOperationsClient.
> Refactor the ContainerOperationsClient constructor to work based on the 
> configuration, then move these values there directly and set them in the 
> constructor. Clean up all test references to the methods, and remove the 
> methods from the XCeiverClientManager.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14935) Refactor DFSNetworkTopology#isNodeInScope

2019-10-28 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961293#comment-16961293
 ] 

Ayush Saxena commented on HDFS-14935:
-

Is this just replacing {{"/"}} with {{"NodeBase.PATH_SEPARATOR_STR"}}?

> Refactor DFSNetworkTopology#isNodeInScope
> -
>
> Key: HDFS-14935
> URL: https://issues.apache.org/jira/browse/HDFS-14935
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14935.001.patch, HDFS-14935.002.patch, 
> HDFS-14935.003.patch
>
>
> {code:java}
> private boolean isNodeInScope(Node node, String scope) {
>   if (!scope.endsWith("/")) {
> scope += "/";
>   }
>   String nodeLocation = node.getNetworkLocation() + "/";
>   return nodeLocation.startsWith(scope);
> }
> {code}
> NodeBase#normalize() is used to normalize the scope,
> so I refactor DFSNetworkTopology#isNodeInScope (see the sketch below).
>  
>  
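> A minimal sketch of the refactor (assuming the scope passed in is an absolute 
> network location, which NodeBase.normalize requires):
> {code:java}
> private boolean isNodeInScope(Node node, String scope) {
>   scope = NodeBase.normalize(scope) + NodeBase.PATH_SEPARATOR_STR;
>   String nodeLocation =
>       node.getNetworkLocation() + NodeBase.PATH_SEPARATOR_STR;
>   return nodeLocation.startsWith(scope);
> }
> {code}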



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961280#comment-16961280
 ] 

Íñigo Goiri commented on HDFS-14882:


In the past, we have had many issues with the sorting for blocks with 20+ 
replicas.
Avoiding additional sorting would be ideal.
Shuffle should be faster but if we can minimize its use, it wouldn't hurt.

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, 
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch, 
> HDFS-14882.suggestion
>
>
> Currently, we consider the load of a datanode in #chooseTarget for writers, 
> but we do not consider it for readers. Thus, the process slots of a datanode 
> can be occupied by #BlockSender for readers, the disk/network comes under 
> heavy workload, and we then hit slow-node exceptions. IIRC the same case has 
> been reported several times. Based on this, I propose to consider load for 
> readers the same way #chooseTarget does for writers.
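> A minimal sketch of the reader-side idea (the placement is assumed, not the 
> actual patch): order the returned locations by active xceiver count.
> {code:java}
> import java.util.Arrays;
> import java.util.Comparator;
> import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
> import org.apache.hadoop.hdfs.protocol.LocatedBlock;
> 
> class LoadAwareSort {
>   /** Prefer lightly loaded replicas when returning block locations. */
>   static void sortByLoad(LocatedBlock block) {
>     DatanodeInfo[] locations = block.getLocations();
>     Arrays.sort(locations,
>         Comparator.comparingInt(DatanodeInfo::getXceiverCount));
>   }
> }
> {code}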



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14938:
---
Description: Add a check for whether excludedNodes contains the scope in 
DFSNetworkTopology#chooseRandomWithStorageType().

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch
>
>
> Add a check for whether excludedNodes contains the scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-10-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned HDFS-14938:
--

Assignee: Lisheng Sun

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961275#comment-16961275
 ] 

Íñigo Goiri commented on HDFS-14936:


How is the test coverage for this?
This is indirectly tested by others, but it may make sense to make it more 
explicit given it is now public.

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch
>
>
> In the current code, the InnerNode subclasses InnerNodeImpl and 
> DFSTopologyNodeImpl both have getNumOfChildren(), 
> so add getNumOfChildren() to the interface InnerNode and remove the 
> unnecessary getNumOfChildren() in DFSTopologyNodeImpl.
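> A minimal sketch of the interface change (Javadoc wording illustrative):
> {code:java}
> public interface InnerNode extends Node {
>   /** @return the number of children this inner node has. */
>   int getNumOfChildren();
> }
> {code}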



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961271#comment-16961271
 ] 

Hadoop QA commented on HDFS-14882:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
56s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
6s{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
22s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  1m 
40s{color} | {color:red} client in trunk failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m 
51s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 32s{color} | {color:orange} root: The patch generated 2 new + 561 unchanged 
- 1 fixed = 563 total (was 562) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
30s{color} | {color:red} client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
53s{color} | {color:red} hadoop-ozone/client generated 1 new + 0 unchanged - 0 
fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 31s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m  7s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
39s{color} | {color:green} client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}246m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-ozone/client |
|  |  Format string should use %n rather than \n in 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(byte[], int, int)  At 
KeyInputStream.java |

[jira] [Commented] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptException

2019-10-28 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961248#comment-16961248
 ] 

Chen Liang commented on HDFS-14937:
---

Thanks for reporting this, [~xuzq_zander]. The v001 patch makes sense to me. Can 
we add a log message though, to explicitly indicate it was interrupted? Also, 
checking {{Thread.currentThread().isInterrupted()}} seems unusual; is it 
possible to check, say, something like {{InterruptedException}} instead?
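
Something along these lines is what I have in mind (a sketch only; names and 
placement in the retry loop are assumed):
{code:java}
import java.io.InterruptedIOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class ObserverInvokeSketch {
  /** Stop trying further observers as soon as an interrupt is seen. */
  static Object invokeOnObserver(Method method, Object proxy, Object[] args)
      throws Throwable {
    try {
      return method.invoke(proxy, args);
    } catch (InvocationTargetException ite) {
      Throwable cause = ite.getCause();
      if (cause instanceof InterruptedIOException
          || Thread.currentThread().isInterrupted()) {
        // Propagate immediately rather than failing over to the next observer.
        throw cause;
      }
      throw ite;  // let the caller decide whether to try the next observer
    }
  }
}
{code}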

> [SBN read] ObserverReadProxyProvider should throw InterruptException
> 
>
> Key: HDFS-14937
> URL: https://issues.apache.org/jira/browse/HDFS-14937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-14937-trunk-001.patch
>
>
> ObserverReadProxyProvider should throw an InterruptedException immediately if 
> one Observer catches an InterruptedException while invoking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14931) hdfs crypto commands limit column width

2019-10-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961244#comment-16961244
 ] 

Hudson commented on HDFS-14931:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17578 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17578/])
HDFS-14931. hdfs crypto commands limit column width. Contributed by Eric 
Badger. (ebadger: rev 9ef6ed9c1c83b9752e772ece7a716a33045752bf)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CryptoAdmin.java


> hdfs crypto commands limit column width
> ---
>
> Key: HDFS-14931
> URL: https://issues.apache.org/jira/browse/HDFS-14931
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: HDFS-14931.001.patch
>
>
> {noformat}
> foo@bar$ hdfs crypto -listZones
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool1  encr
>   
> yptio
>   nzon
>   e1
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool2  encr
>   
> yptio
>   nzon
>   e2
> /projects/foo/bar/fizzgig/myprojectdirectorynameorsomethingcool3  encr
>   
> yptio
>   nzon
>   e3
> {noformat}
> The command output ends up looking really ugly like this when the path is 
> long. This also makes it very difficult to pipe the output into other 
> utilities, such as awk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961223#comment-16961223
 ] 

Bharat Viswanadham edited comment on HDDS-2356 at 10/28/19 4:26 PM:


{quote}Also it prints the same pipeline id in the s3g logs like crazy. Wonder if 
that's expected. [~bharat]

2019-10-28 11:43:08,912 [qtp1383524016-24] INFO - Allocating block with 
ExcludeList \{datanodes = [], containerIds = [], pipelineIds = []}
 ...skipping...
 eID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, 
PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a,
{quote}
This is a log in allocateNewBlock() in BlockOutputStreamEntryPool.java. 
HDDS-2286 changed this behavior to print this info to the logs. Since the 
excludeList holds a list of pipelineIds, the same entry, if added again, is 
appended to the list again; I think we should have a check so an entry is only 
added if it does not already exist in the list (see the sketch below). One 
other thing: this is getting printed as pipelineIds=[], but after that it 
prints the same pipelineID multiple times. Can you paste the complete log?
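
i.e., something like this in ExcludeList (a sketch; the field name is assumed):
{code:java}
// Guard against duplicate entries when the same pipeline fails repeatedly.
public void addPipeline(PipelineID pipelineId) {
  if (!pipelineIds.contains(pipelineId)) {
    pipelineIds.add(pipelineId);
  }
}
{code}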


was (Author: bharatviswa):
{quote}Also it prints the same pipeline id in the s3g logs like crazy. Wonder if 
that's expected. [~bharat]

2019-10-28 11:43:08,912 [qtp1383524016-24] INFO - Allocating block with 
ExcludeList \{datanodes = [], containerIds = [], pipelineIds = []}
...skipping...
eID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, 
PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a,
{quote}
This is a log in allocateNewBlock() in BlockOutputStreamEntryPool.java. 
HDDS-2286 changed this behavior to print this info to the logs. Since the 
excludeList holds a list of pipelineIds, the same entry, if added again, is 
appended to the list again; I think we should have a check so an entry is only 
added if it does not already exist in the list.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a 
> path on VM0, reading data from VM0's local disk and writing to the mount path. 
> The dataset has files of various sizes, from 0 bytes to GB-level, and contains 
> ~50,000 files. 
> The writing is slow (1 GB in ~10 mins) and it stops after around 4 GB. As I 
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing 
> errors related to multipart upload. These errors eventually cause the writing 
> to terminate and the OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
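> A minimal sketch of one way to avoid the ConcurrentModificationException 
> (field and builder method names assumed; not the committed fix): snapshot the 
> part map before iterating in getProto().
> {code:java}
> // Copy under lock, then iterate the snapshot safely.
> TreeMap<Integer, PartKeyInfo> snapshot;
> synchronized (this) {
>   snapshot = new TreeMap<>(partKeyInfoList);
> }
> snapshot.forEach((partNumber, partKeyInfo) ->
>     builder.addPartKeyInfoList(partKeyInfo));
> {code}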



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-28 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961223#comment-16961223
 ] 

Bharat Viswanadham commented on HDDS-2356:
--

{quote}Also it prints the same pipeline id in the s3g logs like crazy. Wonder if 
that's expected. [~bharat]

2019-10-28 11:43:08,912 [qtp1383524016-24] INFO - Allocating block with 
ExcludeList \{datanodes = [], containerIds = [], pipelineIds = []}
...skipping...
eID=3c94d3f5-3c0e-4994-9c63-dc487071be1a, 
PipelineID=3c94d3f5-3c0e-4994-9c63-dc487071be1a,
{quote}
This is a log in allocateNewBlock() in BlockOutputStreamEntryPool.java. 
HDDS-2286 changed this behavior to print this info to the logs. Since the 
excludeList holds a list of pipelineIds, the same entry, if added again, is 
appended to the list again; I think we should have a check so an entry is only 
added if it does not already exist in the list.

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE and enable the Ozone S3 gateway to mount Ozone to a 
> path on VM0, reading data from VM0's local disk and writing to the mount path. 
> The dataset has files of various sizes, from 0 bytes to GB-level, and contains 
> ~50,000 files. 
> The writing is slow (1 GB in ~10 mins) and it stops after around 4 GB. As I 
> look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing 
> errors related to multipart upload. These errors eventually cause the writing 
> to terminate and the OM to be closed. 
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961199#comment-16961199
 ] 

Hadoop QA commented on HDFS-14920:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 147 unchanged - 0 fixed = 150 total (was 147) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 89m 
10s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14920 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984168/HDFS-14920.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux dc4722a6e7c1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d5e9971 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28193/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28193/testReport/ |
| Max. process+thread count | 4183 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28193/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Commented] (HDFS-14935) Refactor DFSNetworkTopology#isNodeInScope

2019-10-28 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961178#comment-16961178
 ] 

Lisheng Sun commented on HDFS-14935:


[~ayushtkn] Could you find time to review the 003 patch? Thank you.

> Refactor DFSNetworkTopology#isNodeInScope
> -
>
> Key: HDFS-14935
> URL: https://issues.apache.org/jira/browse/HDFS-14935
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14935.001.patch, HDFS-14935.002.patch, 
> HDFS-14935.003.patch
>
>
> {code:java}
> private boolean isNodeInScope(Node node, String scope) {
>   if (!scope.endsWith("/")) {
>     scope += "/";
>   }
>   String nodeLocation = node.getNetworkLocation() + "/";
>   return nodeLocation.startsWith(scope);
> }
> {code}
> NodeBase#normalize() can be used to normalize the scope,
> so I refactored DFSNetworkTopology#isNodeInScope accordingly.
>  
>  
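One plausible shape of the refactoring described above, as a sketch (it 
assumes NodeBase.normalize() strips a trailing slash; this is not the attached 
patch):
{code:java}
private boolean isNodeInScope(Node node, String scope) {
  // Let NodeBase.normalize() canonicalize the scope instead of
  // hand-appending "/": e.g. "/rack1/" -> "/rack1".
  scope = NodeBase.normalize(scope);
  String nodeLocation = node.getNetworkLocation() + "/";
  return nodeLocation.startsWith(scope + "/");
}
{code}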



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-14936:
--

Assignee: Lisheng Sun

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch
>
>
> Currently the InnerNode subclasses InnerNodeImpl and DFSTopologyNodeImpl 
> both have getNumOfChildren(), 
> so add getNumOfChildren() to the interface InnerNode and remove the 
> unnecessary getNumOfChildren() in DFSTopologyNodeImpl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-10-28 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14938:
---
Attachment: HDFS-14938.002.patch

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptException

2019-10-28 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961151#comment-16961151
 ] 

Erik Krogen commented on HDFS-14937:


FYI [~shv] [~vagarychen] [~csun]

> [SBN read] ObserverReadProxyProvider should throw InterruptException
> 
>
> Key: HDFS-14937
> URL: https://issues.apache.org/jira/browse/HDFS-14937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
> Attachments: HDFS-14937-trunk-001.patch
>
>
> ObserverReadProxyProvider should throw an InterruptException immediately if 
> an Observer catches an InterruptException while invoking.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961146#comment-16961146
 ] 

Hadoop QA commented on HDFS-14938:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m  8s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14938 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984164/HDFS-14938.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0c66d618535d 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d5e9971 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28191/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28191/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28191/testReport/ |
| Max. process+thread count | 2709 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-28 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961123#comment-16961123
 ] 

Xiaoqiao He commented on HDFS-14882:


Thanks [~pifta] for your suggestions. [^HDFS-14882.suggestion] is a more 
graceful change in my opinion, and it lets us reduce the double sort overhead. 
Just one concern: I am not sure whether {{Collections.shuffle(list)}} 
introduces another overhead. BTW, {{KeyInputStream.java}} seems to be a class 
of the Ozone project, so it may be an unexpected change.

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, 
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch, 
> HDFS-14882.suggestion
>
>
> Currently, we consider load of datanode when #chooseTarget for writer, 
> however not consider it for reader. Thus, the process slot of datanode could 
> be occupied by #BlockSender for reader, and disk/network will be busy 
> workload, then meet some slow node exception. IIRC same case is reported 
> times. Based on the fact, I propose to consider load for reader same as it 
> did #chooseTarget for writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13736) BlockPlacementPolicyDefault can not choose favored nodes when 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false

2019-10-28 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961096#comment-16961096
 ] 

Xiaoqiao He commented on HDFS-13736:


[^HDFS-13736.006.patch] LGTM. I checked the failed unit tests; they are 
unrelated to this change, I think. 
It would be better to keep the following code consistent with community code 
style:
1. Use a 4-space indent when a statement continues on a new line.
{code:java}
+  targets = chooseTarget(2, dataNodes[3], null,
+    favouredNodes);
+  assertEquals(targets.length, 2);
+  for (int i = 0; i < targets.length; i++) {
+    assertTrue("Target should be a part of Expected Targets",
+      expectedTargets.contains(targets[i].getDatanodeDescriptor()));
{code}
2. Comments should usually start with an upper-case letter.
{code:java}
   * choose storage of local or favored node.
   * @param localOrFavoredNode local or favored node
   * @param isFavoredNode if target node is favored node
{code}
These are non-essential but nice to have. Thanks [~xiaodong.hu].

> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false
> --
>
> Key: HDFS-13736
> URL: https://issues.apache.org/jira/browse/HDFS-13736
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: hu xiaodong
>Assignee: hu xiaodong
>Priority: Major
> Attachments: HDFS-13736.001.patch, HDFS-13736.002.patch, 
> HDFS-13736.003.patch, HDFS-13736.004.patch, HDFS-13736.005.patch, 
> HDFS-13736.006.patch
>
>
> BlockPlacementPolicyDefault can not choose favored nodes when 
> 'dfs.namenode.block-placement-policy.default.prefer-local-node' set to false. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission

2019-10-28 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961042#comment-16961042
 ] 

Fei Hui commented on HDFS-14920:


[~ayushtkn] Uploaded the v003 patch and added a new UT, testCountNodes. 
Please review.

> Erasure Coding: Decommission may hang If one or more datanodes are out of 
> service during decommission  
> ---
>
> Key: HDFS-14920
> URL: https://issues.apache.org/jira/browse/HDFS-14920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch, 
> HDFS-14920.003.patch
>
>
> Decommission test hangs in our clusters.
> We have seen messages like the following:
> {quote}
> 2019-10-22 15:58:51,514 TRACE 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: 
> blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 
> 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 
> 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current 
> datanode decommissioning: true, Is current datanode entering maintenance: 
> false
> 2019-10-22 15:58:51,514 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 
> 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate 
> to finish Decommission In Progress
> {quote}
> After digging into the source code and cluster logs, I guess it happens in 
> the following steps.
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8, b0 is from 
> datanode dn0, b1 is from datanode dn1, ...etc
> # At the beginning dn0 is in decommission progress; b0 is replicated 
> successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are on nodes in decommission progress, and dn4 containing 
> b4 is out of service, so the block needs reconstruction; an ErasureCodingWork 
> is created to do it, and in the ErasureCodingWork, additionalReplRequired 
> is 4.
> # Because hasAllInternalBlocks is false, it will call 
> ErasureCodingWork#addTaskToDatanode -> 
> DatanodeDescriptor#addBlockToBeErasureCoded, and send a 
> BlockECReconstructionInfo task to the DataNode.
> # The DataNode cannot reconstruct the block because targets is 4, greater 
> than 3 (the parity number).
> There is a problem, shown below, in BlockManager.java#scheduleReconstruction:
> {code}
>   // should reconstruct all the internal blocks before scheduling
>   // replication task for decommissioning node(s).
>   if (additionalReplRequired - numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>     additionalReplRequired = additionalReplRequired -
>         numReplicas.decommissioning() -
>         numReplicas.liveEnteringMaintenanceReplicas();
>   }
> {code}
> Reconstruction should happen first, and then replication for the 
> decommissioning nodes. Because numReplicas.decommissioning() is 4 and 
> additionalReplRequired is 4, that's wrong:
> numReplicas.decommissioning() should be 3; it should exclude the live 
> replica. If so, additionalReplRequired will be 1 and reconstruction will be 
> scheduled as expected. After that, decommission goes on.
>  
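To make the arithmetic above concrete, an illustrative walk-through of the 
numbers in this report (not code from the patch):
{code:java}
public class Hdfs14920Arithmetic {
  public static void main(String[] args) {
    // RS-6-3: 9 internal blocks; 5 live, 4 on decommissioning nodes, but one
    // of those four (b0) already has a live replica elsewhere.
    int additionalReplRequired = 4;

    // Current logic: 4 - 4 = 0, so the "> 0" guard fails and
    // additionalReplRequired stays 4; the DataNode then rejects the task
    // because 4 targets exceed the 3 parity blocks.
    int decommissioningRaw = 4;      // wrongly includes b0
    System.out.println(additionalReplRequired - decommissioningRaw);    // 0

    // Proposed logic: 4 - 3 = 1 > 0, so additionalReplRequired becomes 1
    // and one reconstruction task is scheduled first, after which the
    // decommission can proceed.
    int decommissioningFixed = 3;    // b0 excluded, it has a live replica
    System.out.println(additionalReplRequired - decommissioningFixed);  // 1
  }
}
{code}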



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission

2019-10-28 Thread Fei Hui (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HDFS-14920:
---
Attachment: HDFS-14920.003.patch

> Erasure Coding: Decommission may hang If one or more datanodes are out of 
> service during decommission  
> ---
>
> Key: HDFS-14920
> URL: https://issues.apache.org/jira/browse/HDFS-14920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch, 
> HDFS-14920.003.patch
>
>
> Decommission test hangs in our clusters.
> We have seen messages like the following:
> {quote}
> 2019-10-22 15:58:51,514 TRACE 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: 
> blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 
> 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 
> 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current 
> datanode decommissioning: true, Is current datanode entering maintenance: 
> false
> 2019-10-22 15:58:51,514 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 
> 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate 
> to finish Decommission In Progress
> {quote}
> After digging into the source code and cluster logs, I guess it happens in 
> the following steps.
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8, b0 is from 
> datanode dn0, b1 is from datanode dn1, ...etc
> # At the beginning dn0 is in decommission progress; b0 is replicated 
> successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are on nodes in decommission progress, and dn4 containing 
> b4 is out of service, so the block needs reconstruction; an ErasureCodingWork 
> is created to do it, and in the ErasureCodingWork, additionalReplRequired 
> is 4.
> # Because hasAllInternalBlocks is false, it will call 
> ErasureCodingWork#addTaskToDatanode -> 
> DatanodeDescriptor#addBlockToBeErasureCoded, and send a 
> BlockECReconstructionInfo task to the DataNode.
> # The DataNode cannot reconstruct the block because targets is 4, greater 
> than 3 (the parity number).
> There is a problem, shown below, in BlockManager.java#scheduleReconstruction:
> {code}
>   // should reconstruct all the internal blocks before scheduling
>   // replication task for decommissioning node(s).
>   if (additionalReplRequired - numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>     additionalReplRequired = additionalReplRequired -
>         numReplicas.decommissioning() -
>         numReplicas.liveEnteringMaintenanceReplicas();
>   }
> {code}
> Reconstruction should happen first, and then replication for the 
> decommissioning nodes. Because numReplicas.decommissioning() is 4 and 
> additionalReplRequired is 4, that's wrong:
> numReplicas.decommissioning() should be 3; it should exclude the live 
> replica. If so, additionalReplRequired will be 1 and reconstruction will be 
> scheduled as expected. After that, decommission goes on.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-28 Thread Istvan Fajth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth updated HDFS-14882:

Attachment: HDFS-14882.suggestion

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, 
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch, 
> HDFS-14882.suggestion
>
>
> Currently, we consider load of datanode when #chooseTarget for writer, 
> however not consider it for reader. Thus, the process slot of datanode could 
> be occupied by #BlockSender for reader, and disk/network will be busy 
> workload, then meet some slow node exception. IIRC same case is reported 
> times. Based on the fact, I propose to consider load for reader same as it 
> did #chooseTarget for writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-28 Thread Istvan Fajth (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961023#comment-16961023
 ] 

Istvan Fajth commented on HDFS-14882:
-

Hello [~hexiaoqiao],

I was looking into the patch and the proposal, and even though the changes 
look good and do what they promise as far as I can see, I have one 
question/suggestion to consider instead of doing this when 
dfs.namenode.read.considerLoad is set to true:
In NetworkTopology#sortByDistance, we already sort the nodes by network 
distance, and there is a shuffle for the nodes on the same level that tries 
to ensure some distribution of load. That shuffle can also be seen as a 
secondary sorting strategy, which we can inject at that point from outside. 
If we inject the secondary sorting from the DatanodeManager, then when 
read.considerLoad is turned on, we can inject a sort by transceiver count 
instead of the shuffle.

With this, we can avoid calculating the network distance twice, and we can 
avoid shuffling and then sorting by transceiver count. I am posting a proposal 
just to demonstrate what I am thinking about; see the sketch after this 
paragraph. The JUnit test in patch-008 passes with it, though I haven't tried 
other tests locally.
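To make the idea concrete, a rough sketch of the injection (names and 
signatures are illustrative; the real NetworkTopology and DatanodeManager 
APIs differ):
{code:java}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Sketch: the caller injects the "secondary sort" applied to nodes that sit
// at the same network distance, replacing the hard-coded shuffle.
class SortByDistanceSketch {

  interface DatanodeStub {
    int getXceiverCount(); // stand-in for the datanode's transceiver count
  }

  // Chosen once, depending on dfs.namenode.read.considerLoad.
  static Consumer<List<DatanodeStub>> secondarySort(boolean considerLoad) {
    return considerLoad
        // considerLoad on: order same-distance nodes by current load
        ? list -> list.sort(
            Comparator.comparingInt(DatanodeStub::getXceiverCount))
        // considerLoad off: keep today's behavior, shuffle to spread load
        : Collections::shuffle;
  }

  // Inside sortByDistance, each run of same-distance nodes would be handed
  // to the injected strategy instead of Collections.shuffle directly.
  static void sortSameDistanceRange(List<DatanodeStub> nodes,
      Consumer<List<DatanodeStub>> strategy) {
    strategy.accept(nodes);
  }
}
{code}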

Please share what you think about this approach. I am also happy to have 
feedback from you, [~ayushtkn] and [~elgoiri].

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, 
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch
>
>
> Currently, we consider load of datanode when #chooseTarget for writer, 
> however not consider it for reader. Thus, the process slot of datanode could 
> be occupied by #BlockSender for reader, and disk/network will be busy 
> workload, then meet some slow node exception. IIRC same case is reported 
> times. Based on the fact, I propose to consider load for reader same as it 
> did #chooseTarget for writer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8631) WebHDFS : Support setQuota

2019-10-28 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961004#comment-16961004
 ] 

Steve Loughran commented on HDFS-8631:
--

Hi, just noticed this. 

Can I remind people that I generally expect any changes made to file system 
APIs to be accompanied by changes to filesystem.md so as to define, strictly, 
what they are meant to do. Pulling up what HDFS does and saying "that" doesn't 
count, because it doesn't always cover the corner cases or clearly define 
what happens.

In particular, I don't see any tests in this patch which explore what happens 
if I set negative quotas, invoke the operation on paths which do not exist, 
etc. These are critical to verify that new implementations of any FS API 
actually behave the way HDFS does. People who provide their own 
implementations of the APIs depend on this, and people who use the APIs 
deserve the actual details of what happens, because "trace through what HDFS 
does" doesn't count as documentation.


# Mention it to me when you're going near this class, as I can make 
suggestions in advance.
# I now expect the documentation and the extra testing. Who is going to 
volunteer to do this?



> WebHDFS : Support setQuota
> --
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.2
>Reporter: nijel
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch, 
> HDFS-8631-003.patch, HDFS-8631-004.patch, HDFS-8631-005.patch, 
> HDFS-8631-006.patch, HDFS-8631-007.patch, HDFS-8631-008.patch, 
> HDFS-8631-009.patch, HDFS-8631-010.patch, HDFS-8631-011.patch
>
>
> Users are able to do quota management via the filesystem object. The same 
> operation can be allowed through the REST API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960994#comment-16960994
 ] 

Ayush Saxena commented on HDFS-14936:
-

Seems OK.
+1
Will commit by tomorrow, if no comments.

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch
>
>
> Currently the InnerNode subclasses InnerNodeImpl and DFSTopologyNodeImpl 
> both have getNumOfChildren(), 
> so add getNumOfChildren() to the interface InnerNode and remove the 
> unnecessary getNumOfChildren() in DFSTopologyNodeImpl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-10-28 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14938:
---
Attachment: HDFS-14938.001.patch
Status: Patch Available  (was: Open)

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-10-28 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-14938:
--

 Summary: Add check if excludedNodes contain scope in 
DFSNetworkTopology#chooseRandomWithStorageType() 
 Key: HDFS-14938
 URL: https://issues.apache.org/jira/browse/HDFS-14938
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lisheng Sun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960979#comment-16960979
 ] 

Lisheng Sun commented on HDFS-14936:


[~ayushtkn]

All subclasses of InnerNode are as follows:

InnerNodeImpl extends InnerNode

DFSTopologyNodeImpl extends InnerNodeImpl

InnerNodeWithNodeGroup extends InnerNodeImpl

All these classes have getNumOfChildren().

According to the definition of InnerNode, it must have children, and the 
current InnerNode code already has getChildren():
{code:java}
/** @return its children */
List<Node> getChildren();
{code}
So I think we should add getNumOfChildren() to the interface as well.
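A minimal sketch of the proposed interface shape (illustrative only, not the 
attached patch; Node comes from org.apache.hadoop.net and List from 
java.util):
{code:java}
public interface InnerNode extends Node {
  /** @return its children */
  List<Node> getChildren();

  /** @return the number of children this inner node currently has */
  int getNumOfChildren();
}
{code}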

Please correct me if I am wrong. Thank you [~ayushtkn].

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch
>
>
> Currently the InnerNode subclasses InnerNodeImpl and DFSTopologyNodeImpl 
> both have getNumOfChildren(), 
> so add getNumOfChildren() to the interface InnerNode and remove the 
> unnecessary getNumOfChildren() in DFSTopologyNodeImpl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960969#comment-16960969
 ] 

Hadoop QA commented on HDFS-14936:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
17s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 47s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 1s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}210m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14936 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984147/HDFS-14936.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6778d1273d8f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7be5508 |
| maven | version: Apache Maven 3.3.9 |
| 

[jira] [Commented] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptException

2019-10-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960967#comment-16960967
 ] 

Hadoop QA commented on HDFS-14937:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
49s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14937 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984156/HDFS-14937-trunk-001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9387d959294d 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d5e9971 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28190/testReport/ |
| Max. process+thread count | 309 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: 
hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28190/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [SBN read] 

[jira] [Commented] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission

2019-10-28 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960962#comment-16960962
 ] 

Ayush Saxena commented on HDFS-14920:
-

{code:java}
if (!liveBitSet.get(blockIndex)) {
  liveBitSet.set(blockIndex);                                //  (1)
  // Sub decommissioning because the index replica is live.
  if (decommissioningBitSet.get(blockIndex)) {
    counters.subtract(StoredReplicaState.DECOMMISSIONING, 1);
  } else {
    decommissioningBitSet.set(blockIndex);                   //  (2)
  }
}
{code}
bq. either in liveReplicas or in decommissioning replicas, but not both.

Here at (1) you add it to liveBitSet, and at (2) you add it to 
decommissioningBitSet too?

Anyway, I tried your UT; it passed without the else part too. If adding to 
both is required, can you please extend a UT for that case?

> Erasure Coding: Decommission may hang If one or more datanodes are out of 
> service during decommission  
> ---
>
> Key: HDFS-14920
> URL: https://issues.apache.org/jira/browse/HDFS-14920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch
>
>
> Decommission test hangs in our clusters.
> We have seen messages like the following:
> {quote}
> 2019-10-22 15:58:51,514 TRACE 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: 
> blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 
> 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 
> 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current 
> datanode decommissioning: true, Is current datanode entering maintenance: 
> false
> 2019-10-22 15:58:51,514 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 
> 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate 
> to finish Decommission In Progress
> {quote}
> After digging into the source code and cluster logs, I guess it happens in 
> the following steps.
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8, b0 is from 
> datanode dn0, b1 is from datanode dn1, ...etc
> # At the beginning dn0 is in decommission progress; b0 is replicated 
> successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are on nodes in decommission progress, and dn4 containing 
> b4 is out of service, so the block needs reconstruction; an ErasureCodingWork 
> is created to do it, and in the ErasureCodingWork, additionalReplRequired 
> is 4.
> # Because hasAllInternalBlocks is false, it will call 
> ErasureCodingWork#addTaskToDatanode -> 
> DatanodeDescriptor#addBlockToBeErasureCoded, and send a 
> BlockECReconstructionInfo task to the DataNode.
> # The DataNode cannot reconstruct the block because targets is 4, greater 
> than 3 (the parity number).
> There is a problem, shown below, in BlockManager.java#scheduleReconstruction:
> {code}
>   // should reconstruct all the internal blocks before scheduling
>   // replication task for decommissioning node(s).
>   if (additionalReplRequired - numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>     additionalReplRequired = additionalReplRequired -
>         numReplicas.decommissioning() -
>         numReplicas.liveEnteringMaintenanceReplicas();
>   }
> {code}
> Reconstruction should happen first, and then replication for the 
> decommissioning nodes. Because numReplicas.decommissioning() is 4 and 
> additionalReplRequired is 4, that's wrong:
> numReplicas.decommissioning() should be 3; it should exclude the live 
> replica. If so, additionalReplRequired will be 1 and reconstruction will be 
> scheduled as expected. After that, decommission goes on.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-28 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960940#comment-16960940
 ] 

Ayush Saxena commented on HDFS-14936:
-

But InnerNode is extended not only by {{InnerNodeImpl}} and 
{{DFSTopologyNodeImpl}}.

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch
>
>
> Currently the InnerNode subclasses InnerNodeImpl and DFSTopologyNodeImpl 
> both have getNumOfChildren(), 
> so add getNumOfChildren() to the interface InnerNode and remove the 
> unnecessary getNumOfChildren() in DFSTopologyNodeImpl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


