[jira] [Commented] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2018-08-14 Thread Jianfei Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580777#comment-16580777
 ] 

Jianfei Jiang commented on HDFS-8277:
-

Hi

[~brahmareddy]

The DFSAdmin commands other than safemode were fixed in HDFS-12935 some months 
ago, but the safemode command still has not been fixed. Could we fix it now? I 
have uploaded HDFS-8277_5.patch, following the same approach as in HDFS-12935. 
I hope we can resolve this issue soon, as it is actually a bug in some cases.
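For illustration, a minimal sketch of the behaviour the patch is aiming for: 
iterate over every NameNode of the nameservice and report per-NameNode failures 
instead of aborting on the first connection refused. The proxy map and how it 
is built are assumptions for the example, not the actual patch code.
{code:java}
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafemodeOnAllNameNodes {
  /**
   * Try "safemode enter" on every NameNode and report per-NameNode failures
   * instead of stopping after the first ConnectException. The proxies map
   * (logical NN id -> ClientProtocol proxy) is assumed to be built from the
   * HA configuration elsewhere.
   */
  static void enterSafemodeOnAll(Map<String, ClientProtocol> proxies) {
    for (Map.Entry<String, ClientProtocol> e : proxies.entrySet()) {
      try {
        boolean on =
            e.getValue().setSafeMode(SafeModeAction.SAFEMODE_ENTER, false);
        System.out.println("Safe mode is " + (on ? "ON" : "OFF")
            + " on " + e.getKey());
      } catch (IOException ioe) {
        // A down Standby NN should not prevent the surviving Active NN
        // from entering safemode.
        System.err.println("safemode: failed on " + e.getKey() + ": "
            + ioe.getMessage());
      }
    }
  }
}
{code}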

> Safemode enter fails when Standby NameNode is down
> --
>
> Key: HDFS-8277
> URL: https://issues.apache.org/jira/browse/HDFS-8277
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.6.0
> Environment: HDP 2.2.0
>Reporter: Hari Sekhon
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
> HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch, 
> HDFS-8277_5.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the 
> standard hdfs client code does, and is instead stopping after getting a 
> connection refused from nn1 which is down. I verified normal hadoop fs writes 
> and reads via cli did work at this time, using nn2. I happened to run this 
> command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it the command worked as 
> expected again.






[jira] [Updated] (HDFS-8277) Safemode enter fails when Standby NameNode is down

2018-08-14 Thread Jianfei Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianfei Jiang updated HDFS-8277:

Attachment: HDFS-8277_5.patch

> Safemode enter fails when Standby NameNode is down
> --
>
> Key: HDFS-8277
> URL: https://issues.apache.org/jira/browse/HDFS-8277
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.6.0
> Environment: HDP 2.2.0
>Reporter: Hari Sekhon
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
> HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch, 
> HDFS-8277_5.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the 
> standard hdfs client code does, and is instead stopping after getting a 
> connection refused from nn1 which is down. I verified normal hadoop fs writes 
> and reads via cli did work at this time, using nn2. I happened to run this 
> command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it the command worked as 
> expected again.






[jira] [Commented] (HDFS-13217) Log audit event only used last EC policy name when add multiple policies from file

2018-08-14 Thread liaoyuxiangqin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580766#comment-16580766
 ] 

liaoyuxiangqin commented on HDFS-13217:
---

[~xiaochen] [~eddyxu] [~knanasi] Thanks for your review on this. I'm sorry it 
took me so long to push this through the finish line; I have fixed the 
checkstyle warning.

> Log audit event only used last EC policy name when add multiple policies from 
> file 
> ---
>
> Key: HDFS-13217
> URL: https://issues.apache.org/jira/browse/HDFS-13217
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: liaoyuxiangqin
>Assignee: liaoyuxiangqin
>Priority: Major
> Attachments: HDFS-13217.001.patch, HDFS-13217.002.patch, 
> HDFS-13217.003.patch, HDFS-13217.004.patch, HDFS-13217.005.patch
>
>
> When I read addErasureCodingPolicies() in the FSNamesystem class of the 
> NameNode, I found that the following code only uses the last EC policy name 
> for logAuditEvent. I think this audit log cannot track all of the policies 
> when multiple erasure coding policies are added to the 
> ErasureCodingPolicyManager. Thanks.
> {code:java|title=FSNamesystem.java|borderStyle=solid}
> try {
>   checkOperation(OperationCategory.WRITE);
>   checkNameNodeSafeMode("Cannot add erasure coding policy");
>   for (ErasureCodingPolicy policy : policies) {
> try {
>   ErasureCodingPolicy newPolicy =
>   FSDirErasureCodingOp.addErasureCodingPolicy(this, policy,
>   logRetryCache);
>   addECPolicyName = newPolicy.getName();
>   responses.add(new AddErasureCodingPolicyResponse(newPolicy));
> } catch (HadoopIllegalArgumentException e) {
>   responses.add(new AddErasureCodingPolicyResponse(policy, e));
> }
>   }
>   success = true;
>   return responses.toArray(new AddErasureCodingPolicyResponse[0]);
> } finally {
>   writeUnlock(operationName);
>   if (success) {
> getEditLog().logSync();
>   }
>   logAuditEvent(success, operationName,addECPolicyName, null, null);
> }
> {code}
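A minimal standalone sketch of the fix idea, assuming the audit entry simply 
joins every successfully added policy name instead of keeping only the last 
one; the policy names below are illustrative values, not output of the patch:
{code:java}
import java.util.ArrayList;
import java.util.List;

public class AuditAllPolicyNames {
  public static void main(String[] args) {
    // Policy names processed by one addErasureCodingPolicies() call.
    String[] requested = {"RS-6-3-1024k", "RS-10-4-1024k", "XOR-2-1-1024k"};

    // Instead of overwriting a single addECPolicyName on every iteration,
    // accumulate each successfully added name ...
    List<String> addedNames = new ArrayList<>();
    for (String name : requested) {
      addedNames.add(name);        // in the real code: newPolicy.getName()
    }

    // ... and emit one audit entry that covers all of them.
    String auditNames = String.join(", ", addedNames);
    System.out.println(
        "logAuditEvent(success, op, \"" + auditNames + "\", null, null)");
  }
}
{code}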






[jira] [Updated] (HDFS-13217) Log audit event only used last EC policy name when add multiple policies from file

2018-08-14 Thread liaoyuxiangqin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liaoyuxiangqin updated HDFS-13217:
--
Status: Patch Available  (was: Open)

> Log audit event only used last EC policy name when add multiple policies from 
> file 
> ---
>
> Key: HDFS-13217
> URL: https://issues.apache.org/jira/browse/HDFS-13217
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: liaoyuxiangqin
>Assignee: liaoyuxiangqin
>Priority: Major
> Attachments: HDFS-13217.001.patch, HDFS-13217.002.patch, 
> HDFS-13217.003.patch, HDFS-13217.004.patch, HDFS-13217.005.patch
>
>
> When I read addErasureCodingPolicies() in the FSNamesystem class of the 
> NameNode, I found that the following code only uses the last EC policy name 
> for logAuditEvent. I think this audit log cannot track all of the policies 
> when multiple erasure coding policies are added to the 
> ErasureCodingPolicyManager. Thanks.
> {code:java|title=FSNamesystem.java|borderStyle=solid}
> try {
>   checkOperation(OperationCategory.WRITE);
>   checkNameNodeSafeMode("Cannot add erasure coding policy");
>   for (ErasureCodingPolicy policy : policies) {
> try {
>   ErasureCodingPolicy newPolicy =
>   FSDirErasureCodingOp.addErasureCodingPolicy(this, policy,
>   logRetryCache);
>   addECPolicyName = newPolicy.getName();
>   responses.add(new AddErasureCodingPolicyResponse(newPolicy));
> } catch (HadoopIllegalArgumentException e) {
>   responses.add(new AddErasureCodingPolicyResponse(policy, e));
> }
>   }
>   success = true;
>   return responses.toArray(new AddErasureCodingPolicyResponse[0]);
> } finally {
>   writeUnlock(operationName);
>   if (success) {
> getEditLog().logSync();
>   }
>   logAuditEvent(success, operationName,addECPolicyName, null, null);
> }
> {code}






[jira] [Updated] (HDFS-13217) Log audit event only used last EC policy name when add multiple policies from file

2018-08-14 Thread liaoyuxiangqin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liaoyuxiangqin updated HDFS-13217:
--
Status: Open  (was: Patch Available)

> Log audit event only used last EC policy name when add multiple policies from 
> file 
> ---
>
> Key: HDFS-13217
> URL: https://issues.apache.org/jira/browse/HDFS-13217
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: liaoyuxiangqin
>Assignee: liaoyuxiangqin
>Priority: Major
> Attachments: HDFS-13217.001.patch, HDFS-13217.002.patch, 
> HDFS-13217.003.patch, HDFS-13217.004.patch, HDFS-13217.005.patch
>
>
> When I read addErasureCodingPolicies() in the FSNamesystem class of the 
> NameNode, I found that the following code only uses the last EC policy name 
> for logAuditEvent. I think this audit log cannot track all of the policies 
> when multiple erasure coding policies are added to the 
> ErasureCodingPolicyManager. Thanks.
> {code:java|title=FSNamesystem.java|borderStyle=solid}
> try {
>   checkOperation(OperationCategory.WRITE);
>   checkNameNodeSafeMode("Cannot add erasure coding policy");
>   for (ErasureCodingPolicy policy : policies) {
> try {
>   ErasureCodingPolicy newPolicy =
>   FSDirErasureCodingOp.addErasureCodingPolicy(this, policy,
>   logRetryCache);
>   addECPolicyName = newPolicy.getName();
>   responses.add(new AddErasureCodingPolicyResponse(newPolicy));
> } catch (HadoopIllegalArgumentException e) {
>   responses.add(new AddErasureCodingPolicyResponse(policy, e));
> }
>   }
>   success = true;
>   return responses.toArray(new AddErasureCodingPolicyResponse[0]);
> } finally {
>   writeUnlock(operationName);
>   if (success) {
> getEditLog().logSync();
>   }
>   logAuditEvent(success, operationName,addECPolicyName, null, null);
> }
> {code}






[jira] [Updated] (HDFS-13217) Log audit event only used last EC policy name when add multiple policies from file

2018-08-14 Thread liaoyuxiangqin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liaoyuxiangqin updated HDFS-13217:
--
Attachment: HDFS-13217.005.patch

> Log audit event only used last EC policy name when add multiple policies from 
> file 
> ---
>
> Key: HDFS-13217
> URL: https://issues.apache.org/jira/browse/HDFS-13217
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: liaoyuxiangqin
>Assignee: liaoyuxiangqin
>Priority: Major
> Attachments: HDFS-13217.001.patch, HDFS-13217.002.patch, 
> HDFS-13217.003.patch, HDFS-13217.004.patch, HDFS-13217.005.patch
>
>
> When I read addErasureCodingPolicies() in the FSNamesystem class of the 
> NameNode, I found that the following code only uses the last EC policy name 
> for logAuditEvent. I think this audit log cannot track all of the policies 
> when multiple erasure coding policies are added to the 
> ErasureCodingPolicyManager. Thanks.
> {code:java|title=FSNamesystem.java|borderStyle=solid}
> try {
>   checkOperation(OperationCategory.WRITE);
>   checkNameNodeSafeMode("Cannot add erasure coding policy");
>   for (ErasureCodingPolicy policy : policies) {
> try {
>   ErasureCodingPolicy newPolicy =
>   FSDirErasureCodingOp.addErasureCodingPolicy(this, policy,
>   logRetryCache);
>   addECPolicyName = newPolicy.getName();
>   responses.add(new AddErasureCodingPolicyResponse(newPolicy));
> } catch (HadoopIllegalArgumentException e) {
>   responses.add(new AddErasureCodingPolicyResponse(policy, e));
> }
>   }
>   success = true;
>   return responses.toArray(new AddErasureCodingPolicyResponse[0]);
> } finally {
>   writeUnlock(operationName);
>   if (success) {
> getEditLog().logSync();
>   }
>   logAuditEvent(success, operationName,addECPolicyName, null, null);
> }
> {code}






[jira] [Commented] (HDFS-13826) Add a hidden configuration for NameNode to generate fake block locations

2018-08-14 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580755#comment-16580755
 ] 

Todd Lipcon commented on HDFS-13826:


Thanks for pointing me at Dynamometer. That's quite interesting as it's 
high-fidelity and simulates block reports, NN memory usage more accurately, 
etc. I'll take a look at whether we can use that for our use case. If not, then 
yea, the scope you described is just about it -- just a check if this setting 
is configured and calling out to some other function to generate fake results 
if so. Only a couple line hook into existing code and maybe 100 new lines 
elsewhere to do the generation.
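For what it's worth, a rough sketch of what the generation side could look 
like, assuming a fixed pool of fake datanode addresses and a deterministic pick 
per block; the names and numbers are illustrative, not the actual hook into 
FSNamesystem#getBlockLocations():
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class FakeBlockLocations {
  // A fixed pool of fake datanode addresses (illustrative values).
  private static final String[] FAKE_DATANODES = {
      "fake-dn-1:9866", "fake-dn-2:9866", "fake-dn-3:9866",
      "fake-dn-4:9866", "fake-dn-5:9866"
  };

  /** Pick the requested number of distinct fake datanodes for one block. */
  static List<String> fakeLocations(long blockId, int replication) {
    // Seed with the block id so repeated lookups of the same block are stable.
    Random rng = new Random(blockId);
    List<String> pool = new ArrayList<>(Arrays.asList(FAKE_DATANODES));
    List<String> chosen = new ArrayList<>();
    for (int i = 0; i < replication && !pool.isEmpty(); i++) {
      chosen.add(pool.remove(rng.nextInt(pool.size())));
    }
    return chosen;
  }

  public static void main(String[] args) {
    System.out.println(fakeLocations(1073741825L, 3));
  }
}
{code}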

> Add a hidden configuration for NameNode to generate fake block locations
> 
>
> Key: HDFS-13826
> URL: https://issues.apache.org/jira/browse/HDFS-13826
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
>
> In doing testing and benchmarking of the NameNode and dependent systems, it's 
> often useful to be able to use an fsimage provided by some production system 
> in a controlled environment without actually having access to any of the 
> data. For example, while doing some recent work on Apache Impala I was trying 
> to optimize the transmission and storage of block locations and tokens and 
> measure the results based on metadata from a production user. In order to 
> achieve this, it would be useful for the NN to expose a developer-only 
> (undocumented) configuration to generate fake block locations and return them 
> to callers. The "fake" locations should be randomly distributed across a 
> fixed set of fake datanodes.






[jira] [Comment Edited] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache

2018-08-14 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580749#comment-16580749
 ] 

Yiqun Lin edited comment on HDFS-13821 at 8/15/18 6:12 AM:
---

Thanks [~ferhui] for providing the test results!
As [~ferhui] pointed out, the bottleneck seems to be in the LocalCache. I 
looked into the LocalCache instance: it uses a reentrant lock (not a read/write 
lock) for thread-safe operation. So the problem here is that when multiple 
read/write operations hit the cache concurrently, the cache can perform badly.

{quote}
Improve the locking model. From the trace Fei Hui posted, I'm guessing that the 
issue is that we are holding the write lock a lot.
{quote}
[~elgoiri], improving the locking model in MountTableResolver may not help us 
much if the bottleneck is in the LocalCache.
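The segment lock in the jstack output (LocalCache$Segment extends 
ReentrantLock, taken in lockedGetOrLoad) serializes threads that hit the 
loading path. For illustration only, a generic read/write-lock cache sketch in 
plain JDK (not the MountTableResolver or Guava code): concurrent lookups of 
already-present entries proceed in parallel while writers stay exclusive.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyCache<K, V> {
  private final Map<K, V> map = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  public V get(K key) {
    lock.readLock().lock();      // many readers may hold this at once
    try {
      return map.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void put(K key, V value) {
    lock.writeLock().lock();     // writers are exclusive
    try {
      map.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}
Whether this helps in practice depends on whether the load path really is the 
hot spot, as noted above.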


was (Author: linyiqun):
Thanks [~ferhui] for providing the test results!
As [~ferhui] pointed out, the bottleneck seems to be in the LocalCache. I 
looked into the LocalCache instance: it uses a reentrant lock (not a read/write 
lock) for thread-safe operation. So the problem here is that when multiple 
read/write operations hit the cache concurrently, the cache can perform badly.

{quote}
{quote}

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---
>
> Key: HDFS-13821
> URL: https://issues.apache.org/jira/browse/HDFS-13821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0, 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13821.001.patch, LocalCacheTest.java, 
> image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
> I found that ProxyAvgTime from Ganglia was very high, so I ran jstack on the 
> Router and got the following stack frames:
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads are blocked on *LocalCache*.
> After disabling the cache, ProxyAvgTime dropped, as shown below:
>  !image-2018-08-13-11-27-49-023.png! 






[jira] [Commented] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache

2018-08-14 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580749#comment-16580749
 ] 

Yiqun Lin commented on HDFS-13821:
--

Thanks [~ferhui] for providing the test results!
As [~ferhui] pointed out, the bottleneck seems to be in the LocalCache. I 
looked into the LocalCache instance: it uses a reentrant lock (not a read/write 
lock) for thread-safe operation. So the problem here is that when multiple 
read/write operations hit the cache concurrently, the cache can perform badly.

{quote}
{quote}

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---
>
> Key: HDFS-13821
> URL: https://issues.apache.org/jira/browse/HDFS-13821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0, 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13821.001.patch, LocalCacheTest.java, 
> image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
> I found that ProxyAvgTime from Ganglia was very high, so I ran jstack on the 
> Router and got the following stack frames:
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads blocked on *LocalCache*
> After disable the cache, ProxyAvgTime is down as follow showed
>  !image-2018-08-13-11-27-49-023.png! 






[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-14 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580737#comment-16580737
 ] 

Xiao Chen commented on HDFS-13697:
--

Thanks for the hard work on this [~zvenczel]!

bq. As these use cases are around for a while I'd expect them to be used widely 
and hard to avoid. What do you think?
This is the part that's really head-scratching. On one hand I really think 
there should be no morphing, but OTOH we need the existing {{TestAclsEndToEnd}} 
to 'work'.

If userA creates the KMSCP, then proxies as userB and calls the KMSCP method, 
our test case is expecting this to be coming from userB. I'm not sure whether 
we should deem this a test issue - does anyone know of any downstream usage 
this way?

[~daryn] [~xyao] what are your thoughts on this? I don't see any other code 
changes that can satisfy these cases without dynamically checking the ugi at 
method invocation time. So it seems there's no way for this to be 'compatible' 
and satisfy the 'just use the ugi at construction time', due to the historical 
morphing nature of the KMSCP...
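To make the 'use the ugi at construction time' behaviour concrete, a small 
illustrative sketch (not KMSClientProvider code): the UGI is captured once when 
the client is built and every later call runs under it via doAs, so a proxy 
user that only becomes current afterwards is deliberately not picked up.
{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public class ConstructionTimeUgiClient {
  private final UserGroupInformation creatorUgi;

  public ConstructionTimeUgiClient() throws IOException {
    // Capture the UGI once, when the client is created.
    this.creatorUgi = UserGroupInformation.getCurrentUser();
  }

  public <T> T call(PrivilegedExceptionAction<T> remoteCall)
      throws IOException, InterruptedException {
    // Every later call runs as the creator, regardless of which (proxy)
    // user happens to be current at invocation time.
    return creatorUgi.doAs(remoteCall);
  }
}
{code}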

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following, for example: if we have set up the oozie user 
> to be entitled to perform actions on behalf of example_user, but oozie is 
> forbidden to decrypt any EDEK (for security reasons), then due to the above 
> issue the example_user entitlements are lost from the UGI and the following 
> error is reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(H

[jira] [Updated] (HDFS-12862) CacheDirective may invalidata,when NN restart or make a transition to Active.

2018-08-14 Thread Wang XL (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang XL updated HDFS-12862:
---
Attachment: HDFS-12862-trunk.003.patch

> CacheDirective may invalidata,when NN restart or make a transition to Active.
> -
>
> Key: HDFS-12862
> URL: https://issues.apache.org/jira/browse/HDFS-12862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs
>Affects Versions: 2.7.1
> Environment: 
>Reporter: Wang XL
>Priority: Major
>  Labels: patch
> Attachments: HDFS-12862-branch-2.7.1.001.patch, 
> HDFS-12862-trunk.002.patch, HDFS-12862-trunk.003.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When 
> modifying a cacheDirective, the expiration in the directive may be a relative 
> expiryTime, and the EditLog will serialize a relative expiry time.
> {code:java}
> // Some comments here
> static void modifyCacheDirective(
>   FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo 
> directive,
>   EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
> final FSPermissionChecker pc = getFsPermissionChecker(fsn);
> cacheManager.modifyDirective(directive, pc, flags);
> fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
>   }
> {code}
> But when the SBN replays the log, it will invoke 
> FSImageSerialization#readCacheDirectiveInfo, which reads it as an absolute 
> expiryTime. This will result in an inconsistency.
> {code:java}
>   public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>   throws IOException {
> CacheDirectiveInfo.Builder builder =
> new CacheDirectiveInfo.Builder();
> builder.setId(readLong(in));
> int flags = in.readInt();
> if ((flags & 0x1) != 0) {
>   builder.setPath(new Path(readString(in)));
> }
> if ((flags & 0x2) != 0) {
>   builder.setReplication(readShort(in));
> }
> if ((flags & 0x4) != 0) {
>   builder.setPool(readString(in));
> }
> if ((flags & 0x8) != 0) {
>   builder.setExpiration(
>   CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
> }
> if ((flags & ~0xF) != 0) {
>   throw new IOException("unknown flags set in " +
>   "ModifyCacheDirectiveInfoOp: " + flags);
> }
> return builder.build();
>   }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, 
> logRetryCache) may serialize a relative expiry time, but 
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiryTime.
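One possible direction, as a hedged sketch only (not the attached patch): 
normalize a relative expiration into an absolute one before the directive is 
written to the edit log, so that readCacheDirectiveInfo(), which always calls 
Expiration.newAbsolute, sees a consistent value on replay.
{code:java}
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo.Expiration;

public class ExpirationNormalizer {
  /**
   * Turn a relative expiration into an absolute one before logging, so the
   * edit-log reader's newAbsolute() interpretation matches what was written.
   */
  static Expiration toAbsolute(Expiration expiration, long nowMillis) {
    if (expiration == null || !expiration.isRelative()) {
      return expiration;   // unset or already absolute: nothing to do
    }
    // Relative expirations mean "millis from now"; pin to wall-clock time.
    return Expiration.newAbsolute(nowMillis + expiration.getMillis());
  }
}
{code}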






[jira] [Updated] (HDFS-12862) CacheDirective may invalidata,when NN restart or make a transition to Active.

2018-08-14 Thread Wang XL (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang XL updated HDFS-12862:
---
Attachment: (was: HDFS-12862-trunk.003.patch)

> CacheDirective may invalidata,when NN restart or make a transition to Active.
> -
>
> Key: HDFS-12862
> URL: https://issues.apache.org/jira/browse/HDFS-12862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs
>Affects Versions: 2.7.1
> Environment: 
>Reporter: Wang XL
>Priority: Major
>  Labels: patch
> Attachments: HDFS-12862-branch-2.7.1.001.patch, 
> HDFS-12862-trunk.002.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When 
> modifying a cacheDirective, the expiration in the directive may be a relative 
> expiryTime, and the EditLog will serialize a relative expiry time.
> {code:java}
> // Some comments here
> static void modifyCacheDirective(
>   FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo 
> directive,
>   EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
> final FSPermissionChecker pc = getFsPermissionChecker(fsn);
> cacheManager.modifyDirective(directive, pc, flags);
> fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
>   }
> {code}
> But when the SBN replays the log, it will invoke 
> FSImageSerialization#readCacheDirectiveInfo, which reads it as an absolute 
> expiryTime. This will result in an inconsistency.
> {code:java}
>   public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>   throws IOException {
> CacheDirectiveInfo.Builder builder =
> new CacheDirectiveInfo.Builder();
> builder.setId(readLong(in));
> int flags = in.readInt();
> if ((flags & 0x1) != 0) {
>   builder.setPath(new Path(readString(in)));
> }
> if ((flags & 0x2) != 0) {
>   builder.setReplication(readShort(in));
> }
> if ((flags & 0x4) != 0) {
>   builder.setPool(readString(in));
> }
> if ((flags & 0x8) != 0) {
>   builder.setExpiration(
>   CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
> }
> if ((flags & ~0xF) != 0) {
>   throw new IOException("unknown flags set in " +
>   "ModifyCacheDirectiveInfoOp: " + flags);
> }
> return builder.build();
>   }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, 
> logRetryCache) may serialize a relative expiry time, but 
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiryTime.






[jira] [Updated] (HDFS-12862) CacheDirective may invalidata,when NN restart or make a transition to Active.

2018-08-14 Thread Wang XL (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang XL updated HDFS-12862:
---
Attachment: HDFS-12862-trunk.003.patch

> CacheDirective may invalidata,when NN restart or make a transition to Active.
> -
>
> Key: HDFS-12862
> URL: https://issues.apache.org/jira/browse/HDFS-12862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs
>Affects Versions: 2.7.1
> Environment: 
>Reporter: Wang XL
>Priority: Major
>  Labels: patch
> Attachments: HDFS-12862-branch-2.7.1.001.patch, 
> HDFS-12862-trunk.002.patch, HDFS-12862-trunk.003.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When 
> modifying a cacheDirective, the expiration in the directive may be a relative 
> expiryTime, and the EditLog will serialize a relative expiry time.
> {code:java}
> // Some comments here
> static void modifyCacheDirective(
>   FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo 
> directive,
>   EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
> final FSPermissionChecker pc = getFsPermissionChecker(fsn);
> cacheManager.modifyDirective(directive, pc, flags);
> fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
>   }
> {code}
> But when the SBN replays the log, it will invoke 
> FSImageSerialization#readCacheDirectiveInfo, which reads it as an absolute 
> expiryTime. This will result in an inconsistency.
> {code:java}
>   public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>   throws IOException {
> CacheDirectiveInfo.Builder builder =
> new CacheDirectiveInfo.Builder();
> builder.setId(readLong(in));
> int flags = in.readInt();
> if ((flags & 0x1) != 0) {
>   builder.setPath(new Path(readString(in)));
> }
> if ((flags & 0x2) != 0) {
>   builder.setReplication(readShort(in));
> }
> if ((flags & 0x4) != 0) {
>   builder.setPool(readString(in));
> }
> if ((flags & 0x8) != 0) {
>   builder.setExpiration(
>   CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
> }
> if ((flags & ~0xF) != 0) {
>   throw new IOException("unknown flags set in " +
>   "ModifyCacheDirectiveInfoOp: " + flags);
> }
> return builder.build();
>   }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, 
> logRetryCache) may serialize a relative expiry time, but 
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiryTime.






[jira] [Commented] (HDFS-12862) CacheDirective may invalidata,when NN restart or make a transition to Active.

2018-08-14 Thread Wang XL (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580686#comment-16580686
 ] 

Wang XL commented on HDFS-12862:


Thanks  [~daryn]  for your suggestions, submit v003 following your advice and 
trigger jenkins.

> CacheDirective may invalidata,when NN restart or make a transition to Active.
> -
>
> Key: HDFS-12862
> URL: https://issues.apache.org/jira/browse/HDFS-12862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs
>Affects Versions: 2.7.1
> Environment: 
>Reporter: Wang XL
>Priority: Major
>  Labels: patch
> Attachments: HDFS-12862-branch-2.7.1.001.patch, 
> HDFS-12862-trunk.002.patch, HDFS-12862-trunk.003.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When 
> modifying a cacheDirective, the expiration in the directive may be a relative 
> expiryTime, and the EditLog will serialize a relative expiry time.
> {code:java}
> // Some comments here
> static void modifyCacheDirective(
>   FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo 
> directive,
>   EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
> final FSPermissionChecker pc = getFsPermissionChecker(fsn);
> cacheManager.modifyDirective(directive, pc, flags);
> fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
>   }
> {code}
> But when the SBN replays the log, it will invoke 
> FSImageSerialization#readCacheDirectiveInfo, which reads it as an absolute 
> expiryTime. This will result in an inconsistency.
> {code:java}
>   public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>   throws IOException {
> CacheDirectiveInfo.Builder builder =
> new CacheDirectiveInfo.Builder();
> builder.setId(readLong(in));
> int flags = in.readInt();
> if ((flags & 0x1) != 0) {
>   builder.setPath(new Path(readString(in)));
> }
> if ((flags & 0x2) != 0) {
>   builder.setReplication(readShort(in));
> }
> if ((flags & 0x4) != 0) {
>   builder.setPool(readString(in));
> }
> if ((flags & 0x8) != 0) {
>   builder.setExpiration(
>   CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
> }
> if ((flags & ~0xF) != 0) {
>   throw new IOException("unknown flags set in " +
>   "ModifyCacheDirectiveInfoOp: " + flags);
> }
> return builder.build();
>   }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, 
> logRetryCache) may serialize a relative expiry time, but 
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiryTime.






[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2018-08-14 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580660#comment-16580660
 ] 

Yiqun Lin commented on HDFS-13671:
--

Agree with [~arpitagarwal]'s comment. I have met the same problem in 
2.6.0-cdh5.13.1, which also uses the FoldedTreeSet structure.
[~daryn], if you have already done the reverting work for this, feel free to 
attach your patch :). I think we have reached agreement now.

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks  chunk by chunk in a loop.
> Actually the first step should be the more expensive operation and should 
> take more time. However, we always see the NN hang during the remove-block 
> operation.
> Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get 
> better performance when handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specific block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.
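As a rough JDK analogy of that cost difference (plain TreeSet vs. HashSet, not 
FoldedTreeSet itself): removals from a balanced tree pay a per-element 
rebalancing cost that a hash-based structure avoids.
{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class RemoveCostDemo {
  public static void main(String[] args) {
    int n = 2_000_000;
    Set<Long> tree = new TreeSet<>();
    Set<Long> hash = new HashSet<>();
    for (long i = 0; i < n; i++) { tree.add(i); hash.add(i); }

    long t0 = System.nanoTime();
    for (long i = 0; i < n; i++) { tree.remove(i); }   // O(log n) + rebalancing
    long t1 = System.nanoTime();
    for (long i = 0; i < n; i++) { hash.remove(i); }   // O(1) expected
    long t2 = System.nanoTime();

    System.out.printf("TreeSet remove: %d ms, HashSet remove: %d ms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
  }
}
{code}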






[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580655#comment-16580655
 ] 

genericqa commented on HDFS-10240:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 47s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile |
|   | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
|   | hadoop.hdfs.server.mover.TestMover |
|   | hadoop.hdfs.TestPread |
|   | hadoop.hdfs.server.datanode.TestIncrementalBlockReports |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | 
hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness
 |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.TestFileCorruption |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.TestDatanodeReport |
|   | hadoop.hdfs.server.namenode.TestCacheDirectives |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-10240 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935639/HDFS-10240.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eb35db13c17c 3.13.0-

[jira] [Commented] (HDDS-265) Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to KeyValueContainerData

2018-08-14 Thread LiXin Ge (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580646#comment-16580646
 ] 

LiXin Ge commented on HDDS-265:
---

TestBuckets fails in my trunk branch too. The other failing tests passed on my 
local machine.

> Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to 
> KeyValueContainerData
> -
>
> Key: HDDS-265
> URL: https://issues.apache.org/jira/browse/HDDS-265
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.2.1
>Reporter: Hanisha Koneru
>Assignee: LiXin Ge
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-265.000.patch, HDDS-265.001.patch, 
> HDDS-265.002.patch, HDDS-265.003.patch
>
>
> "numPendingDeletionBlocks" and "deleteTransactionId" fields are specific to 
> KeyValueContainers. As such they should be moved to KeyValueContainerData 
> from ContainerData.
> ContainerReport should also be refactored to take in this change. 
> Please refer to [~ljain]'s comment in HDDS-250.






[jira] [Created] (HDFS-13827) Distcp job is failing due to Invalid arguments

2018-08-14 Thread Sudhansu Bhuyan (JIRA)
Sudhansu Bhuyan created HDFS-13827:
--

 Summary: Distcp job is failing due to Invalid arguments
 Key: HDFS-13827
 URL: https://issues.apache.org/jira/browse/HDFS-13827
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.7.1
Reporter: Sudhansu Bhuyan


18/08/14 18:35:59 ERROR tools.DistCp: Invalid arguments:
java.lang.IllegalArgumentException: Neither source file listing nor source 
paths present
 at 
org.apache.hadoop.tools.OptionsParser.parseSourceAndTargetPaths(OptionsParser.java:348)
 at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:89)
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:117)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
Invalid arguments: Neither source file listing nor source paths present
usage: distcp OPTIONS [source_path...] 






[jira] [Commented] (HDFS-13826) Add a hidden configuration for NameNode to generate fake block locations

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580595#comment-16580595
 ] 

Wei-Chiu Chuang commented on HDFS-13826:


Hey [~tlipcon] sounds like a good proposal.
What would be the scope of this change? I imagine you just want to return a 
fake LocatedBlocks in FSNamesystem#getBlockLocations()? Or do you intend to 
have a larger change? (Just FYI LinkedIn's Dynamometer tool can generate fake 
blocks on simulated DataNodes)

> Add a hidden configuration for NameNode to generate fake block locations
> 
>
> Key: HDFS-13826
> URL: https://issues.apache.org/jira/browse/HDFS-13826
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
>
> In doing testing and benchmarking of the NameNode and dependent systems, it's 
> often useful to be able to use an fsimage provided by some production system 
> in a controlled environment without actually having access to any of the 
> data. For example, while doing some recent work on Apache Impala I was trying 
> to optimize the transmission and storage of block locations and tokens and 
> measure the results based on metadata from a production user. In order to 
> achieve this, it would be useful for the NN to expose a developer-only 
> (undocumented) configuration to generate fake block locations and return them 
> to callers. The "fake" locations should be randomly distributed across a 
> fixed set of fake datanodes.






[jira] [Created] (HDFS-13826) Add a hidden configuration for NameNode to generate fake block locations

2018-08-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13826:
--

 Summary: Add a hidden configuration for NameNode to generate fake 
block locations
 Key: HDFS-13826
 URL: https://issues.apache.org/jira/browse/HDFS-13826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Todd Lipcon
Assignee: Todd Lipcon


In doing testing and benchmarking of the NameNode and dependent systems, it's 
often useful to be able to use an fsimage provided by some production system in 
a controlled environment without actually having access to any of the data. For 
example, while doing some recent work on Apache Impala I was trying to optimize 
the transmission and storage of block locations and tokens and measure the 
results based on metadata from a production user. In order to achieve this, it 
would be useful for the NN to expose a developer-only (undocumented) 
configuration to generate fake block locations and return them to callers. The 
"fake" locations should be randomly distributed across a fixed set of fake 
datanodes.






[jira] [Commented] (HDFS-13819) TestDirectoryScanner#testDirectoryScannerInFederatedCluster is flaky

2018-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580574#comment-16580574
 ] 

Hudson commented on HDFS-13819:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14772 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14772/])
HDFS-13819. TestDirectoryScanner#testDirectoryScannerInFederatedCluster 
(templedf: rev 4a5006b1d08c19ec096b3936541672ad6a225470)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java


> TestDirectoryScanner#testDirectoryScannerInFederatedCluster is flaky
> 
>
> Key: HDFS-13819
> URL: https://issues.apache.org/jira/browse/HDFS-13819
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: HDFS-13819.001.patch, HDFS-13819.002.patch
>
>
> We're seeing the test fail periodically with:
> {quote}java.lang.AssertionError: expected:<2> but was:<1>{quote}






[jira] [Commented] (HDFS-10240) Race between close/recoverLease leads to missing block

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580563#comment-16580563
 ] 

Wei-Chiu Chuang commented on HDFS-10240:


Hi [~LiJinglun] thanks for your update. Sorry I didn't make myself clear. 

The HDFS-13757.test.02.patch is intended to make the test more robust and less 
likely to produce flaky failures. I attached [^HDFS-10240.005.patch] for 
your reference. Would you please review it and let me know if it's okay?

> Race between close/recoverLease leads to missing block
> --
>
> Key: HDFS-10240
> URL: https://issues.apache.org/jira/browse/HDFS-10240
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, 
> HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240-004.patch, 
> HDFS-10240.005.patch, HDFS-10240.test.patch
>
>
> We got a missing block in our cluster, and logs related to the missing block 
> are as follows:
> 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 
> blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
>  recovery started, 
> primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]
> 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: File XX has not been closed. Lease 
> recovery is in progress. RecoveryId = 153006357 for block 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,248 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, 
> primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  has not reached minimal replication 1
> 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.53:11402 is added to 
> blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  size 139
> 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:08,808 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345,
>  newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, 
> 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false)
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported 
> genstamp 153006357 does not

[jira] [Updated] (HDFS-10240) Race between close/recoverLease leads to missing block

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-10240:
---
Attachment: HDFS-10240.005.patch

> Race between close/recoverLease leads to missing block
> --
>
> Key: HDFS-10240
> URL: https://issues.apache.org/jira/browse/HDFS-10240
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhouyingchao
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-10240 scenarios.jpg, HDFS-10240-001.patch, 
> HDFS-10240-002.patch, HDFS-10240-003.patch, HDFS-10240-004.patch, 
> HDFS-10240.005.patch, HDFS-10240.test.patch
>
>
> We got a missing block in our cluster, and logs related to the missing block 
> are as follows:
> 2016-03-28,10:00:06,188 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> allocateBlock: XX. BP-219149063-10.108.84.25-1446859315800 
> blk_1226490256_153006345{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,205 INFO BlockStateChange: BLOCK* 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
>  recovery started, 
> primary=ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]
> 2016-03-28,10:00:06,205 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.internalReleaseLease: File XX has not been closed. Lease 
> recovery is in progress. RecoveryId = 153006357 for block 
> blk_1226490256_153006345{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-3f5032bc-6006-4fcc-b0f7-b355a5b94f1b:NORMAL|RBW]]}
> 2016-03-28,10:00:06,248 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> checkFileProgress: blk_1226490256_153006345{blockUCState=COMMITTED, 
> primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  has not reached minimal replication 1
> 2016-03-28,10:00:06,358 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.53:11402 is added to 
> blk_1226490256_153006345{blockUCState=COMMITTED, primaryNodeIndex=2, 
> replicas=[ReplicaUnderConstruction[[DISK]DS-bcd22774-cf4d-45e9-a6a6-c475181271c9:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-ec1413ae-5541-4b44-8922-c928be3bb306:NORMAL|RBW],
>  
> ReplicaUnderConstruction[[DISK]DS-85819f0d-bdbb-4a9b-b90c-eba078547c23:NORMAL|RBW]]}
>  size 139
> 2016-03-28,10:00:06,441 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.5.44:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:06,660 INFO BlockStateChange: BLOCK* addStoredBlock: 
> blockMap updated: 10.114.6.14:11402 is added to blk_1226490256_153006345 size 
> 139
> 2016-03-28,10:00:08,808 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
> commitBlockSynchronization(lastblock=BP-219149063-10.108.84.25-1446859315800:blk_1226490256_153006345,
>  newgenerationstamp=153006357, newlength=139, newtargets=[10.114.6.14:11402, 
> 10.114.5.53:11402, 10.114.5.44:11402], closeFile=true, deleteBlock=false)
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.6.14:11402 by /10.114.6.14 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,836 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.5.53:11402 by /10.114.5.53 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> 2016-03-28,10:00:08,837 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1226490256 added as corrupt on 
> 10.114.5.44:11402 by /10.114.5.44 because block is COMPLETE and reported 
> genstamp 153006357 does not match genstamp in block map 153006345
> From t

[jira] [Commented] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580528#comment-16580528
 ] 

Ajay Kumar commented on HDDS-298:
-

[~xyao] thanks for the review and commit. [~msingh], [~ljain] thanks for the reviews.

> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data for 
> closed containers. For closed containers we maintain the datanodes for a 
> containerId in the ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download the data.
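
As a rough sketch of the idea, with simplified stand-in types rather than the
actual HDDS Pipeline/ContainerStateMap classes: for a closed container, the
datanodes recorded in the replica map are enough to build an on-request
pipeline for readers.

{code:java}
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch only: simplified stand-ins for the real HDDS types.
 * For a CLOSED container, the datanodes recorded in the replica map are used
 * to build an on-request "pipeline" so a client can still locate the data.
 */
public class ClosedContainerPipelineSketch {
  // Simplified analogue of ContainerStateMap.contReplicaMap.
  private final Map<Long, Set<String>> containerReplicas = new HashMap<>();

  public void addReplica(long containerId, String datanode) {
    containerReplicas
        .computeIfAbsent(containerId, k -> new LinkedHashSet<>())
        .add(datanode);
  }

  /** Build a fake pipeline (just the member datanodes) for a closed container. */
  public Set<String> pipelineForClosedContainer(long containerId) {
    Set<String> replicas = containerReplicas.get(containerId);
    if (replicas == null || replicas.isEmpty()) {
      throw new IllegalStateException(
          "No replicas known for container " + containerId);
    }
    return replicas;
  }

  public static void main(String[] args) {
    ClosedContainerPipelineSketch sketch = new ClosedContainerPipelineSketch();
    sketch.addReplica(42L, "dn-1");
    sketch.addReplica(42L, "dn-2");
    sketch.addReplica(42L, "dn-3");
    System.out.println(sketch.pipelineForClosedContainer(42L));
  }
}
{code}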



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580491#comment-16580491
 ] 

Hudson commented on HDDS-298:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14770 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14770/])
HDDS-298. Implement SCMClientProtocolServer.getContainerWithPipeline for (xyao: 
rev 75fc51588de33c7d1cf890f870114fd68f32fb74)
* (edit) 
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerMapping.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerMapping.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/exceptions/SCMException.java
* (edit) 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/states/ContainerStateMap.java


> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data for 
> closed containers. For closed containers we maintain the datanodes for a 
> containerId in the ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download the data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580492#comment-16580492
 ] 

Allen Wittenauer commented on HDFS-13822:
-


Pre:
+1  compile 37m 44s trunk passed 

Post:
+1  compile 21m 57s the patch passed 

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch, HDFS-13822.01.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build failed 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Xiaoyu Yao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDDS-298:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~ajayydv] for the contribution. I've committed the patch to trunk. 

> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data for 
> closed containers. For closed containers we maintain the datanodes for a 
> containerId in the ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download the data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580413#comment-16580413
 ] 

genericqa commented on HDFS-13822:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 37m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
92m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 21m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  9m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
6s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 13s{color} 
| {color:red} hadoop-hdfs-native-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed CTEST tests | test_test_libhdfs_threaded_hdfs_static |
|   | test_libhdfs_threaded_hdfspp_test_shim_static |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13822 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935588/HDFS-13822.01.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  javadoc  
mvninstall  shadedclient  xml  |
| uname | Linux c4f64670c7d5 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4cba074 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| CTEST | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24778/artifact/out/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-ctest.txt
 |
| unit | 
https://builds.apache.

[jira] [Commented] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580332#comment-16580332
 ] 

Hudson commented on HDFS-13788:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14769 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14769/])
HDFS-13788. Update EC documentation about rack fault tolerance. (xiao: rev 
cede33997f7ab09fc046017508b680e282289ce3)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSErasureCoding.md


> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-13788.001.patch, HDFS-13788.002.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> The theoretical minimum is 3 racks, and ideally 9 or more, so the document 
> should be updated.
> (I didn't check timestamps, but this is probably because 
> {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when 
> HDFS-9088 introduced this doc. Later there are also examples in 
> {{TestErasureCodingMultipleRacks}} that test this explicitly.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-324) Use pipeline name as Ratis groupID to allow datanode to report pipeline info

2018-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580331#comment-16580331
 ] 

Hudson commented on HDDS-324:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14769 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14769/])
HDDS-324. Addendum: remove the q letter which is accidentally added to (elek: 
rev 7e822ec246fa78dd140192e1e800f0205aca037a)
* (edit) hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh


> Use pipeline name as Ratis groupID to allow datanode to report pipeline info
> 
>
> Key: HDDS-324
> URL: https://issues.apache.org/jira/browse/HDDS-324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-324.001.patch, HDDS-324.002.patch, 
> HDDS-324.003.patch, HDDS-324.004.patch, HDDS-324.005.patch, 
> HDDS-324.006.patch, HDDS-324.007.patch, HDDS-324.008.patch, 
> HDDS-324.009-addendum.patch, HDDS-324.009.patch
>
>
> Currently Ozone creates a random pipeline id for every pipeline, where a 
> pipeline consists of 3 nodes in a Ratis ring. Ratis, on the other hand, uses 
> the notion of RaftGroupID, which is a unique id for the nodes in a Ratis ring. 
> When a datanode sends information to SCM, the pipeline for the node is 
> currently identified using dn2PipelineMap. With correct use of RaftGroupID, 
> we can eliminate the use of dn2PipelineMap.
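
A simplified stand-in sketch of the idea (hypothetical class, not the actual
HDDS or Ratis types): if the pipeline id itself serves as the Ratis group id, a
datanode report carrying that group id identifies its pipeline directly, with
no dn2PipelineMap lookup.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Simplified stand-in (not the actual HDDS or Ratis types): the pipeline id
 * doubles as the Ratis group id, so a reported group id maps straight back
 * to its pipeline.
 */
public class PipelineGroupIdSketch {
  private final Map<UUID, String> pipelinesByGroupId = new HashMap<>();

  /** The pipeline id doubles as the Ratis group id. */
  public UUID registerPipeline(String pipelineMembers) {
    UUID groupId = UUID.randomUUID();
    pipelinesByGroupId.put(groupId, pipelineMembers);
    return groupId;
  }

  /** Resolve a datanode-reported group id straight to its pipeline. */
  public String pipelineFor(UUID reportedGroupId) {
    return pipelinesByGroupId.get(reportedGroupId);
  }

  public static void main(String[] args) {
    PipelineGroupIdSketch sketch = new PipelineGroupIdSketch();
    UUID groupId = sketch.registerPipeline("dn-1,dn-2,dn-3");
    System.out.println(sketch.pipelineFor(groupId));
  }
}
{code}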



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-08-14 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580319#comment-16580319
 ] 

Xiao Chen commented on HDFS-13770:
--

Thanks Kitti for the new rev and Zsolt for reviewing!

+1 on patch 3 pending 1 final thing:

Sorry I didn't make it clear - in general the test timeout is there to prevent a 
stuck test from blocking the Jenkins job. But because the Jenkins slaves can be 
slow, the test timeout should be conservative so we don't get false 
negatives. So I suggest we bump the timeout to 60 seconds.
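
For example (illustrative only, assuming JUnit 4 on the classpath; the test
name below is hypothetical, not the one in the patch), a conservative
per-test timeout would look like:

{code:java}
import org.junit.Test;

public class TimeoutExampleTest {
  // 60 seconds so slow Jenkins slaves do not produce false negatives.
  @Test(timeout = 60000)
  public void testDeletedFileDecreasesMissingBlocksMetric() throws Exception {
    // test body elided in this sketch
  }
}
{code}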

 

Since branch-2's pre-commit is pretty much broken... could you clarify what 
tests you have run for the latest patch?

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch, 
> HDFS-13770-branch-2.002.patch, HDFS-13770-branch-2.003.patch
>
>
> The missing blocks (with replication factor 1) metric is not always decreased 
> when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). When it is 
> called with the wrong priority, the corruptReplOneBlocks metric is not 
> decreased, even though the block is removed from the priority queue that 
> contains it.
> The corresponding code:
> {code:java}
> /** remove a block from a under replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or >= \{@link #LEVEL} this shortcutting
>  * is not attmpted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected privilege level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by this jira: HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  
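
To make the failure mode concrete, here is a self-contained toy model of the
logic above (plain Java, not HDFS code): calling remove() with priLevel ==
LEVEL still removes the block via the fallback loop, but the caller's metric
decrement is skipped because the priority hint did not point at the
corrupt-blocks queue.

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the metric miss described above; not HDFS code. */
public class UnderReplicatedMetricSketch {
  static final int LEVEL = 5;
  static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;

  final List<Set<String>> priorityQueues = new ArrayList<>();
  int corruptReplOneBlocks = 0;

  UnderReplicatedMetricSketch() {
    for (int i = 0; i < LEVEL; i++) {
      priorityQueues.add(new HashSet<>());
    }
  }

  void addCorruptReplOneBlock(String block) {
    priorityQueues.get(QUEUE_WITH_CORRUPT_BLOCKS).add(block);
    corruptReplOneBlocks++;
  }

  boolean remove(String block, int priLevel) {
    if (priLevel >= 0 && priLevel < LEVEL
        && priorityQueues.get(priLevel).remove(block)) {
      return true;
    }
    // Fallback: the block is removed from whichever queue holds it.
    for (int i = 0; i < LEVEL; i++) {
      if (i != priLevel && priorityQueues.get(i).remove(block)) {
        return true;
      }
    }
    return false;
  }

  // Mirrors the caller: decrements only when the hint was the corrupt queue.
  void removeWithMetric(String block, int priLevel, int oldExpectedReplicas) {
    boolean removed = remove(block, priLevel);
    if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS
        && oldExpectedReplicas == 1 && removed) {
      corruptReplOneBlocks--;
    }
  }

  public static void main(String[] args) {
    UnderReplicatedMetricSketch q = new UnderReplicatedMetricSketch();
    q.addCorruptReplOneBlock("blk_1");
    // Wrong priority hint (LEVEL): block removed, but the metric stays at 1.
    q.removeWithMetric("blk_1", LEVEL, 1);
    System.out.println("corruptReplOneBlocks = " + q.corruptReplOneBlocks);
  }
}
{code}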



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580304#comment-16580304
 ] 

Hudson commented on HDFS-13758:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14768 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14768/])
HDFS-13758. DatanodeManager should throw exception if it has (weichiu: rev 
61a9b4f58b639e71c564d84b529ac66aaae7f8ef)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java


> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion assumption that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
> BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
> assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of HDFS-10240 
> data corruption: a recoverLease() immediately followed by a close(), before 
> DataNodes have a chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE
> # on a production cluster, the assert is ignored, and you'll get a more 
> noticeable NPE. Future HDFS developers might fix this NPE, causing a 
> regression. An NPE is typically not caught and handled, so there's a chance 
> of ending up with internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.
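
As a hedged sketch of point 1 above, with simplified stand-in types rather
than the committed HDFS-13758 change: the bare assertion could be replaced by
an explicit, descriptive exception when a recovery command targets a block
that is not under construction.

{code:java}
import java.io.IOException;

/** Illustrative stand-in types; not the actual HDFS classes or patch. */
public class RecoveryCommandCheckSketch {
  static final class BlockInfo {
    final long id;
    final Object underConstructionFeature; // null when the block is complete
    BlockInfo(long id, Object ucFeature) {
      this.id = id;
      this.underConstructionFeature = ucFeature;
    }
  }

  static void checkUnderConstruction(BlockInfo b) throws IOException {
    if (b.underConstructionFeature == null) {
      // Instead of 'assert uc != null' (ignored in production) and a later
      // NPE, fail loudly with an actionable message.
      throw new IOException("Block recovery requested for block " + b.id
          + " but the block is not under construction");
    }
  }

  public static void main(String[] args) {
    try {
      checkUnderConstruction(new BlockInfo(1226490256L, null));
    } catch (IOException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
{code}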



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-08-14 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580300#comment-16580300
 ] 

Xiao Chen commented on HDFS-13747:
--

Thanks Todd for creating the issue, Antal for working on it, and Gabor for the 
review!

Fix LGTM, and the failed test looks unrelated. +1 pending the nit Gabor 
mentioned.

> Statistic for list_located_status is incremented incorrectly by 
> listStatusIterator
> --
>
> Key: HDFS-13747
> URL: https://issues.apache.org/jira/browse/HDFS-13747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.3
>Reporter: Todd Lipcon
>Assignee: Antal Mihalyi
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-13747.001.patch
>
>
> The DirListingIterator constructor calls 
> storageStatistics.incrementOpCounter(OpType.LIST_LOCATED_STATUS) 
> unconditionally even if 'needLocation' is false. It seems that if 
> needLocation is false, it should increment the LIST_STATUS counter instead.
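
A minimal stand-in sketch of the suggested behavior (illustrative enum and
method, not the actual DirListingIterator code): choose the counter based on
needLocation instead of incrementing LIST_LOCATED_STATUS unconditionally.

{code:java}
public class ListingCounterSketch {
  enum OpType { LIST_STATUS, LIST_LOCATED_STATUS }

  static OpType counterFor(boolean needLocation) {
    // Count a located listing only when locations were actually requested.
    return needLocation ? OpType.LIST_LOCATED_STATUS : OpType.LIST_STATUS;
  }

  public static void main(String[] args) {
    System.out.println(counterFor(false)); // LIST_STATUS
    System.out.println(counterFor(true));  // LIST_LOCATED_STATUS
  }
}
{code}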



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580296#comment-16580296
 ] 

genericqa commented on HDFS-13051:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 99m 
25s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13051 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12910788/HDFS-13112.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3c56ff56b560 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4cba074 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24777/testReport/ |
| Max. process+thread count | 3051 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24777/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
> --
>
>  

[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13758:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~candychencan]!

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion assumption that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
> BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
> assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of HDFS-10240 
> data corruption: a recoverLease() immediately followed by a close(), before 
> DataNodes have a chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE
> # on a production cluster, the assert is ignored, and you'll get a more 
> noticeable NPE. Future HDFS developers might fix this NPE, causing a 
> regression. An NPE is typically not caught and handled, so there's a chance 
> of ending up with internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-14 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580279#comment-16580279
 ] 

Xiao Chen edited comment on HDFS-13788 at 8/14/18 6:59 PM:
---

Committed to trunk and branch-3.[0-1].

 

Thanks Kitti for working on this, and Zsolt for reviewing!


was (Author: xiaochen):
Committed to trunk and branch-3.[0-1]

> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-13788.001.patch, HDFS-13788.002.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> The theoretical minimum is 3 racks, and ideally 9 or more, so the document 
> should be updated.
> (I didn't check timestamps, but this is probably because 
> {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when 
> HDFS-9088 introduced this doc. Later there are also examples in 
> {{TestErasureCodingMultipleRacks}} that test this explicitly.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-14 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13788:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.2
   3.0.4
   3.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-3.[0-1]

> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-13788.001.patch, HDFS-13788.002.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> The theoretical minimum is 3 racks, and ideally 9 or more, so the document 
> should be updated.
> (I didn't check timestamps, but this is probably because 
> {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when 
> HDFS-9088 introduced this doc. Later there are also examples in 
> {{TestErasureCodingMultipleRacks}} that test this explicitly.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580281#comment-16580281
 ] 

Wei-Chiu Chuang commented on HDFS-13758:


For branch-2 and branch-2.9 the patch is the same as for trunk because of HDFS-9371.

I've pushed up the change all the way from trunk to branch-2.9

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion assumption that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
> BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
> assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of HDFS-10240 
> data corruption: a recoverLease() immediately followed by a close(), before 
> DataNodes have a chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE
> # on a production cluster, the assert is ignored, and you'll get a more 
> noticeable NPE. Future HDFS developers might fix this NPE, causing a 
> regression. An NPE is typically not caught and handled, so there's a chance 
> of ending up with internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13758:
---
Fix Version/s: 2.9.2
   2.10.0

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion assumption that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
> BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
> assert uc != null;
> ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of HDFS-10240 
> data corruption: a recoverLease() immediately followed by a close(), before 
> DataNodes have a chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE
> # on a production cluster, the assert is ignored, and you'll get a more 
> noticeable NPE. Future HDFS developers might fix this NPE, causing a 
> regression. An NPE is typically not caught and handled, so there's a chance 
> of ending up with internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-324) Use pipeline name as Ratis groupID to allow datanode to report pipeline info

2018-08-14 Thread Mukul Kumar Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580273#comment-16580273
 ] 

Mukul Kumar Singh commented on HDDS-324:


Thanks [~elek], apologies for the inconvenience caused by this. It might have 
happened because of a typo or something like that.

> Use pipeline name as Ratis groupID to allow datanode to report pipeline info
> 
>
> Key: HDDS-324
> URL: https://issues.apache.org/jira/browse/HDDS-324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-324.001.patch, HDDS-324.002.patch, 
> HDDS-324.003.patch, HDDS-324.004.patch, HDDS-324.005.patch, 
> HDDS-324.006.patch, HDDS-324.007.patch, HDDS-324.008.patch, 
> HDDS-324.009-addendum.patch, HDDS-324.009.patch
>
>
> Currently Ozone creates a random pipeline id for every pipeline, where a 
> pipeline consists of 3 nodes in a Ratis ring. Ratis, on the other hand, uses 
> the notion of RaftGroupID, which is a unique id for the nodes in a Ratis 
> ring. When a datanode sends information to SCM, the pipeline for the node is 
> currently identified using dn2PipelineMap. With correct use of RaftGroupID, 
> we can eliminate the use of dn2PipelineMap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-324) Use pipeline name as Ratis groupID to allow datanode to report pipeline info

2018-08-14 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580274#comment-16580274
 ] 

Elek, Marton commented on HDDS-324:
---

No problem, it is hard to catch as it is not a compilation error. I just 
committed the fix to trunk.

> Use pipeline name as Ratis groupID to allow datanode to report pipeline info
> 
>
> Key: HDDS-324
> URL: https://issues.apache.org/jira/browse/HDDS-324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-324.001.patch, HDDS-324.002.patch, 
> HDDS-324.003.patch, HDDS-324.004.patch, HDDS-324.005.patch, 
> HDDS-324.006.patch, HDDS-324.007.patch, HDDS-324.008.patch, 
> HDDS-324.009-addendum.patch, HDDS-324.009.patch
>
>
> Currently Ozone creates a random pipeline id for every pipeline, where a 
> pipeline consists of 3 nodes in a Ratis ring. Ratis, on the other hand, uses 
> the notion of RaftGroupID, which is a unique id for the nodes in a Ratis 
> ring. When a datanode sends information to SCM, the pipeline for the node is 
> currently identified using dn2PipelineMap. With correct use of RaftGroupID, 
> we can eliminate the use of dn2PipelineMap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-14 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580276#comment-16580276
 ] 

Xiao Chen commented on HDFS-13788:
--

+1

> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13788.001.patch, HDFS-13788.002.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> The theoretical minimum is 3 racks, and ideally 9 or more, so the document 
> should be updated.
> (I didn't check timestamps, but this is probably because 
> {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when 
> HDFS-9088 introduced this doc. There are also later examples in 
> {{TestErasureCodingMultipleRacks}} that test this explicitly.)
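
As a back-of-the-envelope check of the 3-rack minimum for RS(6,3), assuming a 
stripe's 9 blocks are spread as evenly as possible across racks:

\[
\left\lceil \tfrac{6+3}{3} \right\rceil = 3 \le 3 \text{ parity blocks} \;\Rightarrow\; \text{any one of 3 racks can be lost},
\qquad
\left\lceil \tfrac{6+3}{2} \right\rceil = 5 > 3 \;\Rightarrow\; \text{2 racks are not enough}.
\]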



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13758) DatanodeManager should throw exception if it has BlockRecoveryCommand but the block is not under construction

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13758:
---
Fix Version/s: 3.1.2
   3.0.4
   3.2.0

> DatanodeManager should throw exception if it has BlockRecoveryCommand but the 
> block is not under construction
> -
>
> Key: HDFS-13758
> URL: https://issues.apache.org/jira/browse/HDFS-13758
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: chencan
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: HDFS-10240 scenarios.jpg, HDFS-13758.001.patch, 
> HDFS-13758.branch-2.patch
>
>
> In Hadoop 3, HDFS-8909 added an assertion that assumes that if a 
> BlockRecoveryCommand exists for a block, the block is under construction.
>  
> {code:title=DatanodeManager#getBlockRecoveryCommand()}
>   BlockRecoveryCommand brCommand = new BlockRecoveryCommand(blocks.length);
>   for (BlockInfo b : blocks) {
>     BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
>     assert uc != null;
>     ...
> {code}
> This assertion accidentally fixed one of the possible scenarios of HDFS-10240 
> data corruption: a recoverLease() made immediately followed by a close(), 
> before DataNodes have the chance to heartbeat.
> In a unit test you'll get:
> {noformat}
> 2018-07-19 09:43:41,331 [IPC Server handler 9 on 57890] WARN  ipc.Server 
> (Server.java:logException(2724)) - IPC Server handler 9 on 57890, call 
> Call#41 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 127.0.0.1:57903
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getBlockRecoveryCommand(DatanodeManager.java:1551)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.handleHeartbeat(DatanodeManager.java:1661)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat(FSNamesystem.java:3865)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.sendHeartbeat(NameNodeRpcServer.java:1504)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:119)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:31660)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1689)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}
> I propose to change this assertion even though it addresses the data 
> corruption, because:
> # We should throw a more meaningful exception than an NPE.
> # On a production cluster the assert is ignored, and you'll get an NPE 
> instead. Future HDFS developers might fix this NPE, causing a regression. An 
> NPE is typically not caught and handled, so there's a chance of ending up 
> with internal state inconsistency.
> # It doesn't address all possible scenarios of HDFS-10240. A proper fix 
> should reject close() if the block is being recovered.
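
A minimal sketch of what such a check could look like (illustrative only; the 
exception type and message are assumptions, not taken from the attached 
patches):

{code:java}
// Inside DatanodeManager#getBlockRecoveryCommand(), instead of the bare assert:
BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
if (uc == null) {
  // Fail with a descriptive exception rather than an AssertionError (tests)
  // or a NullPointerException further down (production, asserts disabled).
  throw new IOException("Cannot build a block recovery command for " + b
      + ": the block has no under-construction feature; it may have been"
      + " committed or closed while recovery was pending.");
}
{code}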



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-333) Create an Ozone Logo

2018-08-14 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580257#comment-16580257
 ] 

Anu Engineer commented on HDDS-333:
---

I would like to close the poll by the end of this week, that is, Friday, Aug 
17th, 5:00 PM PST. Please make sure you vote before then. Thanks.

> Create an Ozone Logo
> 
>
> Key: HDDS-333
> URL: https://issues.apache.org/jira/browse/HDDS-333
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Anu Engineer
>Assignee: Priyanka Nagwekar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: Logo Final.zip, Logo-Ozone-Transparent-Bg.png, 
> Ozone-Logo-Options.png
>
>
> As part of developing the Ozone website and documentation, it would be nice 
> to have an Ozone logo.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580231#comment-16580231
 ] 

Wei-Chiu Chuang commented on HDFS-13051:


CDH5 doesn't have the async edit log enabled, so I wasn't aware of this issue. 
But with the upcoming CDH6 we'll have it enabled by default. I'll take a stab 
at reviewing this fix :)

> dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
> --
>
> Key: HDFS-13051
> URL: https://issues.apache.org/jira/browse/HDFS-13051
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.5
>Reporter: zhangwei
>Assignee: Daryn Sharp
>Priority: Major
>  Labels: AsyncEditlog, deadlock
> Attachments: HDFS-13112.patch, deadlock.patch
>
>
> When doing rolleditlog it acquires the fs write lock, then acquires the 
> FSEditLogAsync lock object, and writes 3 edits (the second one overrides the 
> logEdit method and returns true).
> In an extreme case, when FSEditLogAsync's logSync is very slow and 
> editPendingQ (default size 4096) is full, the IPC thread cannot offer the 
> edit object into editPendingQ while doing rolleditlog; it blocks on the 
> editPendingQ.put method. However, it doesn't release the FSEditLogAsync 
> object lock, so the edit.logEdit method in the FSEditLogAsync.run thread can 
> never acquire the FSEditLogAsync object lock, which causes a deadlock.
> The stack trace is like below:
> "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 
> tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395)
>  - locked <0x7fbd3cbae500> (a java.lang.Object)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)
>  at java.lang.Thread.run(Thread.java:745)
> "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 
> nid=0x13fbf waiting for monitor entry [0x7fb32c51a000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443)
>  - waiting to lock <*0x7fbcbc131000*> (a 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177)
>  at java.lang.Thread.run(Thread.java:745)
> "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 
> tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs

[jira] [Commented] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580207#comment-16580207
 ] 

genericqa commented on HDDS-298:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
23s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-298 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935583/HDDS-298.06.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 960c6f3f93c6 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4cba074 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDDS-Build/764/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: hadoop-hdds/server-scm U: hadoop-hdds/server-scm |
| Console output | 
https://builds.apache.org/job/PreCommit-HDDS-Build/764/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-

[jira] [Commented] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580174#comment-16580174
 ] 

Xiaoyu Yao commented on HDDS-298:
-

Thanks [~ajayydv] for the update. Patch v6 LGTM, +1 pending Jenkins.

> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data 
> for closed containers. For closed containers we maintain the datanodes for a 
> containerId in ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download data from.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580147#comment-16580147
 ] 

Allen Wittenauer commented on HDFS-13822:
-

One other thing: I keep meaning to optimize the OpenSSL handling code to be in 
one place instead of like 3-4 now (common, hdfs-native, pipes, one more I 
think?)

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch, HDFS-13822.01.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build fails 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580136#comment-16580136
 ] 

Allen Wittenauer commented on HDFS-13822:
-

-01:
* is what I'm currently using (minus some changes for yarn and mr). It has 
fixes for OS X, openssl, and some other stuff.

It doesn't use the hadoop-maven-plugin code for ctest because the 
hadoop-maven-plugin TestMojo code is not really built for large numbers of 
tests. It basically requires listing every single test binary in its own 
execution snippet in the pom, IIRC.

hadoop-maven-plugin should probably have a new mojo added that specifically 
calls ctest in a directory.  (It should also probably be fixed to call Windows 
cmake compatibly, especially now that cmake 3.1+ works in a sane way on 
Windows.)

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch, HDFS-13822.01.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build fails 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-13822:

Status: Patch Available  (was: Open)

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch, HDFS-13822.01.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build fails 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-13822:

Attachment: HDFS-13822.01.patch

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch, HDFS-13822.01.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build fails 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-347) Fix : testCloseContainerViaStandaAlone fails sometimes

2018-08-14 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580110#comment-16580110
 ] 

Xiaoyu Yao commented on HDDS-347:
-

Thanks [~GeLiXin] for the details. If we consider the order of the state change 
and the log output, should we use GenericTestUtils.waitFor to wait until 
logCapturer.getOutput() contains the expected message first, and then validate 
the isContainerClosed state? This way, the wait behavior will be deterministic 
with minimal unnecessary sleep.
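
Something along these lines (a sketch only, assuming the test keeps its 
existing logCapturer and isContainerClosed(...) helper; variable names and 
timeouts are illustrative):

{code:java}
// Wait, with polling, until the expected log line shows up...
GenericTestUtils.waitFor(
    () -> logCapturer.getOutput().contains(expectedCloseLogMessage),
    500,      // poll every 500 ms
    60000);   // time out after 60 s
// ...and only then check the state; by now the container must be closed.
Assert.assertTrue(isContainerClosed(cluster, containerID.getContainerID()));
{code}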

> Fix : testCloseContainerViaStandaAlone fails sometimes
> --
>
> Key: HDDS-347
> URL: https://issues.apache.org/jira/browse/HDDS-347
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: LiXin Ge
>Assignee: LiXin Ge
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-347.000.patch
>
>
> This issue was found in the automatic Jenkins unit test of HDDS-265.
>  The container life cycle state is: Open -> Closing -> Closed. This test 
> submits the container close command and waits for the container state to 
> change to *not equal to open*. However, even when that state condition (not 
> equal to open) is satisfied, the container may still be in the process of 
> closing, so the LOG message that is printed after the container is closed 
> sometimes can't be found and the test fails.
> {code:java|title=KeyValueContainer.java|borderStyle=solid}
> try {
>   writeLock();
>   containerData.closeContainer();
>   File containerFile = getContainerFile();
>   // update the new container data to .container File
>   updateContainerFile(containerFile);
> } catch (StorageContainerException ex) {
> {code}
> Looking at the code above, the container state changes from CLOSING to CLOSED 
> in the first step, and the remaining *updateContainerFile* may take hundreds 
> of milliseconds, so even if we modify the test logic to wait for the *CLOSED* 
> state, the test is still not guaranteed to succeed.
>  There are two ways to fix this:
>  1. Remove one of the double checks, namely the one which depends on the LOG.
>  2. If we have to preserve the double check, we should wait for the *CLOSED* 
> state and then sleep for a while to wait for the LOG to appear.
>  Patch 000 is based on the second way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13825) HDFS Uses very outdated okhttp library

2018-08-14 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580101#comment-16580101
 ] 

Wei-Chiu Chuang commented on HDFS-13825:


[~rchiang] updated okhttp to 2.7.5 in HADOOP-14651.

We actually use com.squareup.okhttp3 (version 3.7.0) but in test scope. okhttp 
is used in a few source code files (mostly for oauth2). Any volunteers here to 
migrate them? If we can make sure okhttp objects are not passed in public API 
parameters, we should be okay.
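
For anyone picking this up, the change is mostly mechanical; a rough sketch of 
the okhttp 2.x to okhttp3 migration (illustrative only, not taken from the HDFS 
sources):

{code:java}
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class OkHttp3MigrationSketch {
  // okhttp 2.x used "new OkHttpClient()" plus setters such as
  // client.setConnectTimeout(30, TimeUnit.SECONDS); in okhttp3 the
  // configuration moves to OkHttpClient.Builder.
  static String fetch(String url) throws IOException {
    OkHttpClient client = new OkHttpClient.Builder()
        .connectTimeout(30, TimeUnit.SECONDS)
        .build();
    Request request = new Request.Builder().url(url).build();
    try (Response response = client.newCall(request).execute()) {
      return response.body().string();
    }
  }
}
{code}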

> HDFS Uses very outdated okhttp library
> --
>
> Key: HDFS-13825
> URL: https://issues.apache.org/jira/browse/HDFS-13825
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.3
>Reporter: Ben Parker
>Priority: Minor
>
> HDFS Client uses the okHttp library, version 2.7.4, which is two years out of date.
> [https://mvnrepository.com/artifact/com.squareup.okhttp/okhttp]
> The updates for this library have been moved to a new package here:
> [https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp]
>  
> This causes dependency management problems for services that use HDFS.
> For example, trying to use okHttp in code that runs on Amazon EMR gives you 
> "method not found" errors because the new version is kicked out in favour of 
> the one used by HDFS.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580084#comment-16580084
 ] 

Ajay Kumar commented on HDDS-298:
-

[~xyao] thanks for the review.
{code}
ContainerMapping.java
Line 79: NIT: CLOSE->CLOSED, Close-pipeline- => Closed-pipeline-{code}
With HDDS-324 we have replaced the string-based pipeline name with a UUID-based 
pipelineId, so this name field has been removed.
 {code}
Line 206-208: NIT: unrelated change {code}
removed

{code}
Line 214 getContainerReplica() returns a immutable set, we can avoid allocate a 
new ArrayList for the datanodes.{code}
done!
{code}
Line 215-217: Can we fold this with a new API 
containerStateManager#getContainerReplica() to avoid expose the complete 
containerStateMap here?
{code}
Done. We already have an API; I think I missed it initially.

 {code}
Line 221: can we define a more specific error code here?
{code}
Added NO_REPLICA_FOUND in ResultCodes but removed these lines as 
ContainerStateMap#getContainerReplicas already throws an SCM exception if no 
replicas are found. Updated test case to check for this exception.

{code}
Line 224: should we use a different replication type here for closed 
containers?{code}
I was thinking a closed container might already have the STANDALONE type. 
Changed it to STANDALONE explicitly.

> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data 
> for closed containers. For closed containers we maintain the datanodes for a 
> containerId in ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download data from.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12862) CacheDirective may invalidata,when NN restart or make a transition to Active.

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580079#comment-16580079
 ] 

genericqa commented on HDFS-12862:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 40s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}175m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-12862 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935552/HDFS-12862-trunk.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 15dcf100971f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d1830d8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24775/artifact/out/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24775/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24775/testReport/ |
| Max. process+thread count | 3023 (vs. ulimit of 1) |

[jira] [Updated] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-298:

Attachment: (was: HDDS-298.06.patch)

> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data 
> for closed containers. For closed containers we maintain the datanodes for a 
> containerId in ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download data from.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-298:

Attachment: HDDS-298.06.patch

> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data 
> for closed containers. For closed containers we maintain the datanodes for a 
> containerId in ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download data from.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-298) Implement SCMClientProtocolServer.getContainerWithPipeline for closed containers

2018-08-14 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-298:

Attachment: HDDS-298.06.patch

> Implement SCMClientProtocolServer.getContainerWithPipeline for closed 
> containers
> 
>
> Key: HDDS-298
> URL: https://issues.apache.org/jira/browse/HDDS-298
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Elek, Marton
>Assignee: Ajay Kumar
>Priority: Critical
> Fix For: 0.2.1
>
> Attachments: HDDS-298.00.patch, HDDS-298.01.patch, HDDS-298.02.patch, 
> HDDS-298.03.patch, HDDS-298.04.patch, HDDS-298.05.patch, HDDS-298.06.patch
>
>
> As [~ljain] mentioned during the review of HDDS-245, 
> SCMClientProtocolServer.getContainerWithPipeline doesn't return good data 
> for closed containers. For closed containers we maintain the datanodes for a 
> containerId in ContainerStateMap.contReplicaMap. We need to create a fake 
> Pipeline object on request and return it so that the client can locate the 
> right datanodes to download data from.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2018-08-14 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580032#comment-16580032
 ] 

Arpit Agarwal commented on HDFS-13671:
--

Reverting seems to be the right answer. It will be non-trivial so we'll need a 
brave volunteer to work through the conflicts.


> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> Actually the first step should be the more expensive operation and should 
> take more time. However, we now always see the NN hang during the 
> remove-block operation.
> Looking into this, we introduced a new structure {{FoldedTreeSet}} to get 
> better performance in handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower 
> since it takes additional time to balance tree nodes. When there are many 
> blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2018-08-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580028#comment-16580028
 ] 

Daryn Sharp commented on HDFS-13671:


I think we all know the time complexity to update a large balanced binary tree 
cannot possibly compete with updating a small linked list (triplets) of n-many 
elements (replication factor).  It seems like we are in agreement to revert?  
I'd do it but I've wrecked my local repo too many times to trust myself. :)
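
Roughly, the comparison being made (a back-of-the-envelope sketch, with b the 
number of block replicas on a storage and r the replication factor):

\[
\text{FoldedTreeSet remove: } O(\log b) \text{ per replica (plus rebalancing)}
\qquad \text{vs.} \qquad
\text{triplet-list unlink: } O(1) \text{ per replica, i.e. } O(r) \text{ per block}.
\]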

 

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> Actually the first step should be the more expensive operation and should 
> take more time. However, we now always see the NN hang during the 
> remove-block operation.
> Looking into this, we introduced a new structure {{FoldedTreeSet}} to get 
> better performance in handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower 
> since it takes additional time to balance tree nodes. When there are many 
> blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13746) Still occasional "Should be different group" failure in TestRefreshUserMappings#testGroupMappingRefresh

2018-08-14 Thread Siyao Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580016#comment-16580016
 ] 

Siyao Meng commented on HDFS-13746:
---

+1 jenkins. Unrelated flaky tests.

> Still occasional "Should be different group" failure in 
> TestRefreshUserMappings#testGroupMappingRefresh
> ---
>
> Key: HDFS-13746
> URL: https://issues.apache.org/jira/browse/HDFS-13746
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-13746.001.patch, HDFS-13746.002.patch, 
> HDFS-13746.003.patch, HDFS-13746.004.patch, HDFS-13746.005.patch
>
>
> In https://issues.apache.org/jira/browse/HDFS-13723, increasing the amount of 
> time in sleep() helps but the problem still appears, which is annoying.
>  
> Solution:
> Use a loop to allow the test case to fail maxTrials times before declaring 
> failure. Wait 50 ms between each retry.
>  
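
The retry idea in code, roughly (a sketch only; verifyGroupMappingsChanged() is 
a hypothetical placeholder for the existing assertions in 
testGroupMappingRefresh):

{code:java}
// Allow the flaky check to fail up to maxTrials times, sleeping 50 ms between
// attempts, before declaring the test failed.
int maxTrials = 5;  // hypothetical value
for (int i = 1; i <= maxTrials; i++) {
  try {
    verifyGroupMappingsChanged();  // placeholder for the existing asserts
    break;                         // check passed, stop retrying
  } catch (AssertionError e) {
    if (i == maxTrials) {
      throw e;                     // still failing after maxTrials attempts
    }
    Thread.sleep(50);
  }
}
{code}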



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full

2018-08-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580011#comment-16580011
 ] 

Daryn Sharp commented on HDFS-13051:


No brave reviewers?

> dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
> --
>
> Key: HDFS-13051
> URL: https://issues.apache.org/jira/browse/HDFS-13051
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.5
>Reporter: zhangwei
>Assignee: Daryn Sharp
>Priority: Major
>  Labels: AsyncEditlog, deadlock
> Attachments: HDFS-13112.patch, deadlock.patch
>
>
> When doing rolleditlog it acquires the fs write lock, then acquires the 
> FSEditLogAsync lock object, and writes 3 edits (the second one overrides the 
> logEdit method and returns true).
> In an extreme case, when FSEditLogAsync's logSync is very slow and 
> editPendingQ (default size 4096) is full, the IPC thread cannot offer the 
> edit object into editPendingQ while doing rolleditlog; it blocks on the 
> editPendingQ.put method. However, it doesn't release the FSEditLogAsync 
> object lock, so the edit.logEdit method in the FSEditLogAsync.run thread can 
> never acquire the FSEditLogAsync object lock, which causes a deadlock.
> The stack trace is like below:
> "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 
> tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395)
>  - locked <0x7fbd3cbae500> (a java.lang.Object)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)
>  at java.lang.Thread.run(Thread.java:745)
> "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 
> nid=0x13fbf waiting for monitor entry [0x7fb32c51a000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443)
>  - waiting to lock <*0x7fbcbc131000*> (a 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177)
>  at java.lang.Thread.run(Thread.java:745)
> "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 
> tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251)
>  - locked <*0x7fbcbc131000*> (a 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync)
>  

[jira] [Commented] (HDDS-324) Use pipeline name as Ratis groupID to allow datanode to report pipeline info

2018-08-14 Thread Xiaoyu Yao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580006#comment-16580006
 ] 

Xiaoyu Yao commented on HDDS-324:
-

Good catch, [~elek]. My bad, I noticed that when committing the patch but 
misread it as removing the extra "q" from hadoop-functions.sh.

> Use pipeline name as Ratis groupID to allow datanode to report pipeline info
> 
>
> Key: HDDS-324
> URL: https://issues.apache.org/jira/browse/HDDS-324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-324.001.patch, HDDS-324.002.patch, 
> HDDS-324.003.patch, HDDS-324.004.patch, HDDS-324.005.patch, 
> HDDS-324.006.patch, HDDS-324.007.patch, HDDS-324.008.patch, 
> HDDS-324.009-addendum.patch, HDDS-324.009.patch
>
>
> Currently Ozone creates a random pipeline id for every pipeline, where a 
> pipeline consists of 3 nodes in a Ratis ring. Ratis, on the other hand, uses the 
> notion of RaftGroupID, which is a unique id for the nodes in a Ratis ring. 
> When a datanode sends information to SCM, the pipeline for the node is 
> currently identified using dn2PipelineMap. With correct use of RaftGroupID, 
> we can eliminate the use of dn2PipelineMap.
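
For illustration only, a minimal sketch of the idea in the description above, assuming the pipeline name is a UUID string; the class and method names here are hypothetical and not taken from the actual patch:
{code:java}
import java.util.UUID;

import org.apache.ratis.protocol.RaftGroupId;

/**
 * Hypothetical helper: if the pipeline name is a UUID string, both SCM and
 * the datanode can recompute the same RaftGroupId from it, so no separate
 * dn2PipelineMap lookup is needed to match a Ratis ring to a pipeline.
 */
public final class PipelineGroupIds {
  private PipelineGroupIds() {
  }

  public static RaftGroupId toRaftGroupId(String pipelineName) {
    return RaftGroupId.valueOf(UUID.fromString(pipelineName));
  }
}
{code}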



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579996#comment-16579996
 ] 

genericqa commented on HDFS-13747:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
34s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 33s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}210m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13747 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935533/HDFS-13747.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b490fba3682e 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d1830d8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreComm

[jira] [Comment Edited] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Pradeep Ambati (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579958#comment-16579958
 ] 

Pradeep Ambati edited comment on HDFS-13822 at 8/14/18 3:58 PM:


When I said "ctest for unit tests didn't work", I meant that I couldn't 
trigger or run ctest for the unit tests in the first place.

[~aw] Can you help me with configuring cmake-compile goal definitions in the 
respective pom to run ctest for unit tests?


was (Author: pradeepambati):
When I said "ctest for unit tests didn't work", I meant that I couldn't 
trigger or run ctest for the unit tests in the first place.

 

[~aw] Can you help me with configuring cmake-compile goal definitions in the 
respective pom to run ctest for unit tests?

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build failed 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-342) Add example byteman script to print out hadoop rpc traffic

2018-08-14 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579975#comment-16579975
 ] 

Anu Engineer commented on HDDS-342:
---

+1 on the GitHub URL. Just a thought: can we use "file://..." as the URL? Does 
it not work?

> Add example byteman script to print out hadoop rpc traffic
> --
>
> Key: HDDS-342
> URL: https://issues.apache.org/jira/browse/HDDS-342
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Minor
> Fix For: 0.2.1
>
> Attachments: HDDS-342.001.patch, byteman.png, byteman2.png
>
>
> HADOOP-15656 adds byteman support to the hadoop-runner base image. Byteman is 
> a simple tool to define Java instrumentation. For example, it's very easy to 
> print out the incoming and outgoing Hadoop RPC messages or fsimage edits.
> In this patch I add one more line to the standard docker-compose cluster to 
> demonstrate this capability (printing out RPC calls). By default it's turned off.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13825) HDFS Uses very outdated okhttp library

2018-08-14 Thread Ben Parker (JIRA)
Ben Parker created HDFS-13825:
-

 Summary: HDFS Uses very outdated okhttp library
 Key: HDFS-13825
 URL: https://issues.apache.org/jira/browse/HDFS-13825
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.3
Reporter: Ben Parker


The HDFS client uses okHttp library version 2.7.4, which is two years out of date.

[https://mvnrepository.com/artifact/com.squareup.okhttp/okhttp]

The updates for this library have been moved to a new package here:

[https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp]

 

This causes dependency management problems for services that use HDFS.

For example, trying to use okHttp in code that runs on Amazon EMR gives you 
"Method not found" errors because the newer version is evicted in favour of 
the one used by HDFS.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-324) Use pipeline name as Ratis groupID to allow datanode to report pipeline info

2018-08-14 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579957#comment-16579957
 ] 

Elek, Marton commented on HDDS-324:
---

A stray "q" letter was added to hadoop-functions.sh (which makes the ozone 
classpath wrong). I will revert it with the attached HDDS-324.009-addendum.patch.


> Use pipeline name as Ratis groupID to allow datanode to report pipeline info
> 
>
> Key: HDDS-324
> URL: https://issues.apache.org/jira/browse/HDDS-324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-324.001.patch, HDDS-324.002.patch, 
> HDDS-324.003.patch, HDDS-324.004.patch, HDDS-324.005.patch, 
> HDDS-324.006.patch, HDDS-324.007.patch, HDDS-324.008.patch, 
> HDDS-324.009-addendum.patch, HDDS-324.009.patch
>
>
> Currently Ozone creates a random pipeline id for every pipeline, where a 
> pipeline consists of 3 nodes in a Ratis ring. Ratis, on the other hand, uses the 
> notion of RaftGroupID, which is a unique id for the nodes in a Ratis ring. 
> When a datanode sends information to SCM, the pipeline for the node is 
> currently identified using dn2PipelineMap. With correct use of RaftGroupID, 
> we can eliminate the use of dn2PipelineMap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache

2018-08-14 Thread Fei Hui (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HDFS-13821:
---
Attachment: LocalCacheTest.java

> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---
>
> Key: HDFS-13821
> URL: https://issues.apache.org/jira/browse/HDFS-13821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0, 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13821.001.patch, LocalCacheTest.java, 
> image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
> I found that ProxyAvgTime from Ganglia was very high, so I ran jstack on the Router and 
> got the following stack frames
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads are blocked on *LocalCache*.
> After disabling the cache, ProxyAvgTime goes down, as shown below:
>  !image-2018-08-13-11-27-49-023.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Pradeep Ambati (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579958#comment-16579958
 ] 

Pradeep Ambati commented on HDFS-13822:
---

When I said "ctest for unit tests didn't work", I meant that I couldn't 
trigger or run ctest for the unit tests in the first place.

 

[~aw] Can you help me with configuring cmake-compile goal definitions in the 
respective pom to run ctest for unit tests?

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build failed 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-324) Use pipeline name as Ratis groupID to allow datanode to report pipeline info

2018-08-14 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-324:
--
Attachment: HDDS-324.009-addendum.patch

> Use pipeline name as Ratis groupID to allow datanode to report pipeline info
> 
>
> Key: HDDS-324
> URL: https://issues.apache.org/jira/browse/HDDS-324
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.2.1
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-324.001.patch, HDDS-324.002.patch, 
> HDDS-324.003.patch, HDDS-324.004.patch, HDDS-324.005.patch, 
> HDDS-324.006.patch, HDDS-324.007.patch, HDDS-324.008.patch, 
> HDDS-324.009-addendum.patch, HDDS-324.009.patch
>
>
> Currently Ozone creates a random pipeline id for every pipeline, where a 
> pipeline consists of 3 nodes in a Ratis ring. Ratis, on the other hand, uses the 
> notion of RaftGroupID, which is a unique id for the nodes in a Ratis ring. 
> When a datanode sends information to SCM, the pipeline for the node is 
> currently identified using dn2PipelineMap. With correct use of RaftGroupID, 
> we can eliminate the use of dn2PipelineMap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13821) RBF: Add dfs.federation.router.mount-table.cache.enable so that users can disable cache

2018-08-14 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579955#comment-16579955
 ] 

Fei Hui commented on HDFS-13821:


[~elgoiri] Thanks for your reply.
{quote}
The hit ratio of the cache over time
{quote}
The hit ratio is nearly zero in my test case
{quote}
The stats on the read/write lock
{quote}
Maybe it's unrelated to the read/write lock, because the code for my approach is inside the 
read-lock block, as follows:
{code:java}
  @Override
  public PathLocation getDestinationForPath(final String path)
  throws IOException {
verifyMountTable();
readLock.lock();
try {
  if (!mountTableCacheEnable) {
return lookupLocation(path);
  }
  Callable<PathLocation> meh = new Callable<PathLocation>() {
@Override
public PathLocation call() throws Exception {
  return lookupLocation(path);
}
  };
  return this.locationCache.get(path, meh);
} catch (ExecutionException e) {
  throw new IOException(e);
} finally {
  readLock.unlock();
}
  }
{code}
{quote}
The time for a hit or a miss
{quote}
I did not measure it, but I am guessing LocalCache is the bottleneck in my test 
case, so I ran a test.
The test code is uploaded as *LocalCacheTest.java*. I ran 'hadoop jar 
test-1.0-SNAPSHOT.jar LocalCacheTest 1 1024 1000', which means the cache size is 
1, the number of threads is 1024, and each thread does 1000 get ops. The result: 
24.555 ms per op per thread.
There is a lock in LocalCache. I think that in my test case, letting 1024 threads 
compute concurrently is better than one thread holding the write lock while the 
other threads are blocked.
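
For reference, a minimal sketch of such a contention test against a Guava cache; this is not the attached LocalCacheTest.java, and the key pattern, timing, and names are illustrative only, assuming a cache of size 1 so that nearly every get misses:
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class CacheContentionSketch {
  public static void main(String[] args) throws Exception {
    final int cacheSize = 1;
    final int threads = 1024;
    final int opsPerThread = 1000;
    // With maximumSize(1) and varying keys, nearly every get() misses and
    // goes through lockedGetOrLoad, the same segment lock seen in the jstack.
    final Cache<String, String> cache =
        CacheBuilder.newBuilder().maximumSize(cacheSize).build();
    final CountDownLatch done = new CountDownLatch(threads);
    final long start = System.nanoTime();
    for (int t = 0; t < threads; t++) {
      new Thread(() -> {
        try {
          for (int i = 0; i < opsPerThread; i++) {
            final String path = "/mount/" + (i % 128);
            cache.get(path, new Callable<String>() {
              @Override
              public String call() {
                // Stands in for lookupLocation(path) in MountTableResolver.
                return "location-for-" + path;
              }
            });
          }
        } catch (Exception e) {
          e.printStackTrace();
        } finally {
          done.countDown();
        }
      }).start();
    }
    done.await();
    final double elapsedMs = (System.nanoTime() - start) / 1_000_000.0;
    System.out.println("avg ms per op per thread: " + elapsedMs / opsPerThread);
  }
}
{code}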


> RBF: Add dfs.federation.router.mount-table.cache.enable so that users can 
> disable cache
> ---
>
> Key: HDFS-13821
> URL: https://issues.apache.org/jira/browse/HDFS-13821
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0, 2.9.1, 3.0.3
>Reporter: Fei Hui
>Priority: Major
> Attachments: HDFS-13821.001.patch, image-2018-08-13-11-27-49-023.png
>
>
> When I tested RBF, I found a performance problem.
> I found that ProxyAvgTime from Ganglia was very high, so I ran jstack on the Router and 
> got the following stack frames
> {quote}
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x0005c264acd8> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2249)
>     at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>     at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>     at 
> org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:380)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2104)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:2087)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getListing(RouterRpcServer.java:1050)
>     at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:640)
>     at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2115)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2111)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
> {quote}
> Many threads are blocked on *LocalCache*.
> After disabling the cache,

[jira] [Commented] (HDFS-13772) Erasure coding: Unnecessary NameNode Logs displaying for Enabling/Disabling Erasure coding policies which are already enabled/disabled

2018-08-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579931#comment-16579931
 ] 

Daryn Sharp commented on HDFS-13772:


Isn't modifying the method signature a binary-incompatible change? Suppressing 
redundant log lines and edits seems reasonable, though. Propagating the result back 
to the client is probably not worth it if it's incompatible.
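
For illustration, a minimal fragment of what the "suppress redundant log lines and edits" part could look like; the names below are assumptions rather than the actual ErasureCodingPolicyManager code, and the boolean return value is exactly the kind of signature change questioned above:
{code:java}
// Illustrative fragment; field and method names are assumptions.
synchronized boolean enablePolicy(String name) {
  ErasureCodingPolicyInfo info = policiesByName.get(name);  // hypothetical lookup
  if (info == null) {
    throw new HadoopIllegalArgumentException(
        "The policy name " + name + " does not exist");
  }
  if (info.isEnabled()) {
    // Already enabled: skip the INFO log line and the edit-log entry.
    return false;
  }
  info.setState(ErasureCodingPolicyState.ENABLED);
  LOG.info("Enable the erasure coding policy {}", name);
  return true;
}
{code}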

> Erasure coding: Unnecessary NameNode Logs displaying for Enabling/Disabling 
> Erasure coding policies which are already enabled/disabled
> --
>
> Key: HDFS-13772
> URL: https://issues.apache.org/jira/browse/HDFS-13772
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
> Environment: 3 Node SuSE Linux cluster 
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Trivial
> Attachments: EC_capture1.PNG, HDFS-13772-01.patch
>
>
> Unnecessary NameNode Logs displaying for Enabling/Disabling Erasure coding 
> policies which are already enabled/disabled
> - Enable any Erasure coding policy like "RS-LEGACY-6-3-1024k"
> - Check that the console displays "Erasure coding policy RS-LEGACY-6-3-1024k 
> is enabled"
> - Try to enable the same policy again, multiple times, with "hdfs ec -enablePolicy 
> -policy RS-LEGACY-6-3-1024k";
>  instead of throwing an error message such as "policy already enabled", it 
> displays the same message, "Erasure coding policy RS-LEGACY-6-3-1024k is 
> enabled"
> - Also, in the NameNode log, the policy-enabled message is displayed multiple times 
> unnecessarily even though the policy is already enabled,
>  like this: 2018-07-27 18:50:35,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager: Disable 
> the erasure coding policy RS-10-4-1024k
> 2018-07-27 18:50:35,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager: Disable 
> the erasure coding policy RS-10-4-1024k
> 2018-07-27 18:50:35,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager: Disable 
> the erasure coding policy RS-10-4-1024k
> 2018-07-27 18:50:35,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager: Enable the 
> erasure coding policy RS-LEGACY-6-3-1024k
> 2018-07-27 18:50:35,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager: Enable the 
> erasure coding policy RS-LEGACY-6-3-1024k
> 2018-07-27 18:50:35,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.ErasureCodingPolicyManager: Enable the 
> erasure coding policy RS-LEGACY-6-3-1024k
> - While executing the erasure coding policy disable command, the same type of 
> log lines also appear multiple times even though the policy is already 
>  disabled. It should throw an error message such as "policy is already disabled" for an 
> already disabled policy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12862) CacheDirective may invalidata,when NN restart or make a transition to Active.

2018-08-14 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579903#comment-16579903
 ] 

Daryn Sharp commented on HDFS-12862:


Rather than rebuild the object, why not make the serialization routines 
symmetrical? Since {{readCacheDirectiveInfo}} interprets the value as absolute, 
shouldn't {{writeCacheDirectiveInfo}} (which is just a few lines above it) do 
the same? For example:
{code:java}
@@ -538,7 +538,7 @@ public static void writeCacheDirectiveInfo(DataOutputStream 
out,
       writeString(directive.getPool(), out);
     }
     if (directive.getExpiration() != null) {
-      writeLong(directive.getExpiration().getMillis(), out);
+      writeLong(directive.getExpiration().getAbsoluteMillis(), out);
     }
   }{code}

> CacheDirective may invalidata,when NN restart or make a transition to Active.
> -
>
> Key: HDFS-12862
> URL: https://issues.apache.org/jira/browse/HDFS-12862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs
>Affects Versions: 2.7.1
> Environment: 
>Reporter: Wang XL
>Priority: Major
>  Labels: patch
> Attachments: HDFS-12862-branch-2.7.1.001.patch, 
> HDFS-12862-trunk.002.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When modifying a 
> CacheDirective, the expiration in the directive may be a relative expiry time, and 
> the EditLog will serialize a relative expiry time.
> {code:java}
> // Some comments here
> static void modifyCacheDirective(
>   FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo 
> directive,
>   EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
> final FSPermissionChecker pc = getFsPermissionChecker(fsn);
> cacheManager.modifyDirective(directive, pc, flags);
> fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
>   }
> {code}
> But when the SBN replays the log, it will invoke 
> FSImageSerialization#readCacheDirectiveInfo, which reads it as an absolute expiry time. This 
> results in an inconsistency.
> {code:java}
>   public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>   throws IOException {
> CacheDirectiveInfo.Builder builder =
> new CacheDirectiveInfo.Builder();
> builder.setId(readLong(in));
> int flags = in.readInt();
> if ((flags & 0x1) != 0) {
>   builder.setPath(new Path(readString(in)));
> }
> if ((flags & 0x2) != 0) {
>   builder.setReplication(readShort(in));
> }
> if ((flags & 0x4) != 0) {
>   builder.setPool(readString(in));
> }
> if ((flags & 0x8) != 0) {
>   builder.setExpiration(
>   CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
> }
> if ((flags & ~0xF) != 0) {
>   throw new IOException("unknown flags set in " +
>   "ModifyCacheDirectiveInfoOp: " + flags);
> }
> return builder.build();
>   }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, 
> logRetryCache) may serialize a relative expiry time, but 
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiry time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-14 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579892#comment-16579892
 ] 

Allen Wittenauer commented on HDFS-13822:
-

As reported by the qbt nightly runs, be aware that ctests for libhdfspp have 
been broken since ~ June 29th.  Likely caused by one of:

{code}
[Jun 28, 2018 5:37:22 AM] (aajisaka) HADOOP-15495. Upgrade commons-lang version 
to 3.7 in
[Jun 28, 2018 5:58:40 AM] (aajisaka) HADOOP-14313. Replace/improve Hadoop's 
byte[] comparator. Contributed by
[Jun 28, 2018 6:39:33 AM] (aengineer) HDDS-195. Create generic CommandWatcher 
utility. Contributed by Elek,
[Jun 28, 2018 4:21:56 PM] (Bharat) HDFS-13705:The native ISA-L library loading 
failure should be made
[Jun 28, 2018 4:39:49 PM] (eyang) YARN-8409.  Fixed NPE in 
ActiveStandbyElectorBasedElectorService.   
[Jun 28, 2018 5:23:31 PM] (sunilg) YARN-8379. Improve balancing resources in 
already satisfied queues by
[Jun 28, 2018 10:41:39 PM] (nanda) HDDS-185: 
TestCloseContainerByPipeline#testCloseContainerViaRatis fail
[Jun 28, 2018 11:07:16 PM] (nanda) HDDS-178: DN should update transactionId on 
block delete. Contributed by
{code}

So be sure your failures are actually related to the patch.

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Priority: Minor
> Attachments: HDFS-13382.000.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried to 
> force a parallel build by specifying -Dnative_make_args=-j4, the build failed 
> due to dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12862) CacheDirective may invalidata,when NN restart or make a transition to Active.

2018-08-14 Thread Wang XL (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang XL updated HDFS-12862:
---
Attachment: HDFS-12862-trunk.002.patch

> CacheDirective may invalidata,when NN restart or make a transition to Active.
> -
>
> Key: HDFS-12862
> URL: https://issues.apache.org/jira/browse/HDFS-12862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs
>Affects Versions: 2.7.1
> Environment: 
>Reporter: Wang XL
>Priority: Major
>  Labels: patch
> Attachments: HDFS-12862-branch-2.7.1.001.patch, 
> HDFS-12862-trunk.002.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When modifying a 
> CacheDirective, the expiration in the directive may be a relative expiry time, and 
> the EditLog will serialize a relative expiry time.
> {code:java}
> // Some comments here
> static void modifyCacheDirective(
>   FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo 
> directive,
>   EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
> final FSPermissionChecker pc = getFsPermissionChecker(fsn);
> cacheManager.modifyDirective(directive, pc, flags);
> fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
>   }
> {code}
> But when the SBN replays the log, it will invoke 
> FSImageSerialization#readCacheDirectiveInfo, which reads it as an absolute expiry time. This 
> results in an inconsistency.
> {code:java}
>   public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>   throws IOException {
> CacheDirectiveInfo.Builder builder =
> new CacheDirectiveInfo.Builder();
> builder.setId(readLong(in));
> int flags = in.readInt();
> if ((flags & 0x1) != 0) {
>   builder.setPath(new Path(readString(in)));
> }
> if ((flags & 0x2) != 0) {
>   builder.setReplication(readShort(in));
> }
> if ((flags & 0x4) != 0) {
>   builder.setPool(readString(in));
> }
> if ((flags & 0x8) != 0) {
>   builder.setExpiration(
>   CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
> }
> if ((flags & ~0xF) != 0) {
>   throw new IOException("unknown flags set in " +
>   "ModifyCacheDirectiveInfoOp: " + flags);
> }
> return builder.build();
>   }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, 
> logRetryCache) may serialize a relative expiry time, but 
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiry time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-265) Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to KeyValueContainerData

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579805#comment-16579805
 ] 

genericqa commented on HDDS-265:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
56s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m  7s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}137m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline
 |
|   | hadoop.ozone.container.ozoneimpl.TestOzoneContainer |
|   | hadoop.ozone.freon.TestDataValidate |
|   | hadoop.ozone.web.client.TestBuckets |
|   | hadoop.ozone.web.client.TestKeys |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDDS-265 |
| JIRA Patch URL | 
https://issues.apache.org/j

[jira] [Commented] (HDFS-13772) Erasure coding: Unnecessary NameNode Logs displaying for Enabling/Disabling Erasure coding policies which are already enabled/disabled

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579802#comment-16579802
 ] 

genericqa commented on HDFS-13772:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 15m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
37s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 35s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 17m 
35s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}224m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.cli.TestErasureCodingCLI |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDF

[jira] [Commented] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579790#comment-16579790
 ] 

genericqa commented on HDFS-13788:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
40m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 55m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13788 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935531/HDFS-13788.002.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 8cba93d3b207 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d1830d8 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 312 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24773/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13788.001.patch, HDFS-13788.002.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> Theoretical minimum is 3 racks, and ideally 9 or more, so the document should 
> be updated.
> (I didn't check timestamps, but this is probably because 
> {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when 
> HDFS-9088 introduced this doc. Later there are also examples in 
> {{TestErasureCodingMultipleRacks}} that test this explicitly.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13805) Journal Nodes should allow to format non-empty directories with "-force" option

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579775#comment-16579775
 ] 

genericqa commented on HDFS-13805:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 10 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
47s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 27m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 36s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
37s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}217m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HDFS-13805 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12935510/HDFS-13805.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 478d08319222 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d1830d8 |
| maven | version:

[jira] [Comment Edited] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-08-14 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579748#comment-16579748
 ] 

Gabor Bota edited comment on HDFS-13747 at 8/14/18 12:46 PM:
-

Thanks for the patch [~amihalyi]!
The patch looks good - just a little nitpick: you've removed some tabs from 
TestDistributedFileSystem:726, which was not necessary; otherwise +1.


was (Author: gabor.bota):
Thanks for the patch [~amihalyi]!
The patch looks good - just a little nitpick: you've removed some tabs from 
line 726, which was not necessary; otherwise +1.

> Statistic for list_located_status is incremented incorrectly by 
> listStatusIterator
> --
>
> Key: HDFS-13747
> URL: https://issues.apache.org/jira/browse/HDFS-13747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.3
>Reporter: Todd Lipcon
>Assignee: Antal Mihalyi
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-13747.001.patch
>
>
> The DirListingIterator constructor calls 
> storageStatistics.incrementOpCounter(OpType.LIST_LOCATED_STATUS) 
> unconditionally even if 'needLocation' is false. It seems that if 
> needLocation is false, it should increment the LIST_STATUS counter instead.
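
A minimal fragment illustrating the fix implied by the description above; the surrounding DirListingIterator code is not reproduced here and the exact call site may differ:
{code:java}
// Count the operation that actually matches the request type instead of
// always counting LIST_LOCATED_STATUS.
if (needLocation) {
  storageStatistics.incrementOpCounter(OpType.LIST_LOCATED_STATUS);
} else {
  storageStatistics.incrementOpCounter(OpType.LIST_STATUS);
}
{code}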



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-342) Add example byteman script to print out hadoop rpc traffic

2018-08-14 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579752#comment-16579752
 ] 

Elek, Marton commented on HDDS-342:
---

Thanks for the feedback, [~anu]. It was just an example script, but I like your idea 
to collect useful byteman scripts in the hadoop repository.

Using a file from the repository could be more complex than using a URL: it 
requires not just adding one additional environment variable; we would also need 
to modify the mounts for all the containers (mount the folder with the scripts) 
AND change the environment variables.

I would add the example script to the repository AND use a URL to 
https://github.com/apache/hadoop/trunk/... In that case the URL could be fixed 
easily if it changes (the script is part of the repo), but the usage would still be 
just a one-liner... What do you think?

> Add example byteman script to print out hadoop rpc traffic
> --
>
> Key: HDDS-342
> URL: https://issues.apache.org/jira/browse/HDDS-342
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Minor
> Fix For: 0.2.1
>
> Attachments: HDDS-342.001.patch, byteman.png, byteman2.png
>
>
> HADOOP-15656 adds byteman support to the hadoop-runner base image. byteman is 
> a simple tool to define java instrumentation. For example it's very easy to 
> print out the incoming and outgoing hadoop rcp messages or fsimage edits.
> In this patch I add one more line to the standard docker-compose cluster to 
> demonstrate this capability (print out rpc calls). By default it's turned off.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-08-14 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579748#comment-16579748
 ] 

Gabor Bota commented on HDFS-13747:
---

Thanks for the patch [~amihalyi]!
The patch looks good - just a little nitpick: you've removed some tabs from 
line 726 which was not necessary, otherwise +1.

> Statistic for list_located_status is incremented incorrectly by 
> listStatusIterator
> --
>
> Key: HDFS-13747
> URL: https://issues.apache.org/jira/browse/HDFS-13747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.3
>Reporter: Todd Lipcon
>Assignee: Antal Mihalyi
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-13747.001.patch
>
>
> The DirListingIterator constructor calls 
> storageStatistics.incrementOpCounter(OpType.LIST_LOCATED_STATUS) 
> unconditionally even if 'needLocation' is false. It seems that if 
> needLocation is false, it should increment the LIST_STATUS counter instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-08-14 Thread Antal Mihalyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Mihalyi updated HDFS-13747:
-
Attachment: (was: HDFS-13747.001.patch)

> Statistic for list_located_status is incremented incorrectly by 
> listStatusIterator
> --
>
> Key: HDFS-13747
> URL: https://issues.apache.org/jira/browse/HDFS-13747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.3
>Reporter: Todd Lipcon
>Assignee: Antal Mihalyi
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-13747.001.patch
>
>
> The DirListingIterator constructor calls 
> storageStatistics.incrementOpCounter(OpType.LIST_LOCATED_STATUS) 
> unconditionally even if 'needLocation' is false. It seems that if 
> needLocation is false, it should increment the LIST_STATUS counter instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-325) Add event watcher for delete blocks command

2018-08-14 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579741#comment-16579741
 ] 

Elek, Marton commented on HDDS-325:
---

If I understood correctly (please correct me if I am wrong), this class does the 
same: just send anything to the inputDestination and it will be forwarded to the 
realDestination with retries.

{code}
public class EventSupervisor<PAYLOAD extends IdentifiableEventPayload,
    COMPLETION extends IdentifiableEventPayload>
    implements EventHandler<PAYLOAD> {

  private final Event<PAYLOAD> realDestination;
  private final Event<PAYLOAD> inputDestination;
  private final EventWatcher<PAYLOAD, COMPLETION> watcher;

  public EventSupervisor(
      Event<PAYLOAD> inputDestination,
      Event<PAYLOAD> realDestination,
      Event<COMPLETION> completionEvent,
      LeaseManager leaseManager) {
    this.realDestination = realDestination;
    this.inputDestination = inputDestination;
    this.watcher =
        new EventWatcher<PAYLOAD, COMPLETION>(inputDestination,
            completionEvent, leaseManager) {
          @Override
          protected void onTimeout(EventPublisher publisher,
              PAYLOAD payload) {
            // Not completed in time: fire the real event again (retry).
            publisher.fireEvent(realDestination, payload);
          }

          @Override
          protected void onFinished(EventPublisher publisher,
              PAYLOAD payload) {
            // Completion event arrived, nothing more to do.
          }
        };
  }

  public void start(EventQueue queue) {
    queue.addHandler(inputDestination, this);
    watcher.start(queue);
  }

  @Override
  public void onMessage(PAYLOAD payload, EventPublisher publisher) {
    // First delivery: forward to the real destination immediately.
    publisher.fireEvent(realDestination, payload);
  }
}
{code}

> Add event watcher for delete blocks command
> ---
>
> Key: HDDS-325
> URL: https://issues.apache.org/jira/browse/HDDS-325
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-325.001.patch, HDDS-325.002.patch, 
> HDDS-325.003.patch
>
>
> This Jira aims to add watcher for deleteBlocks command. It removes the 
> current rpc call required for datanode to send the acknowledgement for 
> deleteBlocks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-08-14 Thread Antal Mihalyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Mihalyi updated HDFS-13747:
-
Attachment: HDFS-13747.001.patch
Status: Patch Available  (was: Open)

> Statistic for list_located_status is incremented incorrectly by 
> listStatusIterator
> --
>
> Key: HDFS-13747
> URL: https://issues.apache.org/jira/browse/HDFS-13747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.3
>Reporter: Todd Lipcon
>Assignee: Antal Mihalyi
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-13747.001.patch, HDFS-13747.001.patch
>
>
> The DirListingIterator constructor calls 
> storageStatistics.incrementOpCounter(OpType.LIST_LOCATED_STATUS) 
> unconditionally even if 'needLocation' is false. It seems that if 
> needLocation is false, it should increment the LIST_STATUS counter instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-14 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579720#comment-16579720
 ] 

Kitti Nanasi commented on HDFS-13788:
-

Thanks for the comment, [~xiaochen]! I fixed the description in patch v002.

> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13788.001.patch, HDFS-13788.002.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> Theoretical minimum is 3 racks, and ideally 9 or more, so the document should 
> be updated.
> (I didn't check timestamps, but this is probably due to 
> {{BlockPlacementPolicyRackFaultTolerant}} isn't completely done when 
> HDFS-9088 introduced this doc. Later there's also examples in 
> {{TestErasureCodingMultipleRacks}} to test this explicitly.)
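
As a back-of-the-envelope check of the 3-rack minimum mentioned above (a sketch 
assuming the placement policy spreads the blocks of a stripe as evenly as possible 
across racks):

{code}
// Losing one rack must cost at most parityBlocks blocks of a stripe, so no rack
// may hold more than parityBlocks of it; the minimum rack count is therefore
// ceil(totalBlocks / parityBlocks). Assumes even spreading across racks.
public class EcRackMathSketch {
  static int minRacksForSingleRackFailure(int dataBlocks, int parityBlocks) {
    int totalBlocks = dataBlocks + parityBlocks;
    return (totalBlocks + parityBlocks - 1) / parityBlocks;
  }

  public static void main(String[] args) {
    // RS(6,3): 9 blocks, at most 3 per rack => at least 3 racks.
    System.out.println(minRacksForSingleRackFailure(6, 3));
  }
}
{code}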



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-14 Thread Kitti Nanasi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kitti Nanasi updated HDFS-13788:

Attachment: HDFS-13788.002.patch

> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13788.001.patch, HDFS-13788.002.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> Theoretical minimum is 3 racks, and ideally 9 or more, so the document should 
> be updated.
> (I didn't check timestamps, but this is probably due to 
> {{BlockPlacementPolicyRackFaultTolerant}} isn't completely done when 
> HDFS-9088 introduced this doc. Later there's also examples in 
> {{TestErasureCodingMultipleRacks}} to test this explicitly.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-08-14 Thread Antal Mihalyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Mihalyi updated HDFS-13747:
-
Attachment: HDFS-13747.001.patch

> Statistic for list_located_status is incremented incorrectly by 
> listStatusIterator
> --
>
> Key: HDFS-13747
> URL: https://issues.apache.org/jira/browse/HDFS-13747
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.3
>Reporter: Todd Lipcon
>Assignee: Antal Mihalyi
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-13747.001.patch
>
>
> The DirListingIterator constructor calls 
> storageStatistics.incrementOpCounter(OpType.LIST_LOCATED_STATUS) 
> unconditionally even if 'needLocation' is false. It seems that if 
> needLocation is false, it should increment the LIST_STATUS counter instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13810) RBF: Adding the mount entry without having the destination path, its getting added into the mount table by taking the other parameters order as destination path.

2018-08-14 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579668#comment-16579668
 ] 

Yiqun Lin commented on HDFS-13810:
--

This issue is similar to HDFS-13815. I would prefer to use 
{{org.apache.commons.cli.CommandLineParser}} to do the option parsing here.
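
A rough sketch of how commons-cli could separate the flags from the positional 
arguments, so a missing destination shows up as too few remaining arguments (the 
class and option names are illustrative, not the actual dfsrouteradmin code):

{code}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

// Illustrative only; not the actual RouterAdmin parsing code.
public class MountAddParserSketch {
  public static void main(String[] args) throws ParseException {
    Options options = new Options();
    options.addOption("order", true, "Order of the destinations");
    options.addOption("readonly", false, "Mount point is read only");

    CommandLineParser parser = new DefaultParser();
    CommandLine cli = parser.parse(options, args);

    // Options are consumed above; source, nameservice and destination remain
    // as positional arguments, so a missing destination is easy to detect.
    String[] remaining = cli.getArgs();
    if (remaining.length < 3) {
      System.err.println(
          "Usage: -add <source> <nameservice> <destination> [-order <order>]");
      return;
    }
    System.out.println("source=" + remaining[0] + ", ns=" + remaining[1]
        + ", dest=" + remaining[2]
        + ", order=" + cli.getOptionValue("order", "HASH"));
  }
}
{code}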

>  RBF: Adding the mount entry without having the destination path, its getting 
>  added into the mount table by taking the other parameters order as 
> destination path.
> ---
>
> Key: HDFS-13810
> URL: https://issues.apache.org/jira/browse/HDFS-13810
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0
>Reporter: venkata ram kumar ch
>Assignee: venkata ram kumar ch
>Priority: Minor
>
> In Router-based federation, when we try to add a mount entry without the 
> destination path, it gets added into the mount table with the next parameter 
> (here the -order flag) taken as the destination path.
> Command: hdfs dfsrouteradmin -add /aaa ns1  -order RANDOM 
> It creates a mount entry taking -order as the target path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-265) Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to KeyValueContainerData

2018-08-14 Thread LiXin Ge (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579660#comment-16579660
 ] 

LiXin Ge commented on HDDS-265:
---

[~ljain] Thanks for your further comments.

bq. 1. We can use default keyword to provide a default implementation.
Done in the v003 patch.

bq. 2. RandomContainerDeletionChoosingPolicy:59,60 - The change should be reverted
Done in the v003 patch.

bq. 3. There is a compilation failure after applying the patch
It's caused by HDDS-308, which was merged recently. Rebased in the v003 patch.

bq. 4. I think we can have a getContainerReport api in Container.java?
Agreed, it should be a common API for all kinds of containers! Done in the 
v003 patch.
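
For reference, a minimal illustration of the default-method idea from point 1 above 
(the interface and class names are simplified stand-ins, not the actual HDDS 
container classes):

{code}
// Simplified stand-ins for the container types; only the "default" mechanism is
// the point here.
interface Container {
  String getName();

  // Default implementation: container types that don't need a specialised
  // report inherit this without overriding it.
  default String getContainerReport() {
    return "report for " + getName();
  }
}

class KeyValueContainer implements Container {
  @Override
  public String getName() {
    return "keyValueContainer-1";
  }

  // Overrides the default to add type-specific detail.
  @Override
  public String getContainerReport() {
    return Container.super.getContainerReport() + " (incl. pending-deletion blocks)";
  }
}

public class DefaultMethodSketch {
  public static void main(String[] args) {
    Container container = new KeyValueContainer();
    System.out.println(container.getContainerReport());
  }
}
{code}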


> Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to 
> KeyValueContainerData
> -
>
> Key: HDDS-265
> URL: https://issues.apache.org/jira/browse/HDDS-265
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.2.1
>Reporter: Hanisha Koneru
>Assignee: LiXin Ge
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-265.000.patch, HDDS-265.001.patch, 
> HDDS-265.002.patch, HDDS-265.003.patch
>
>
> "numPendingDeletionBlocks" and "deleteTransactionId" fields are specific to 
> KeyValueContainers. As such they should be moved to KeyValueContainerData 
> from ContainerData.
> ContainerReport should also be refactored to take in this change. 
> Please refer to [~ljain]'s comment in HDDS-250.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-265) Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to KeyValueContainerData

2018-08-14 Thread LiXin Ge (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiXin Ge updated HDDS-265:
--
Attachment: HDDS-265.003.patch

> Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to 
> KeyValueContainerData
> -
>
> Key: HDDS-265
> URL: https://issues.apache.org/jira/browse/HDDS-265
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.2.1
>Reporter: Hanisha Koneru
>Assignee: LiXin Ge
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-265.000.patch, HDDS-265.001.patch, 
> HDDS-265.002.patch, HDDS-265.003.patch
>
>
> "numPendingDeletionBlocks" and "deleteTransactionId" fields are specific to 
> KeyValueContainers. As such they should be moved to KeyValueContainerData 
> from ContainerData.
> ContainerReport should also be refactored to take in this change. 
> Please refer to [~ljain]'s comment in HDDS-250.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-265) Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to KeyValueContainerData

2018-08-14 Thread LiXin Ge (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiXin Ge updated HDDS-265:
--
Status: Patch Available  (was: Open)

> Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to 
> KeyValueContainerData
> -
>
> Key: HDDS-265
> URL: https://issues.apache.org/jira/browse/HDDS-265
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.2.1
>Reporter: Hanisha Koneru
>Assignee: LiXin Ge
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-265.000.patch, HDDS-265.001.patch, 
> HDDS-265.002.patch, HDDS-265.003.patch
>
>
> "numPendingDeletionBlocks" and "deleteTransactionId" fields are specific to 
> KeyValueContainers. As such they should be moved to KeyValueContainerData 
> from ContainerData.
> ContainerReport should also be refactored to take in this change. 
> Please refer to [~ljain]'s comment in HDDS-250.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-265) Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to KeyValueContainerData

2018-08-14 Thread LiXin Ge (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LiXin Ge updated HDDS-265:
--
Status: Open  (was: Patch Available)

> Move numPendingDeletionBlocks and deleteTransactionId from ContainerData to 
> KeyValueContainerData
> -
>
> Key: HDDS-265
> URL: https://issues.apache.org/jira/browse/HDDS-265
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.2.1
>Reporter: Hanisha Koneru
>Assignee: LiXin Ge
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-265.000.patch, HDDS-265.001.patch, 
> HDDS-265.002.patch
>
>
> "numPendingDeletionBlocks" and "deleteTransactionId" fields are specific to 
> KeyValueContainers. As such they should be moved to KeyValueContainerData 
> from ContainerData.
> ContainerReport should also be refactored to take in this change. 
> Please refer to [~ljain]'s comment in HDDS-250.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-247) Handle CLOSED_CONTAINER_IO exception in ozoneClient

2018-08-14 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579651#comment-16579651
 ] 

genericqa commented on HDDS-247:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
39s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
35s{color} | {color:green} client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 48s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}136m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion |
|   | hadoop.ozone.om.TestContainerReportWithKeys |
|   | hadoop.ozone.container.ozoneimpl.TestOzoneContainer |
|   | h

[jira] [Commented] (HDFS-13815) RBF: Add check to order command

2018-08-14 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579621#comment-16579621
 ] 

Yiqun Lin commented on HDFS-13815:
--

The invalid input option name should be checked and an error printed. Feel free to 
attach your patch :).

> RBF: Add check to order command
> ---
>
> Key: HDFS-13815
> URL: https://issues.apache.org/jira/browse/HDFS-13815
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.0.0
>Reporter: Soumyapn
>Assignee: Ranith Sardar
>Priority: Minor
>
> No check is being done on the order option.
> It says the mount table was updated successfully even if we don't specify the 
> order option correctly, but the mount table is not actually updated.
> Execute the dfsrouter update command with the below scenarios.
> 1. ./hdfs dfsrouteradmin -update /apps3 hacluster,ns2 /tmp6 RANDOM
> 2. ./hdfs dfsrouteradmin -update /apps3 hacluster,ns2 /tmp6 -or RANDOM
> 3. ./hdfs dfsrouteradmin -update /apps3 hacluster,ns2 /tmp6  -ord RANDOM
> 4. ./hdfs dfsrouteradmin -update /apps3 hacluster,ns2 /tmp6  -orde RANDOM
>  
> The console message says, Successfully updated mount point. But it is not 
> updated in the mount table.
>  
> Expected Result:
> Exception on the console, as the order option is missing/not written properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13031) To detect fsimage corruption on the spot

2018-08-14 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579615#comment-16579615
 ] 

Gabor Bota commented on HDFS-13031:
---

Thanks [~adam.antal] for working on this and for creating the new issue. 
This issue can be closed because we found that using an OIV improvement for 
this is a better solution than using a full-fledged NN loading the full fsimage.

> To detect fsimage corruption on the spot
> 
>
> Key: HDFS-13031
> URL: https://issues.apache.org/jira/browse/HDFS-13031
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
> Environment:  
>Reporter: Yongjun Zhang
>Assignee: Adam Antal
>Priority: Major
>
> Since we fixed HDFS-9406, there are new cases reported from the field where 
> similar fsimage corruption happens. We need a good fsimage + editlogs to replay 
> in order to reproduce the corruption. However, usually by the time the corruption 
> is detected (at a later NN restart), the good fsimage has already been deleted.
> We need to have a way to detect fsimage corruption on the spot. Currently 
> what I think we could do is:
>  # after the SNN creates a new fsimage, it spawns a new modified NN process (NN 
> with some new command line args) to just load the fsimage and do nothing 
> else. 
>  # If the process fails, the currently running SNN will either a) back up 
> the fsimage + editlogs or b) no longer do checkpointing. And it needs to 
> somehow raise a flag to the user that the fsimage is corrupt.
> In step 2, if we do a, we need to introduce a new NN->JN API to back up 
> editlogs; if we do b, it changes the SNN's behavior and is kind of incompatible. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13824) Number of Dead nodes is not showing in the Overview and Subclusters pages. However Live nodes are reflecting properly

2018-08-14 Thread Soumyapn (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyapn updated HDFS-13824:

Attachment: (was: image-2018-08-14-11-47-05-025.png)

> Number of Dead nodes is not showing in the Overview and Subclusters pages. 
> However Live nodes are reflecting properly
> 
>
> Key: HDFS-13824
> URL: https://issues.apache.org/jira/browse/HDFS-13824
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.1.0
>Reporter: Soumyapn
>Priority: Major
>  Labels: RBF
>
> Scenario:
> Suppose we have 2 nameservices with 3 Datanodes each. 
> If we bring 2 DNs down, then the Datanodes page, the Live nodes field in Overview, 
> and Live in the Subclusters page are correctly updated to 4.
> But the Dead nodes field in the Overview and Subclusters pages shows 0. It 
> is not updated.
> !image-2018-08-14-11-47-05-025.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13774) EC: "hdfs ec -getPolicy" is not retrieving policy details when the special REPLICATION policy set on the directory

2018-08-14 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579586#comment-16579586
 ] 

Ayush Saxena commented on HDFS-13774:
-

Thanks [~SouryakantaDwivedy] for filing the issue.

The reason for not returning the name of the policy is that "Replication" is not an 
EC policy. It just acts as a flag to make the directory use the default 
replication scheme instead of an EC policy inherited from its parent.

Even if you use the ec -listPolicies command, you won't see REPLICATION there.

> EC: "hdfs ec -getPolicy" is not retrieving policy details when the special 
> REPLICATION policy set on the directory
> --
>
> Key: HDFS-13774
> URL: https://issues.apache.org/jira/browse/HDFS-13774
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
> Environment: 3 Node Linux Cluster
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Minor
> Attachments: GetPolicy_EC.png
>
>
>  Erasure coding: "hdfs ec -getPolicy" is not retrieving policy details when 
> the special REPLICATION policy is set on the directory
> Steps :-
>  - Create a directory "testEC"
> - Get the EC policy for the directory [Received message as : "The erasure 
> coding policy of /testEC is unspecified" ]
> - Enable any Erasure coding policy like "XOR-2-1-1024k"
> - Set the EC Policy on the Directory
> - Get the EC policy for the directory [Received message as : "XOR-2-1-1024k" ]
> - Now again set the EC Policy on the directory as "replicate" special 
> REPLICATION policy
> - Get the EC policy for the directory [Received message as : "The erasure 
> coding policy of /testEC is unspecified" ]
>  The policy is being set for the directory, but while retrieving the policy 
> details it reports that the 
>  policy for the directory is unspecified, which is wrong behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set

2018-08-14 Thread Soumyapn (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579584#comment-16579584
 ] 

Soumyapn commented on HDFS-13732:
-

Hi Zsolt,

Whenever we set the default EC policy on a particular HDFS folder, the console 
message does not give the EC policy name.

Expected output:
The default EC policy name should be printed on the console.

> Erasure Coding policy name is not coming when the new policy is set
> ---
>
> Key: HDFS-13732
> URL: https://issues.apache.org/jira/browse/HDFS-13732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.0.0
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Trivial
> Attachments: EC_Policy.PNG
>
>
> Scenario:
> If a new policy, other than the default EC policy, is set for the HDFS 
> directory, then the console message comes as "Set default erasure coding 
> policy on "
> Expected output:
> It would be good if the EC policy name is displayed when the policy is set...
>  
> Actual output:
> Set default erasure coding policy on 
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


