[jira] [Commented] (HDFS-9501) TestBalancer#testBalancerWithPinnedBlocks fails in branch-2.7

2015-12-03 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037997#comment-15037997
 ] 

Xiaoyu Yao commented on HDFS-9501:
--

Thanks [~brahmareddy] for working on this. Patch LGTM and I've tested it 
locally with branch-2.7.
+1

> TestBalancer#testBalancerWithPinnedBlocks fails in branch-2.7
> -
>
> Key: HDFS-9501
> URL: https://issues.apache.org/jira/browse/HDFS-9501
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9501-branch-2.7.patch
>
>
>  As [~xyao] pointed out in HDFS-9083, this test is failing after HDFS-9083.
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; 
> support was removed in 8.0
> Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.888 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithPinnedBlocks(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>  Time elapsed: 12.748 sec <<< FAILURE!
> java.lang.AssertionError: expected:<-3> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithPinnedBlocks(TestBalancer.java:362)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone

2015-12-03 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8831:
-
Attachment: HDFS-8831.05.patch

Thanks [~arpitagarwal] for the review. Patch v05 addresses the latest review 
comments.

> Trash Support for deletion in HDFS encryption zone
> --
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, 
> HDFS-8831.01.patch, HDFS-8831.02.patch, HDFS-8831.03.patch, 
> HDFS-8831.04.patch, HDFS-8831.05.patch
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files within the zone with the trash feature enabled, you 
> will get an error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by placing the .Trash folder for 
> the file being deleted inside the same encryption zone. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone

2015-12-04 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8831:
-
Release Note: With HDFS-8831, trash is now supported for deletion of files 
within an encryption zone. Deleted files remain encrypted and are moved to a 
.Trash/$USER/current subdirectory under the root of the encryption zone, with 
checkpoint and expunge working the same way as the existing trash.

> Trash Support for deletion in HDFS encryption zone
> --
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Fix For: 2.8.0
>
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, 
> HDFS-8831.01.patch, HDFS-8831.02.patch, HDFS-8831.03.patch, 
> HDFS-8831.04.patch, HDFS-8831.05.patch
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files within the zone with the trash feature enabled, you 
> will get an error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> This JIRA proposes to support trash for deletion of files within an 
> encryption zone. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone

2015-12-02 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8831:
-
Attachment: HDFS-8831.03.patch

Updated patch v03 based on [~arpit99]'s feedback. Please review, thanks!

bq. DistributedFileSystem.java:2326: We can skip the call to dfs.getEZForPath 
if isHDFSEncryptionEnabled is false to avoid extra RPC call when TDE is not 
enabled.
Good point. Fixed. 
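For readers following along, a small, purely illustrative sketch of that guard is 
below; the class and method names are made up for this example and do not come 
from the patch:

{code}
// Illustration only: pay for the remote lookup (the getEZForPath RPC) only when
// the feature that needs it is enabled; otherwise use the cheap local default.
class TrashRootResolver {
  interface EzLookup { String ezRootFor(String path); }  // stand-in for the RPC

  private final boolean encryptionEnabled;               // cheap, locally known flag
  private final EzLookup lookup;

  TrashRootResolver(boolean encryptionEnabled, EzLookup lookup) {
    this.encryptionEnabled = encryptionEnabled;
    this.lookup = lookup;
  }

  String trashRootFor(String path, String user) {
    if (!encryptionEnabled) {
      return "/user/" + user + "/.Trash";                 // no RPC issued at all
    }
    String ezRoot = lookup.ezRootFor(path);               // remote call only when needed
    return (ezRoot != null) ? ezRoot + "/.Trash/" + user
                            : "/user/" + user + "/.Trash";
  }
}
{code}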

bq. FileSystem.java:2701: Can we define .Trash as a constant somewhere?

Added FileSystem#TRASH_PREFIX for ".Trash".
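For context, a runnable toy example of composing a trash root with such a constant; 
the constant is redeclared locally here and the paths are placeholders, so this is 
illustrative only:

{code}
import org.apache.hadoop.fs.Path;

public class TrashPrefixExample {
  // Redeclared here for the example; the real constant lives in FileSystem.
  public static final String TRASH_PREFIX = ".Trash";

  public static void main(String[] args) {
    Path ezRoot = new Path("/zones/zone1");   // placeholder encryption zone root
    // Per-user trash root under the encryption zone root:
    System.out.println(new Path(ezRoot, TRASH_PREFIX + "/alice/current"));
  }
}
{code}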

bq. Trash.java:98: Avoid extra RPC for log statement. Can we cache the 
currentTrashDir some time earlier?

Every path to be deleted may have a different currentTrashDir. Moved the INFO log 
to TrashPolicyDefault.java to avoid the extra RPC for logging.

bq. TrashPolicy.java:48: I don't think we should mark it as deprecated. While 
the TrashPolicyDefault no longer uses the home parameter other implementations 
may be passing a different value here in theory.
TrashPolicy.java:57: Also we should have a default implementation of this 
routine else it will be a backward incompatible change (will break existing 
implementations of this public interface).
TrashPolicy.java:83: Need default implementation. It can just throw 
UnsupportedOperationException which should be handled by the caller.
TrashPolicy.java:92: Need default implementation. It can just throw 
UnsupportedOperationException which should be handled by the caller.

Agree and fixed. 
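As a side note for anyone reading along, the compatibility pattern being agreed on 
here can be sketched as follows; the class and method names are invented for 
illustration and are not the actual TrashPolicy code:

{code}
// Illustration only: add a new method to a public extension point without breaking
// subclasses compiled against the old version, by giving it a throwing body
// instead of leaving it abstract. Callers handle UnsupportedOperationException.
public abstract class PolicyBase {
  // Pre-existing method: subclasses already implement this.
  public abstract void initialize(String conf);

  // Newly added method: a concrete default keeps old subclasses compatible.
  public String getTrashRoot(String path) {
    throw new UnsupportedOperationException(getClass() + " does not implement getTrashRoot");
  }
}
{code}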

bq. TrashPolicy.java:108: We should leave the old method in place to keep the 
public interface backwards compatible. Perhaps to be conservative we should 
respect the 'home' parameter if one is passed in instead of using 
Filesystem#getTrashRoot?

Agree and fixed. 
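Again purely as an illustration (invented names, not the real signatures): keeping 
the old overload and honoring an explicitly supplied home could look like this:

{code}
// Illustration only: the old entry point stays for compatibility, and an
// explicitly supplied 'home' still wins over the newly derived default.
public class PolicyBaseCompat {
  protected String trashHome;

  /** Old signature, kept so existing callers keep compiling; honors 'home' if given. */
  @Deprecated
  public void initialize(String conf, String home) {
    this.trashHome = (home != null) ? home : defaultTrashRoot();
  }

  /** New signature that derives the trash root itself. */
  public void initialize(String conf) {
    this.trashHome = defaultTrashRoot();
  }

  protected String defaultTrashRoot() {
    return "/user/someone/.Trash";   // placeholder
  }
}
{code}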


> Trash Support for deletion in HDFS encryption zone
> --
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, 
> HDFS-8831.01.patch, HDFS-8831.02.patch, HDFS-8831.03.patch
>
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is 
> deleted. If you delete files within the zone with the trash feature enabled, you 
> will get an error similar to the following 
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: 
> /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by placing the .Trash folder for 
> the file being deleted inside the same encryption zone. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9528) Cleanup namenode audit/log/exception messages

2015-12-10 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051723#comment-15051723
 ] 

Xiaoyu Yao commented on HDFS-9528:
--

+1 for h9528_20151210.patch

> Cleanup namenode audit/log/exception messages
> -
>
> Key: HDFS-9528
> URL: https://issues.apache.org/jira/browse/HDFS-9528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Minor
> Attachments: h9528_20151208.patch, h9528_20151210.patch
>
>
> - Cleanup unnecessary long methods for constructing message strings.
> - Avoid calling toString() methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8326) Documentation about when checkpoints are run is out of date

2015-12-11 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052978#comment-15052978
 ] 

Xiaoyu Yao commented on HDFS-8326:
--

Good catch, [~iwasakims]. I will cherry-pick the fix to branch-2. 

> Documentation about when checkpoints are run is out of date
> ---
>
> Key: HDFS-8326
> URL: https://issues.apache.org/jira/browse/HDFS-8326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.3.0
>Reporter: Misty Stanley-Jones
>Assignee: Misty Stanley-Jones
> Fix For: 2.8.0
>
> Attachments: HDFS-8326.001.patch, HDFS-8326.002.patch, 
> HDFS-8326.003.patch, HDFS-8326.004.patch, HDFS-8326.patch
>
>
> Apparently checkpointing by interval and by transaction size are both supported 
> in at least HDFS 2.3, but the documentation does not reflect this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8785) TestDistributedFileSystem is failing in trunk

2015-12-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056249#comment-15056249
 ] 

Xiaoyu Yao commented on HDFS-8785:
--

[~yzhangal], Thanks for committing this to branch-2/branch-2.8!

> TestDistributedFileSystem is failing in trunk
> -
>
> Key: HDFS-8785
> URL: https://issues.apache.org/jira/browse/HDFS-8785
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Xiaoyu Yao
> Fix For: 2.8.0
>
> Attachments: HDFS-8785.00.patch, HDFS-8785.01.patch, 
> HDFS-8785.02.patch
>
>
> A newly added test case 
> {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in 
> trunk.
> e.g. run
> https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1

2015-12-09 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049010#comment-15049010
 ] 

Xiaoyu Yao commented on HDFS-9530:
--

This looks like the symptom of HDFS-8072, where RBW reserved space is not 
released when the DataNode BlockReceiver encounters an IOException. The space 
won't be released until the DataNode restarts. 

The fix should be included in Hadoop 2.6.2 and 2.7.1. Can you post the output of 
the "hadoop version" command?


> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
> -
>
> Key: HDFS-9530
> URL: https://issues.apache.org/jira/browse/HDFS-9530
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fei Hui
>
> I ran a Hive job, and the errors are as follows:
> ===
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"k":"1","v":1}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"k":"1","v":1}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.17_3
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 3 datanode(s) running and no node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3245)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:663)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:787)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
> ... 9 more
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.17_3
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 3 datanode(s) running and no node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562)
> at 
> 

[jira] [Commented] (HDFS-9625) set replication for empty file failed when set storage policy

2016-01-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087946#comment-15087946
 ] 

Xiaoyu Yao commented on HDFS-9625:
--

[~Deng FEI], thanks for reporting the issue and attaching the fix. 
Can you rebase the patch against the latest trunk, as it currently won't apply?

> set replication for empty file  failed when set storage policy
> --
>
> Key: HDFS-9625
> URL: https://issues.apache.org/jira/browse/HDFS-9625
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: DENG FEI
>Assignee: DENG FEI
> Attachments: patch_HDFS-9625.20160107
>
>
>  When setting replication, FSDirectory#updateCount needs to recalculate the 
> quota for the related storage types, but it checks that the file's consumed 
> disk-space quota is positive.
>  In practice, replication may be set right after a file is created, as in 
> JobSplitWriter#createSplitFiles.
> It can also be reproduced from the command shell:
> 1.  hdfs storagepolicies -setStoragePolicy -path /tmp -policy HOT
> 2.  hdfs dfs -touchz /tmp/test
> 3.  hdfs dfs -setrep 5 /tmp/test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2015-12-21 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066913#comment-15066913
 ] 

Xiaoyu Yao commented on HDFS-9584:
--

Thanks [~surendrasingh] for working on this. Patch LGTM. 
+1 pending Jenkins.

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8855) Webhdfs client leaks active NameNode connections

2015-11-24 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8855:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks [~xiaobingo]  and [~cnauroth] for the contribution and all for the 
reviews. I've committed the patch to trunk and branch-2.

> Webhdfs client leaks active NameNode connections
> 
>
> Key: HDFS-8855
> URL: https://issues.apache.org/jira/browse/HDFS-8855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Bob Hansen
>Assignee: Xiaobing Zhou
> Fix For: 2.8.0
>
> Attachments: HDFS-8855.005.patch, HDFS-8855.006.patch, 
> HDFS-8855.007.patch, HDFS-8855.008.patch, HDFS-8855.009.patch, 
> HDFS-8855.1.patch, HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, 
> HDFS_8855.prototype.patch
>
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to reach ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8512) storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS

2015-11-24 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8512:
-
Attachment: HDFS-8512.01.patch

Thanks [~szetszwo] for the review. Rebased the patch onto trunk.

> storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
> --
>
> Key: HDFS-8512
> URL: https://issues.apache.org/jira/browse/HDFS-8512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Sumana Sathish
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch
>
>
> Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
> {code}
> $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GETFILESTATUS;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785853423=W4O5kKiYHmzzey4h7I9J9eL9EMY=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}}
>  $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GET_BLOCK_LOCATIONS=0=150318178;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:55 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"LocatedBlocks":{"fileLength":150318178,"isLastBlockComplete":true,"isUnderConstruction":false,"lastLocatedBlock":{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728},"locatedBlocks":[{"block":{"blockId":1073741846,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1022,"numBytes":134217728},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":0},{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728}]}}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9210) Fix some misuse of %n in VolumeScanner#printStats

2015-11-25 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027437#comment-15027437
 ] 

Xiaoyu Yao commented on HDFS-9210:
--

[~andrew.wang], [~templedf], could you help review patch v02, which fixes 
the issue [~templedf] pointed out?

> Fix some misuse of %n in VolumeScanner#printStats
> -
>
> Key: HDFS-9210
> URL: https://issues.apache.org/jira/browse/HDFS-9210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9210.00.patch, HDFS-9210.01.patch
>
>
> Found two extra "%n" occurrences in the VolumeScanner report, and some lines are 
> not well formatted, as shown below. This JIRA is opened to fix the format issue.
> {code}
> Block scanner information for volume DS-93fb2503-de00-4f98-a8bc-c2bc13b8f0f7 
> with base path /hadoop/hdfs/data%nBytes verified in last hour   : 
> 136882014
> Blocks scanned in current period  :   
>   5
> Blocks scanned since restart  :   
>   5
> Block pool scans since restart:   
>   0
> Block scan errors since restart   :   
>   0
> Hours until next block pool scan  :   
> 476.000
> Last block scanned: 
> BP-1792969149-192.168.70.101-1444150984999:blk_1073742088_1274
> More blocks to scan in period :   
>   false
> %n
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8485) Transparent Encryption Fails to work with Yarn/MapReduce

2015-11-25 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027466#comment-15027466
 ] 

Xiaoyu Yao commented on HDFS-8485:
--

[~PrasadAlle], Do you have *dfs.encryption.key.provider.uri* configured in your 
hdfs-site.xml? It should be something like:

{code}
On hdfs-site.xml:
Key:       dfs.encryption.key.provider.uri
Value:    kms://http@myhost.mydomain:16000/kms
{code}
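If it helps to confirm what the client side actually resolves, a tiny check along 
these lines can be run (what it prints depends on the configuration files on your 
classpath; this is not part of any patch, just a quick sanity check):

{code}
import org.apache.hadoop.conf.Configuration;

public class KmsUriCheck {
  public static void main(String[] args) {
    // Loads core-site.xml from the classpath; add hdfs-site.xml explicitly.
    Configuration conf = new Configuration();
    conf.addResource("hdfs-site.xml");
    System.out.println(conf.get("dfs.encryption.key.provider.uri",
        "<not set: clients will not be able to reach the KMS>"));
  }
}
{code}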

> Transparent Encryption Fails to work with Yarn/MapReduce
> 
>
> Key: HDFS-8485
> URL: https://issues.apache.org/jira/browse/HDFS-8485
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: RHEL-7, Kerberos 5
>Reporter: Ambud Sharma
>Priority: Critical
> Attachments: core-site.xml, hdfs-site.xml, kms-site.xml, 
> mapred-site.xml, yarn-site.xml
>
>
> Running a simple MapReduce job that writes to a path configured as an 
> encryption zone throws an exception:
> 11:26:26,343 INFO  [org.apache.hadoop.mapreduce.Job] (pool-14-thread-1) Task 
> Id : attempt_1432740034176_0001_m_00_2, Status : FAILED
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1) Error: java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:424)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:710)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1358)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1457)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1442)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:400)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> com.s3.ingestion.S3ImportMR$S3ImportMapper.map(S3ImportMR.java:112)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> com.s3.ingestion.S3ImportMR$S3ImportMapper.map(S3ImportMR.java:43)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> java.security.AccessController.doPrivileged(Native Method)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> javax.security.auth.Subject.doAs(Subject.java:422)
> 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 11:26:26,348 ERROR [stderr] (pool-14-thread-1)at 
> org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 11:26:26,348 ERROR [stderr] (pool-14-thread-1) Caused by: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)
> 

[jira] [Updated] (HDFS-8512) storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS

2015-11-25 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8512:
-
Issue Type: Improvement  (was: Bug)

> storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
> --
>
> Key: HDFS-8512
> URL: https://issues.apache.org/jira/browse/HDFS-8512
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Sumana Sathish
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch
>
>
> Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
> {code}
> $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GETFILESTATUS;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785853423=W4O5kKiYHmzzey4h7I9J9eL9EMY=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}}
>  $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GET_BLOCK_LOCATIONS=0=150318178;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:55 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"LocatedBlocks":{"fileLength":150318178,"isLastBlockComplete":true,"isUnderConstruction":false,"lastLocatedBlock":{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728},"locatedBlocks":[{"block":{"blockId":1073741846,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1022,"numBytes":134217728},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":0},{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728}]}}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8512) WebHDFS : GETFILESTATUS should include storage type in LocatedBlock

2015-11-25 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8512:
-
Summary: WebHDFS : GETFILESTATUS should include storage type in 
LocatedBlock  (was: storage type inside LocatedBlock object is not fully 
exposed for GETFILESTATUS)

> WebHDFS : GETFILESTATUS should include storage type in LocatedBlock
> ---
>
> Key: HDFS-8512
> URL: https://issues.apache.org/jira/browse/HDFS-8512
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Sumana Sathish
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch
>
>
> Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
> {code}
> $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GETFILESTATUS;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785853423=W4O5kKiYHmzzey4h7I9J9eL9EMY=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}}
>  $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GET_BLOCK_LOCATIONS=0=150318178;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:55 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"LocatedBlocks":{"fileLength":150318178,"isLastBlockComplete":true,"isUnderConstruction":false,"lastLocatedBlock":{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728},"locatedBlocks":[{"block":{"blockId":1073741846,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1022,"numBytes":134217728},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":0},{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728}]}}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8512) WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info

2015-11-25 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8512:
-
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks [~szetszwo] for the review! I've committed the patch to trunk and 
branch-2. 

> WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info
> -
>
> Key: HDFS-8512
> URL: https://issues.apache.org/jira/browse/HDFS-8512
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Sumana Sathish
>Assignee: Xiaoyu Yao
> Fix For: 2.8.0
>
> Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch
>
>
> Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
> {code}
> $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GETFILESTATUS;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785853423=W4O5kKiYHmzzey4h7I9J9eL9EMY=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}}
>  $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GET_BLOCK_LOCATIONS=0=150318178;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:55 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"LocatedBlocks":{"fileLength":150318178,"isLastBlockComplete":true,"isUnderConstruction":false,"lastLocatedBlock":{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728},"locatedBlocks":[{"block":{"blockId":1073741846,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1022,"numBytes":134217728},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":0},{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728}]}}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9210) Fix some misuse of %n in VolumeScanner#printStats

2015-11-25 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9210:
-
Attachment: HDFS-9210.02.patch

Thanks [~templedf] for the review! 
Attached a patch using System.lineSeparator().
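For anyone curious why the literal "%n" leaked into the report, here is a tiny 
standalone illustration (not from the patch):

{code}
// "%n" is a java.util.Formatter conversion: it becomes a newline only when the
// string goes through String.format/printf. Printed verbatim, it stays literal,
// which is what produced the "...data%nBytes verified..." line in the report.
public class LineSeparatorExample {
  public static void main(String[] args) {
    System.out.print("base path /data%n");                        // literal "%n"
    System.out.print(String.format("base path /data%n"));         // real newline
    System.out.print("base path /data" + System.lineSeparator()); // real newline
  }
}
{code}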

> Fix some misuse of %n in VolumeScanner#printStats
> -
>
> Key: HDFS-9210
> URL: https://issues.apache.org/jira/browse/HDFS-9210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9210.00.patch, HDFS-9210.01.patch, 
> HDFS-9210.02.patch
>
>
> Found two extra "%n" occurrences in the VolumeScanner report, and some lines are 
> not well formatted, as shown below. This JIRA is opened to fix the format issue.
> {code}
> Block scanner information for volume DS-93fb2503-de00-4f98-a8bc-c2bc13b8f0f7 
> with base path /hadoop/hdfs/data%nBytes verified in last hour   : 
> 136882014
> Blocks scanned in current period  :   
>   5
> Blocks scanned since restart  :   
>   5
> Block pool scans since restart:   
>   0
> Block scan errors since restart   :   
>   0
> Hours until next block pool scan  :   
> 476.000
> Last block scanned: 
> BP-1792969149-192.168.70.101-1444150984999:blk_1073742088_1274
> More blocks to scan in period :   
>   false
> %n
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8512) WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info

2015-11-25 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-8512:
-
Summary: WebHDFS : GETFILESTATUS should return LocatedBlock with storage 
type info  (was: WebHDFS : GETFILESTATUS should include storage type in 
LocatedBlock)

> WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info
> -
>
> Key: HDFS-8512
> URL: https://issues.apache.org/jira/browse/HDFS-8512
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Reporter: Sumana Sathish
>Assignee: Xiaoyu Yao
> Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch
>
>
> Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
> {code}
> $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GETFILESTATUS;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:13 GMT
> Date: Wed, 27 May 2015 18:04:13 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785853423=W4O5kKiYHmzzey4h7I9J9eL9EMY=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}}
>  $ curl -i 
> "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GET_BLOCK_LOCATIONS=0=150318178;
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Expires: Wed, 27 May 2015 18:04:55 GMT
> Date: Wed, 27 May 2015 18:04:55 GMT
> Pragma: no-cache
> Content-Type: application/json
> Set-Cookie: 
> hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA=";
>  Path=/; Expires=Thu, 28-May-2015 04:04:55 GMT; HttpOnly
> Transfer-Encoding: chunked
> Server: Jetty(6.1.26)
>  
> {"LocatedBlocks":{"fileLength":150318178,"isLastBlockComplete":true,"isUnderConstruction":false,"lastLocatedBlock":{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728},"locatedBlocks":[{"block":{"blockId":1073741846,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1022,"numBytes":134217728},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":0},{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728}]}}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections

2015-11-23 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023292#comment-15023292
 ] 

Xiaoyu Yao commented on HDFS-8855:
--

Thanks [~xiaobingo] for updating the patch. 
+1 for the latest patch. I will commit it shortly.

> Webhdfs client leaks active NameNode connections
> 
>
> Key: HDFS-8855
> URL: https://issues.apache.org/jira/browse/HDFS-8855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Bob Hansen
>Assignee: Xiaobing Zhou
> Fix For: 2.8.0
>
> Attachments: HDFS-8855.005.patch, HDFS-8855.006.patch, 
> HDFS-8855.007.patch, HDFS-8855.008.patch, HDFS-8855.009.patch, 
> HDFS-8855.1.patch, HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, 
> HDFS_8855.prototype.patch
>
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to reach ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2016-01-11 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9584:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks [~surendrasingh] for the contribution and all for the reviews. I've 
committed the change to trunk, branch-2, and branch-2.8. 

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9584.001.patch, HDFS-9584.patch, HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9244) Support nested encryption zones

2016-01-11 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093013#comment-15093013
 ] 

Xiaoyu Yao commented on HDFS-9244:
--

Thanks [~zhz] for working on this. Can we clarify the use cases (in addition to 
the original one mentioned in the description) before unblocking this? And how 
often are they used or requested by customer deployments?

My concern is that this could bring up tricky cases, such as upgrade/rollback and 
trash, that we would have to document, support, and maintain for nested zones. We 
don't want to introduce unnecessary complexity unless there are important use 
cases behind it. Thanks!

> Support nested encryption zones
> ---
>
> Key: HDFS-9244
> URL: https://issues.apache.org/jira/browse/HDFS-9244
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: encryption
>Reporter: Xiaoyu Yao
>Assignee: Zhe Zhang
> Attachments: HDFS-9244.00.patch, HDFS-9244.01.patch
>
>
> This JIRA is opened to track adding support of nested encryption zone based 
> on [~andrew.wang]'s [comment 
> |https://issues.apache.org/jira/browse/HDFS-8747?focusedCommentId=14654141=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14654141]
>  for certain use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.

2016-01-11 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093202#comment-15093202
 ] 

Xiaoyu Yao commented on HDFS-9584:
--

Thanks [~jojochuang]! I've corrected the commit message. 

> NPE in distcp when ssl configuration file does not exist in class path.
> ---
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9584.001.patch, HDFS-9584.patch, HDFS-9584.patch
>
>
> {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml 
> hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will 
> throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8548) Minicluster throws NPE on shutdown

2016-06-03 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314688#comment-15314688
 ] 

Xiaoyu Yao commented on HDFS-8548:
--

Sounds good to me. I just cherry-picked it to branch-2.7.

> Minicluster throws NPE on shutdown
> --
>
> Key: HDFS-8548
> URL: https://issues.apache.org/jira/browse/HDFS-8548
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mike Drob
>Assignee: Surendra Singh Lilhore
>  Labels: reviewed
> Fix For: 2.8.0
>
> Attachments: HDFS-8548.patch
>
>
> FtAfter running Solr tests, when we attempt to shut down the mini cluster 
> that we use for our unit tests, we get an NPE in the clean up thread. The 
> test still completes normally, but this generates a lot of extra noise.
> {noformat}
>[junit4]   2> java.lang.reflect.InvocationTargetException
>[junit4]   2>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>[junit4]   2>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>[junit4]   2>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>[junit4]   2>  at java.lang.reflect.Method.invoke(Method.java:497)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
>[junit4]   2>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassName(DefaultMBeanServerInterceptor.java:1804)
>[junit4]   2>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.safeGetClassName(DefaultMBeanServerInterceptor.java:1595)
>[junit4]   2>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanPermission(DefaultMBeanServerInterceptor.java:1813)
>[junit4]   2>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:430)
>[junit4]   2>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>[junit4]   2>  at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:81)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stopMBeans(MetricsSourceAdapter.java:227)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stop(MetricsSourceAdapter.java:212)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSources(MetricsSystemImpl.java:461)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:212)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
>[junit4]   2>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
>[junit4]   2>  at 
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics.shutdown(NameNodeMetrics.java:145)
>[junit4]   2>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:822)
>[junit4]   2>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1720)
>[junit4]   2>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1699)
>[junit4]   2>  at 
> org.apache.solr.cloud.hdfs.HdfsTestUtil.teardownClass(HdfsTestUtil.java:197)
>[junit4]   2>  at 
> org.apache.solr.core.HdfsDirectoryFactoryTest.teardownClass(HdfsDirectoryFactoryTest.java:67)
>[junit4]   2>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>[junit4]   2>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>[junit4]   2>  at 
> 

[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-10 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325503#comment-15325503
 ] 

Xiaoyu Yao commented on HDFS-10512:
---

Thanks [~jojochuang] for reporting the issue and [~linyiqun] for posting the 
patch. 
There is a similar usage {{DataNode#reportBadBlock}} that needs to check null 
volume as well. 
For both cases, I would suggest we LOG an ERROR like follows.

{code}
if (volume != null) {
  bpos.reportBadBlocks(
  block, volume.getStorageID(), volume.getStorageType());
} else {
  LOG.error("Cannot find FsVolumeSpi to report bad block id:"
  + blockBlockId()  + " bpid: " + block.getBlockPoolId());
}
{code}

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-9650) Problem is logging of "Redundant addStoredBlock request received"

2016-06-14 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDFS-9650:


Assignee: Xiaoyu Yao

> Problem is logging of "Redundant addStoredBlock request received"
> -
>
> Key: HDFS-9650
> URL: https://issues.apache.org/jira/browse/HDFS-9650
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Frode Halvorsen
>Assignee: Xiaoyu Yao
>
> Description;
> Hadoop 2.7.1. 2 namenodes in HA. 14 Datanodes.
> Enough CPU,disk and RAM.
> Just discovered that some datanodes must have been corrupted somehow.
> When restarting  a 'defect' ( works without failure except when restarting) 
> the active namenode suddenly is logging a lot of : "Redundant addStoredBlock 
> request received"
> and finally the failover-controller takes the namenode down, fails over to 
> other node. This node also starts logging the same, and as soon as the fisrt 
> node is bac online, the failover-controller again kill the active node, and 
> does failover.
> This node now was started after the datanode, and doesn't log "Redundant 
> addStoredBlock request received" anymore, and restart of the second name-node 
> works fine.
> If I again restarts the datanode- the process repeats itself.
> Problem is logging of "Redundant addStoredBlock request received" and why 
> does it happen ? 
> The failover-controller acts the same way as it did on 2.5/6 when we had a 
> lot of 'block does not belong to any replica'-messages. Namenode is too busy 
> to respond to heartbeats, and is taken down...
> To resolve this, I have to take down the datanode, delete all data from it, 
> and start it up. Then cluster will reproduce the missing blocks, and the 
> failing datanode is working fine again...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9650) Problem is logging of "Redundant addStoredBlock request received"

2016-06-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331182#comment-15331182
 ] 

Xiaoyu Yao commented on HDFS-9650:
--

Thanks [~brahma] for the heads up. Yes, we do need to backport HDFS-9906 to 
branch-2.7. 
In our case, adding a dedicated serviceRPC port help avoiding the NN failover.  

> Problem is logging of "Redundant addStoredBlock request received"
> -
>
> Key: HDFS-9650
> URL: https://issues.apache.org/jira/browse/HDFS-9650
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Frode Halvorsen
>Assignee: Xiaoyu Yao
>
> Description;
> Hadoop 2.7.1. 2 namenodes in HA. 14 Datanodes.
> Enough CPU,disk and RAM.
> Just discovered that some datanodes must have been corrupted somehow.
> When restarting  a 'defect' ( works without failure except when restarting) 
> the active namenode suddenly is logging a lot of : "Redundant addStoredBlock 
> request received"
> and finally the failover-controller takes the namenode down, fails over to 
> other node. This node also starts logging the same, and as soon as the fisrt 
> node is bac online, the failover-controller again kill the active node, and 
> does failover.
> This node now was started after the datanode, and doesn't log "Redundant 
> addStoredBlock request received" anymore, and restart of the second name-node 
> works fine.
> If I again restarts the datanode- the process repeats itself.
> Problem is logging of "Redundant addStoredBlock request received" and why 
> does it happen ? 
> The failover-controller acts the same way as it did on 2.5/6 when we had a 
> lot of 'block does not belong to any replica'-messages. Namenode is too busy 
> to respond to heartbeats, and is taken down...
> To resolve this, I have to take down the datanode, delete all data from it, 
> and start it up. Then cluster will reproduce the missing blocks, and the 
> failing datanode is working fine again...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-06-15 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331228#comment-15331228
 ] 

Xiaoyu Yao commented on HDFS-9906:
--

Cherrypicked to branch-2.7.

> Remove spammy log spew when a datanode is restarted
> ---
>
> Key: HDFS-9906
> URL: https://issues.apache.org/jira/browse/HDFS-9906
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-9906.patch
>
>
> {code}
> WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock 
> request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 
> 268435456
> {code}
> This happens wy too much to add any useful information. We should either 
> move this to a different level or only warn once per machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing

2016-06-15 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10528:
--
Status: Patch Available  (was: Open)

> Add logging to successful standby checkpointing
> ---
>
> Key: HDFS-10528
> URL: https://issues.apache.org/jira/browse/HDFS-10528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10528.00.patch
>
>
> This ticket is opened to add INFO log for a successful standby checkpointing 
> in the code below for troubleshooting.
> {code}
> if (needCheckpoint) {
> doCheckpoint();
> // reset needRollbackCheckpoint to false only when we finish a 
> ckpt
> // for rollback image
> if (needRollbackCheckpoint
> && namesystem.getFSImage().hasRollbackFSImage()) {
>   namesystem.setCreatedRollbackImages(true);
>   namesystem.setNeedRollbackFsImage(false);
> }
> lastCheckpointTime = now;
>   }
> } catch (SaveNamespaceCancelledException ce) {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing

2016-06-15 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10528:
--
Attachment: HDFS-10528.00.patch

> Add logging to successful standby checkpointing
> ---
>
> Key: HDFS-10528
> URL: https://issues.apache.org/jira/browse/HDFS-10528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10528.00.patch
>
>
> This ticket is opened to add INFO log for a successful standby checkpointing 
> in the code below for troubleshooting.
> {code}
> if (needCheckpoint) {
> doCheckpoint();
> // reset needRollbackCheckpoint to false only when we finish a 
> ckpt
> // for rollback image
> if (needRollbackCheckpoint
> && namesystem.getFSImage().hasRollbackFSImage()) {
>   namesystem.setCreatedRollbackImages(true);
>   namesystem.setNeedRollbackFsImage(false);
> }
> lastCheckpointTime = now;
>   }
> } catch (SaveNamespaceCancelledException ce) {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-06-15 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332102#comment-15332102
 ] 

Xiaoyu Yao commented on HDFS-9906:
--

Thanks [~ajisakaa], cherry-pick to 2.7.3 and update fix version to 2.7.3. cc: 
[~vinodkv].

> Remove spammy log spew when a datanode is restarted
> ---
>
> Key: HDFS-9906
> URL: https://issues.apache.org/jira/browse/HDFS-9906
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0, 2.7.4
>
> Attachments: HDFS-9906.patch
>
>
> {code}
> WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock 
> request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 
> 268435456
> {code}
> This happens wy too much to add any useful information. We should either 
> move this to a different level or only warn once per machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9906) Remove spammy log spew when a datanode is restarted

2016-06-15 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9906:
-
Fix Version/s: (was: 2.7.4)
   2.7.3

> Remove spammy log spew when a datanode is restarted
> ---
>
> Key: HDFS-9906
> URL: https://issues.apache.org/jira/browse/HDFS-9906
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.2
>Reporter: Elliott Clark
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9906.patch
>
>
> {code}
> WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock 
> request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size 
> 268435456
> {code}
> This happens wy too much to add any useful information. We should either 
> move this to a different level or only warn once per machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-01 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310837#comment-15310837
 ] 

Xiaoyu Yao commented on HDFS-9924:
--

[~daryn], thanks for the valuable feedback. @Kihwal Lee also mentioned similar 
issue 
[here|https://issues.apache.org/jira/browse/HADOOP-12916?focusedCommentId=15277342=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15277342].
 But I wasn't able to get clarification of it. The FSN/FSD locking issue is a 
very good point. I tried to find some metrics/logs about it but there was not 
any. I will open a separate ticket to add more metrics and WARN/DEBUG logs for 
long locking operations on namenode similar to what we have for slow 
write/network WARN/metrics on datanode.  

As you mentioned above, the priority level is assigned by scheduler. As part of 
HADOOP-12916, we separate scheduler from call queue and make it pluggable so 
that priority assignment can be customized as appropriate for different 
workloads. For the mixed write intensive and read workload example, I agree 
that the DecayedRpcScheduler that uses call rate to determine priority may not 
be the good choice. We have thought of adding a different scheduler that 
combines the weight of RPC call and its rate. But it is tricky to assign 
weight. For example,  getContentSummary on a directory with millions of 
files/dirs and a directory with a few files/dirs won't have the same impact on 
NN. 

Backoff based on response time allows all users to stop overloading namenode 
when the high priority RPC calls experience longer than normal end to end 
delay. User2/User3/User4 (low priority based on call rate) will have much wider 
response time threshold for backing off. In this case, User 1 will be backed 
off first by breaking the relative smaller response time threshold and get 
namenode out of the state that other users can not use the namenode "fairly". 

We are also proposing to have a scheduler that offers better namenode resource 
management via YARN integration on HADOOP-13128. I would appreciate if you can 
share your thoughts and comments on the proposal there as well. Thanks!


> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-01 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310837#comment-15310837
 ] 

Xiaoyu Yao edited comment on HDFS-9924 at 6/1/16 6:31 PM:
--

[~daryn], thanks for the valuable feedback. [~kihwal] also mentioned similar 
issue 
[here|https://issues.apache.org/jira/browse/HADOOP-12916?focusedCommentId=15277342=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15277342].
 But I wasn't able to get clarification of it. The FSN/FSD locking issue is a 
very good point. I tried to find some metrics/logs about it but there was not 
any. I will open a separate ticket to add more metrics and WARN/DEBUG logs for 
long locking operations on namenode similar to what we have for slow 
write/network WARN/metrics on datanode.  

As you mentioned above, the priority level is assigned by scheduler. As part of 
HADOOP-12916, we separate scheduler from call queue and make it pluggable so 
that priority assignment can be customized as appropriate for different 
workloads. For the mixed write intensive and read workload example, I agree 
that the DecayedRpcScheduler that uses call rate to determine priority may not 
be the good choice. We have thought of adding a different scheduler that 
combines the weight of RPC call and its rate. But it is tricky to assign 
weight. For example,  getContentSummary on a directory with millions of 
files/dirs and a directory with a few files/dirs won't have the same impact on 
NN. 

Backoff based on response time allows all users to stop overloading namenode 
when the high priority RPC calls experience longer than normal end to end 
delay. User2/User3/User4 (low priority based on call rate) will have much wider 
response time threshold for backing off. In this case, User 1 will be backed 
off first by breaking the relative smaller response time threshold and get 
namenode out of the state that other users can not use the namenode "fairly". 

We are also proposing to have a scheduler that offers better namenode resource 
management via YARN integration on HADOOP-13128. I would appreciate if you can 
share your thoughts and comments on the proposal there as well. Thanks!



was (Author: xyao):
[~daryn], thanks for the valuable feedback. @Kihwal Lee also mentioned similar 
issue 
[here|https://issues.apache.org/jira/browse/HADOOP-12916?focusedCommentId=15277342=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15277342].
 But I wasn't able to get clarification of it. The FSN/FSD locking issue is a 
very good point. I tried to find some metrics/logs about it but there was not 
any. I will open a separate ticket to add more metrics and WARN/DEBUG logs for 
long locking operations on namenode similar to what we have for slow 
write/network WARN/metrics on datanode.  

As you mentioned above, the priority level is assigned by scheduler. As part of 
HADOOP-12916, we separate scheduler from call queue and make it pluggable so 
that priority assignment can be customized as appropriate for different 
workloads. For the mixed write intensive and read workload example, I agree 
that the DecayedRpcScheduler that uses call rate to determine priority may not 
be the good choice. We have thought of adding a different scheduler that 
combines the weight of RPC call and its rate. But it is tricky to assign 
weight. For example,  getContentSummary on a directory with millions of 
files/dirs and a directory with a few files/dirs won't have the same impact on 
NN. 

Backoff based on response time allows all users to stop overloading namenode 
when the high priority RPC calls experience longer than normal end to end 
delay. User2/User3/User4 (low priority based on call rate) will have much wider 
response time threshold for backing off. In this case, User 1 will be backed 
off first by breaking the relative smaller response time threshold and get 
namenode out of the state that other users can not use the namenode "fairly". 

We are also proposing to have a scheduler that offers better namenode resource 
management via YARN integration on HADOOP-13128. I would appreciate if you can 
share your thoughts and comments on the proposal there as well. Thanks!


> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to 

[jira] [Created] (HDFS-10475) Adding metrics and warn/debug logs for long FSD lock

2016-06-01 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10475:
-

 Summary: Adding metrics and warn/debug logs for long FSD lock
 Key: HDFS-10475
 URL: https://issues.apache.org/jira/browse/HDFS-10475
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


This is a follow up of the comment on HADOOP-12916 and 
[here|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15310837=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15310837]
 add more metrics and WARN/DEBUG logs for long FSD/FSN locking operations on 
namenode similar to what we have for slow write/network WARN/metrics on 
datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10528) Add logging to successful standby checkpointing

2016-06-14 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10528:
-

 Summary: Add logging to successful standby checkpointing
 Key: HDFS-10528
 URL: https://issues.apache.org/jira/browse/HDFS-10528
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


This ticket is opened to add INFO log for a successful standby checkpointing in 
the code below for troubleshooting.

{code}
if (needCheckpoint) {
doCheckpoint();
// reset needRollbackCheckpoint to false only when we finish a ckpt
// for rollback image
if (needRollbackCheckpoint
&& namesystem.getFSImage().hasRollbackFSImage()) {
  namesystem.setCreatedRollbackImages(true);
  namesystem.setNeedRollbackFsImage(false);
}
lastCheckpointTime = now;
  }
} catch (SaveNamespaceCancelledException ce) {
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10528) Add logging to successful standby checkpointing

2016-06-14 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330846#comment-15330846
 ] 

Xiaoyu Yao commented on HDFS-10528:
---

Plan to add a log entry after {{ lastCheckpointTime = now;}}.

> Add logging to successful standby checkpointing
> ---
>
> Key: HDFS-10528
> URL: https://issues.apache.org/jira/browse/HDFS-10528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> This ticket is opened to add INFO log for a successful standby checkpointing 
> in the code below for troubleshooting.
> {code}
> if (needCheckpoint) {
> doCheckpoint();
> // reset needRollbackCheckpoint to false only when we finish a 
> ckpt
> // for rollback image
> if (needRollbackCheckpoint
> && namesystem.getFSImage().hasRollbackFSImage()) {
>   namesystem.setCreatedRollbackImages(true);
>   namesystem.setNeedRollbackFsImage(false);
> }
> lastCheckpointTime = now;
>   }
> } catch (SaveNamespaceCancelledException ce) {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing

2016-06-16 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10528:
--
Status: Open  (was: Patch Available)

> Add logging to successful standby checkpointing
> ---
>
> Key: HDFS-10528
> URL: https://issues.apache.org/jira/browse/HDFS-10528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10528.00.patch
>
>
> This ticket is opened to add INFO log for a successful standby checkpointing 
> in the code below for troubleshooting.
> {code}
> if (needCheckpoint) {
> doCheckpoint();
> // reset needRollbackCheckpoint to false only when we finish a 
> ckpt
> // for rollback image
> if (needRollbackCheckpoint
> && namesystem.getFSImage().hasRollbackFSImage()) {
>   namesystem.setCreatedRollbackImages(true);
>   namesystem.setNeedRollbackFsImage(false);
> }
> lastCheckpointTime = now;
>   }
> } catch (SaveNamespaceCancelledException ce) {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing

2016-06-16 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10528:
--
Status: Patch Available  (was: Open)

> Add logging to successful standby checkpointing
> ---
>
> Key: HDFS-10528
> URL: https://issues.apache.org/jira/browse/HDFS-10528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-10528.00.patch
>
>
> This ticket is opened to add INFO log for a successful standby checkpointing 
> in the code below for troubleshooting.
> {code}
> if (needCheckpoint) {
> doCheckpoint();
> // reset needRollbackCheckpoint to false only when we finish a 
> ckpt
> // for rollback image
> if (needRollbackCheckpoint
> && namesystem.getFSImage().hasRollbackFSImage()) {
>   namesystem.setCreatedRollbackImages(true);
>   namesystem.setNeedRollbackFsImage(false);
> }
> lastCheckpointTime = now;
>   }
> } catch (SaveNamespaceCancelledException ce) {
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10539) DecayRpcScheduler MXBean should only report decayed CallVolumeSummary

2016-06-16 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10539:
-

 Summary: DecayRpcScheduler MXBean should only report decayed 
CallVolumeSummary
 Key: HDFS-10539
 URL: https://issues.apache.org/jira/browse/HDFS-10539
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ipc
Reporter: Namit Maheshwari
Assignee: Xiaoyu Yao


HADOOP-13197 added non-decayed call metrics in metrics2 source for 
DecayedRpcScheduler. However, CallVolumeSummary in MXBean was affected 
unexpectedly to include both decayed and non-decayed call volume. The root 
cause is Jackson ObjectMapper simply serialize all the content of the 
callCounts map which contains both non-decayed and decayed counter after 
HADOOP-13197. This ticket is opened to fix the CallVolumeSummary in MXBean to 
include only decayed call volume for backward compatibility and add unit test 
for DecayRpcScheduler MXBean to catch this in future. 

CallVolumeSummary JMX example before HADOOP-13197
{code}
"CallVolumeSummary" : "{\"hbase\":1,\"mapred\":1}"
{code}

 CallVolumeSummary JMX example after HADOOP-13197
{code}
"CallVolumeSummary" : "{\"hrt_qa\":[1,2]}"
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10469) Add number of active xceivers to datanode metrics

2016-06-21 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342988#comment-15342988
 ] 

Xiaoyu Yao edited comment on HDFS-10469 at 6/21/16 11:02 PM:
-

Thanks [~hanishakoneru] for updating the patch. The V3 patch looks to me and 
unit test failures don't seem relate to this patch. 
+1 and I will rerun failed tests and commit it if everything pass locally. 


was (Author: xyao):
Thanks [~hanishakoneru] for updating the patch. The V4 patch looks to me and 
unit test failures don't seem relate to this patch. 
+1 and I will rerun failed tests and commit it if everything pass locally. 

> Add number of active xceivers to datanode metrics
> -
>
> Key: HDFS-10469
> URL: https://issues.apache.org/jira/browse/HDFS-10469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, 
> HDFS-10469.002.patch, HDFS-10469.003.patch
>
>
> Number of active xceivers is exposed via jmx, but not in Datanode metrics. We 
> should add it to datanode metrics for monitoring the load on Datanodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10469) Add number of active xceivers to datanode metrics

2016-06-21 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342988#comment-15342988
 ] 

Xiaoyu Yao commented on HDFS-10469:
---

Thanks [~hanishakoneru] for updating the patch. The V4 patch looks to me and 
unit test failures don't seem relate to this patch. 
+1 and I will rerun failed tests and commit it if everything pass locally. 

> Add number of active xceivers to datanode metrics
> -
>
> Key: HDFS-10469
> URL: https://issues.apache.org/jira/browse/HDFS-10469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, 
> HDFS-10469.002.patch, HDFS-10469.003.patch
>
>
> Number of active xceivers is exposed via jmx, but not in Datanode metrics. We 
> should add it to datanode metrics for monitoring the load on Datanodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10535) Rename AsyncDistributedFileSystem

2016-06-16 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334156#comment-15334156
 ] 

Xiaoyu Yao commented on HDFS-10535:
---

Thanks [~szetszwo] for working on this. The patch looks good to me and just two 
unit test issues below.

TestAsyncDFS.java 
{code}
return cluster.getFileSystem().getAsyncDistributedFileSystem(); ==>
return cluster.getFileSystem().getNonblockingCalls(); 
{code}

TestAsyncHDFSWithHA.java
{code}
dfs.getAsyncDistributedFileSystem() ==>
dfs.getNonblockingCalls()
{code}


> Rename AsyncDistributedFileSystem
> -
>
> Key: HDFS-10535
> URL: https://issues.apache.org/jira/browse/HDFS-10535
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Attachments: h10535_20160616.patch
>
>
> Per discussion in HDFS-9924, AsyncDistributedFileSystem is not a good name 
> since we only support nonblocking calls for the moment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10469) Add number of active xceivers to datanode metrics

2016-06-23 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347197#comment-15347197
 ] 

Xiaoyu Yao commented on HDFS-10469:
---

I finished testing this patch against the failed tests. Only 
TestOfflineEditsViewer#testGenerated can be consistently repro no matter the 
patch for HDFS-10469 is applied or not. I opened HDFS-10572 to track the fix 
for TestOfflineEditsViewer#testGenerated and will commit HDFS-10469 shortly.

> Add number of active xceivers to datanode metrics
> -
>
> Key: HDFS-10469
> URL: https://issues.apache.org/jira/browse/HDFS-10469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
> Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, 
> HDFS-10469.002.patch, HDFS-10469.003.patch
>
>
> Number of active xceivers is exposed via jmx, but not in Datanode metrics. We 
> should add it to datanode metrics for monitoring the load on Datanodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10469) Add number of active xceivers to datanode metrics

2016-06-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10469:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks [~hanishakoneru] for the contribution. I've committed the patch to 
trunk. 

> Add number of active xceivers to datanode metrics
> -
>
> Key: HDFS-10469
> URL: https://issues.apache.org/jira/browse/HDFS-10469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>  Labels: datanode, metrics
> Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, 
> HDFS-10469.002.patch, HDFS-10469.003.patch
>
>
> Number of active xceivers is exposed via jmx, but not in Datanode metrics. We 
> should add it to datanode metrics for monitoring the load on Datanodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10469) Add number of active xceivers to datanode metrics

2016-06-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10469:
--
Labels: datanode metrics  (was: )

> Add number of active xceivers to datanode metrics
> -
>
> Key: HDFS-10469
> URL: https://issues.apache.org/jira/browse/HDFS-10469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>  Labels: datanode, metrics
> Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, 
> HDFS-10469.002.patch, HDFS-10469.003.patch
>
>
> Number of active xceivers is exposed via jmx, but not in Datanode metrics. We 
> should add it to datanode metrics for monitoring the load on Datanodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10572) Fix TestOfflineEditsViewer#testGenerated

2016-06-23 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10572:
-

 Summary: Fix TestOfflineEditsViewer#testGenerated
 Key: HDFS-10572
 URL: https://issues.apache.org/jira/browse/HDFS-10572
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: newbie, test
Reporter: Xiaoyu Yao


The test has been failing consistently on trunk recently. This ticket is open 
to fix this test to avoid false alarm on Jenkins. Figure out which recent 
commit caused this failure can be a good start. 
 
{code}
---
 T E S T S
---
Running org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.646 sec <<< 
FAILURE! - in 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
testGenerated(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer)
  Time elapsed: 3.623 sec  <<< FAILURE!
java.lang.AssertionError: Generated edits and reparsed (bin to XML to bin) 
should be same
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer.testGenerated(TestOfflineEditsViewer.java:125)


Results :

Failed tests: 
  TestOfflineEditsViewer.testGenerated:125 Generated edits and reparsed (bin to 
XML to bin) should be same

Tests run: 5, Failures: 1, Errors: 0, Skipped: 0

{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9759) Fix the typo in JvmPauseMonitor#getNumGcWarnThreadholdExceeded

2016-02-05 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134633#comment-15134633
 ] 

Xiaoyu Yao commented on HDFS-9759:
--

+1, I will commit it shortly.

> Fix the typo in JvmPauseMonitor#getNumGcWarnThreadholdExceeded
> --
>
> Key: HDFS-9759
> URL: https://issues.apache.org/jira/browse/HDFS-9759
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
>Priority: Minor
> Attachments: HDFS-9759.000.patch
>
>
> There is typo in JvmPauseMonitor#getNumGcWarnThreadholdExceeded, which should 
> be Threshold.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8660) Slow write to packet mirror should log which mirror and which block

2016-02-04 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133369#comment-15133369
 ] 

Xiaoyu Yao commented on HDFS-8660:
--

[~hazem], thanks for working on this. Can you rebase the patch to trunk?

> Slow write to packet mirror should log which mirror and which block
> ---
>
> Key: HDFS-8660
> URL: https://issues.apache.org/jira/browse/HDFS-8660
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hazem Mahmoud
>Assignee: Hazem Mahmoud
> Attachments: HDFS-8660.001.patch
>
>
> Currently, log format states something similar to: 
> "Slow BlockReceiver write packet to mirror took 468ms (threshold=300ms)"
> For troubleshooting purposes, it would be good to have it mention which block 
> ID it's writing as well as the mirror (DN) that it's writing it to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9688) Test the effect of nested encryption zones in HDFS downgrade

2016-01-29 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123865#comment-15123865
 ] 

Xiaoyu Yao commented on HDFS-9688:
--

bq. Also, renaming the root dir of a nested EZ won't be allowed, because the 
destination will be in an EZ (the parent EZ). I think this is the right 
behavior for nested EZ, but please see if you agree.

[~zhz], Do you plan to block rename EZ root on 2.7 and forward? I don't think 
we should block rename EZ root, which is an incompatible change from 2.7 that 
can break Trash support for HDFS encryption. If I understand the nested EZ 
correctly, the renamed nested EZ will still be encrypted with its own zone key 
no matter the destination is encrypted with different keys or not. 

> Test the effect of nested encryption zones in HDFS downgrade
> 
>
> Key: HDFS-9688
> URL: https://issues.apache.org/jira/browse/HDFS-9688
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: encryption, test
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9688-branch-2.6.00.patch, 
> HDFS-9688-branch-2.6.01.patch, HDFS-9688-branch-2.8.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9210) Fix some misuse of %n in VolumeScanner#printStats

2016-02-01 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9210:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks all for the reviews. I've commit the patch to trunk, branch-2 and 
branch-2.8.

> Fix some misuse of %n in VolumeScanner#printStats
> -
>
> Key: HDFS-9210
> URL: https://issues.apache.org/jira/browse/HDFS-9210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9210.00.patch, HDFS-9210.01.patch, 
> HDFS-9210.02.patch
>
>
> Found 2 extra "%n" in the VolumeScanner report and lines not well formatted  
> below. This JIRA is opened to fix the format issue.
> {code}
> Block scanner information for volume DS-93fb2503-de00-4f98-a8bc-c2bc13b8f0f7 
> with base path /hadoop/hdfs/data%nBytes verified in last hour   : 
> 136882014
> Blocks scanned in current period  :   
>   5
> Blocks scanned since restart  :   
>   5
> Block pool scans since restart:   
>   0
> Block scan errors since restart   :   
>   0
> Hours until next block pool scan  :   
> 476.000
> Last block scanned: 
> BP-1792969149-192.168.70.101-1444150984999:blk_1073742088_1274
> More blocks to scan in period :   
>   false
> %n
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9750) Document -source option for balancer

2016-02-03 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao resolved HDFS-9750.
--
Resolution: Duplicate

> Document -source option for balancer
> 
>
> Key: HDFS-9750
> URL: https://issues.apache.org/jira/browse/HDFS-9750
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> HDFS-8826 introduced -source option for balancer. This needs to be documented 
> in HDFSCommands.md for administrators to use it appropriately. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8923) Add -source flag to balancer usage message

2016-02-03 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130969#comment-15130969
 ] 

Xiaoyu Yao commented on HDFS-8923:
--

[~ctrezzo], the patch v2 looks good to me. Can you rebase the patch to trunk?

> Add -source flag to balancer usage message
> --
>
> Key: HDFS-8923
> URL: https://issues.apache.org/jira/browse/HDFS-8923
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Trivial
> Attachments: HDFS-8923-trunk-v1.patch, HDFS-8923-trunk-v2.patch
>
>
> HDFS-8826 added a -source flag to the balancer, but the usage message still 
> needs to be updated. See current usage message in trunk:
> {code}
>private static final String USAGE = "Usage: hdfs balancer"
>+ "\n\t[-policy ]\tthe balancing policy: "
>+ BalancingPolicy.Node.INSTANCE.getName() + " or "
>+ BalancingPolicy.Pool.INSTANCE.getName()
>+ "\n\t[-threshold ]\tPercentage of disk capacity"
>+ "\n\t[-exclude [-f  | ]]"
>+ "\tExcludes the specified datanodes."
>+ "\n\t[-include [-f  | ]]"
>+ "\tIncludes only the specified datanodes."
>+ "\n\t[-idleiterations ]"
>+ "\tNumber of consecutive idle iterations (-1 for Infinite) before "
>+ "exit."
>+ "\n\t[-runDuringUpgrade]"
>+ "\tWhether to run the balancer during an ongoing HDFS upgrade."
>+ "This is usually not desired since it will not affect used space "
>+ "on over-utilized machines.";
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9750) Document -source option for balancer

2016-02-03 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-9750:


 Summary: Document -source option for balancer
 Key: HDFS-9750
 URL: https://issues.apache.org/jira/browse/HDFS-9750
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Affects Versions: 2.8.0
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


HDFS-8826 introduced -source option for balancer. This needs to be documented 
in HDFSCommands.md for administrators to use it appropriately. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9723) Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext

2016-01-29 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-9723:


 Summary: Improve Namenode Throttling Against Bad Jobs with FCQ and 
CallerContext
 Key: HDFS-9723
 URL: https://issues.apache.org/jira/browse/HDFS-9723
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


HDFS namenode handles RPC requests from DFS clients and internal processing 
from datanodes. It has been a recurring pain that some bad jobs overwhelm the 
namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 
is the one of the existing efforts added since Hadoop 2.4 to address this 
issue. 

In current FCQ implementation, incoming RPC calls are scheduled based on the 
number of recent RPC calls (1000) of different users with a time-decayed 
scheduler. This works well when there is a clear mapping between users and 
their RPC calls from different jobs. However, this may not work effectively 
when it is hard to track calls to a specific caller in a chain of operations 
from the workflow (e.g.Oozie -> Hive -> Yarn). It is not feasible for 
operators/administrators to throttle all the hive jobs because of one “bad” 
query.

This JIRA proposed to leverage RPC caller context information (such as 
callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative 
to existing UGI (or user name when delegation token is not available) based 
Identify Provider to improve effectiveness Hadoop RPC Fair Call Queue 
(HADOOP-9640) for better namenode throttling in multi-tenancy cluster 
deployment.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9843:
-
Attachment: HDFS-9843.02.patch

Thanks [~cnauroth]! Update the patch with the fixed the link. 

> Document distcp options required for copying between encrypted locations
> 
>
> Key: HDFS-9843
> URL: https://issues.apache.org/jira/browse/HDFS-9843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation, encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9843.00.patch, HDFS-9843.01.patch, 
> HDFS-9843.02.patch
>
>
> In TransparentEncryption.md#Distcp_considerations document section, we have 
> "Copying_between_encrypted_and_unencrypted_locations" which requires 
> -skipcrccheck and -update. 
> These options should be documented as required for "Copying between encrypted 
> locations" use cases as well because this involves decrypting source file and 
> encrypting destination file with a different EDEK, resulting in different 
> checksum at the destination. Distcp will fail at crc check if -skipcrccheck 
> if not specified.
> This ticket is opened to document the required options for "Copying between 
> encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9843:
-
Attachment: HDFS-9843.01.patch

Thanks [~cnauroth] for the review. Patch 01 fix the anchor to distcp command 
line options.

> Document distcp options required for copying between encrypted locations
> 
>
> Key: HDFS-9843
> URL: https://issues.apache.org/jira/browse/HDFS-9843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation, encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9843.00.patch, HDFS-9843.01.patch
>
>
> In TransparentEncryption.md#Distcp_considerations document section, we have 
> "Copying_between_encrypted_and_unencrypted_locations" which requires 
> -skipcrccheck and -update. 
> These options should be documented as required for "Copying between encrypted 
> locations" use cases as well because this involves decrypting source file and 
> encrypting destination file with a different EDEK, resulting in different 
> checksum at the destination. Distcp will fail at crc check if -skipcrccheck 
> if not specified.
> This ticket is opened to document the required options for "Copying between 
> encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9843:
-
Affects Version/s: 2.6.0

> Document distcp options required for copying between encrypted locations
> 
>
> Key: HDFS-9843
> URL: https://issues.apache.org/jira/browse/HDFS-9843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation, encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> In TransparentEncryption.md#Distcp_considerations document section, we have 
> "Copying_between_encrypted_and_unencrypted_locations" which requires 
> -skipcrccheck and -update. 
> These options should be documented as required for "Copying between encrypted 
> locations" use cases as well because this involves decrypting source file and 
> encrypting destination file with a different EDEK, resulting in different 
> checksum at the destination. Distcp will fail at the CRC check if -skipcrccheck 
> is not specified.
> This ticket is opened to document the required options for "Copying between 
> encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9843:
-
Component/s: encryption
 documentation
 distcp

> Document distcp options required for copying between encrypted locations
> 
>
> Key: HDFS-9843
> URL: https://issues.apache.org/jira/browse/HDFS-9843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation, encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> In TransparentEncryption.md#Distcp_considerations document section, we have 
> "Copying_between_encrypted_and_unencrypted_locations" which requires 
> -skipcrccheck and -update. 
> These options should be documented as required for "Copying between encrypted 
> locations" use cases as well because this involves decrypting source file and 
> encrypting destination file with a different EDEK, resulting in different 
> checksum at the destination. Distcp will fail at the CRC check if -skipcrccheck 
> is not specified.
> This ticket is opened to document the required options for "Copying between 
> encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-22 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-9843:


 Summary: Document distcp options required for copying between 
encrypted locations
 Key: HDFS-9843
 URL: https://issues.apache.org/jira/browse/HDFS-9843
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


In TransparentEncryption.md#Distcp_considerations document section, we have 
"Copying_between_encrypted_and_unencrypted_locations" which requires 
-skipcrccheck and -update. 

These options should be documented as required for "Copying between encrypted 
locations" use cases as well because this involves decrypting source file and 
encrypting destination file with a different EDEK, resulting in different 
checksum at the destination. Distcp will fail at the CRC check if -skipcrccheck is 
not specified.

This ticket is opened to document the required options for "Copying between 
encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122

2016-02-26 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9831:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
 Tags: webhdfs
   Status: Resolved  (was: Patch Available)

> Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 
> 
>
> Key: HDFS-9831
> URL: https://issues.apache.org/jira/browse/HDFS-9831
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Fix For: 2.8.0
>
> Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch, 
> HDFS-9831.002.patch, HDFS-9831.003.patch
>
>
> This ticket is opened to document the configuration keys introduced by 
> HDFS-5219/HDFS-5122 for WebHdfs Retry.  Both hdfs-default.xml and webhdfs.md 
> should be updated with the usage of these keys.
> {code}
> // WebHDFS retry policy
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = 
> "dfs.http.client.retry.policy.enabled";
>   public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = 
> false;
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = 
> "dfs.http.client.retry.policy.spec";
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = 
> "1,6,6,10"; //t1,n1,t2,n2,...
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.failover.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 
> 15;
>   public static final String  DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.retry.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = 
> "dfs.http.client.failover.sleep.base.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT 
> = 500;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = 
> "dfs.http.client.failover.sleep.max.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 
> 15000;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122

2016-02-26 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9831:
-

Thanks [~xiaobingo] for the contribution. I've committed the patch to trunk, 
branch-2, and branch-2.8.

> Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 
> 
>
> Key: HDFS-9831
> URL: https://issues.apache.org/jira/browse/HDFS-9831
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Fix For: 2.8.0
>
> Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch, 
> HDFS-9831.002.patch, HDFS-9831.003.patch
>
>
> This ticket is opened to document the configuration keys introduced by 
> HDFS-5219/HDFS-5122 for WebHdfs Retry.  Both hdfs-default.xml and webhdfs.md 
> should be updated with the usage of these keys.
> {code}
> // WebHDFS retry policy
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = 
> "dfs.http.client.retry.policy.enabled";
>   public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = 
> false;
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = 
> "dfs.http.client.retry.policy.spec";
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = 
> "1,6,6,10"; //t1,n1,t2,n2,...
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.failover.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 
> 15;
>   public static final String  DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.retry.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = 
> "dfs.http.client.failover.sleep.base.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT 
> = 500;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = 
> "dfs.http.client.failover.sleep.max.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 
> 15000;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122

2016-02-25 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167566#comment-15167566
 ] 

Xiaoyu Yao commented on HDFS-9831:
--

Thanks [~xiaobingo] for the update. One more issue found in the rendered 
webhdfs.html (from the changes in webhdfs.md):

I don't think we should put "The following properties control WebHDFS retry and 
failover policy." and the retry keys under the "Cross-Site Request Forgery 
Prevention" section. Can you add this as a separate section like below? 

{code}

WebHDFS Retry Policy
-

WebHDFS supports an optional, configurable retry policy for resilient copy of 
large files that could time out, or copy of files between HA clusters that could 
fail over during the copy.

The following properties control WebHDFS retry and failover policy.
...

{code}

> Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 
> 
>
> Key: HDFS-9831
> URL: https://issues.apache.org/jira/browse/HDFS-9831
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch
>
>
> This ticket is opened to document the configuration keys introduced by 
> HDFS-5219/HDFS-5122 for WebHdfs Retry.  Both hdfs-default.xml and webhdfs.md 
> should be updated with the usage of these keys.
> {code}
> // WebHDFS retry policy
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = 
> "dfs.http.client.retry.policy.enabled";
>   public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = 
> false;
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = 
> "dfs.http.client.retry.policy.spec";
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = 
> "1,6,6,10"; //t1,n1,t2,n2,...
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.failover.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 
> 15;
>   public static final String  DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.retry.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = 
> "dfs.http.client.failover.sleep.base.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT 
> = 500;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = 
> "dfs.http.client.failover.sleep.max.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 
> 15000;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9843:
-
Attachment: HDFS-9843.00.patch

Attached an initial patch. 

> Document distcp options required for copying between encrypted locations
> 
>
> Key: HDFS-9843
> URL: https://issues.apache.org/jira/browse/HDFS-9843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation, encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9843.00.patch
>
>
> In TransparentEncryption.md#Distcp_considerations document section, we have 
> "Copying_between_encrypted_and_unencrypted_locations" which requires 
> -skipcrccheck and -update. 
> These options should be documented as required for "Copying between encrypted 
> locations" use cases as well because this involves decrypting source file and 
> encrypting destination file with a different EDEK, resulting in different 
> checksum at the destination. Distcp will fail at the CRC check if -skipcrccheck 
> is not specified.
> This ticket is opened to document the required options for "Copying between 
> encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-22 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9843:
-
Status: Patch Available  (was: Open)

> Document distcp options required for copying between encrypted locations
> 
>
> Key: HDFS-9843
> URL: https://issues.apache.org/jira/browse/HDFS-9843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation, encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9843.00.patch
>
>
> In TransparentEncryption.md#Distcp_considerations document section, we have 
> "Copying_between_encrypted_and_unencrypted_locations" which requires 
> -skipcrccheck and -update. 
> These options should be documented as required for "Copying between encrypted 
> locations" use cases as well because this involves decrypting source file and 
> encrypting destination file with a different EDEK, resulting in different 
> checksum at the destination. Distcp will fail at the CRC check if -skipcrccheck 
> is not specified.
> This ticket is opened to document the required options for "Copying between 
> encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122

2016-02-25 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168107#comment-15168107
 ] 

Xiaoyu Yao commented on HDFS-9831:
--

[~xiaobingo], thanks for the update. We need to add an anchor for the new 
section to the top-level table of contents. 
+1 once that is added.
{code}
...
* [Cross-Site Request Forgery 
Prevention](#Cross-Site_Request_Forgery_Prevention)
* [WebHDFS Retry Policy](#WebHDFS_Retry_Policy)
{code}

> Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 
> 
>
> Key: HDFS-9831
> URL: https://issues.apache.org/jira/browse/HDFS-9831
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch, 
> HDFS-9831.002.patch
>
>
> This ticket is opened to document the configuration keys introduced by 
> HDFS-5219/HDFS-5122 for WebHdfs Retry.  Both hdfs-default.xml and webhdfs.md 
> should be updated with the usage of these keys.
> {code}
> // WebHDFS retry policy
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = 
> "dfs.http.client.retry.policy.enabled";
>   public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = 
> false;
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = 
> "dfs.http.client.retry.policy.spec";
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = 
> "1,6,6,10"; //t1,n1,t2,n2,...
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.failover.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 
> 15;
>   public static final String  DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.retry.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = 
> "dfs.http.client.failover.sleep.base.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT 
> = 500;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = 
> "dfs.http.client.failover.sleep.max.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 
> 15000;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122

2016-02-24 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166336#comment-15166336
 ] 

Xiaoyu Yao commented on HDFS-9831:
--

Thanks [~xiaobingo] for working on this. The patch looks good to me. One 
suggestion: can you add a description of the use cases that require enabling 
the WebHDFS retry policy to hdfs-site.xml? For example:

If "true", enable the retry policy of the WebHDFS client. This can be useful when 
using WebHDFS to 
 - copy large files between clusters that could time out, or 
 - copy files between HA clusters that could fail over during the copy. 


{code}
<property>
  <name>dfs.http.client.retry.policy.enabled</name>
  <value>false</value>
  <description>
    If "true", enable the retry policy of WebHDFS client.
    If "false", retry policy is turned off.
  </description>
</property>
{code}

> Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 
> 
>
> Key: HDFS-9831
> URL: https://issues.apache.org/jira/browse/HDFS-9831
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, webhdfs
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9831.000.patch
>
>
> This ticket is opened to document the configuration keys introduced by 
> HDFS-5219/HDFS-5122 for WebHdfs Retry.  Both hdfs-default.xml and webhdfs.md 
> should be updated with the usage of these keys.
> {code}
> // WebHDFS retry policy
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = 
> "dfs.http.client.retry.policy.enabled";
>   public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = 
> false;
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = 
> "dfs.http.client.retry.policy.spec";
>   public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = 
> "1,6,6,10"; //t1,n1,t2,n2,...
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.failover.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 
> 15;
>   public static final String  DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = 
> "dfs.http.client.retry.max.attempts";
>   public static final int    DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = 
> "dfs.http.client.failover.sleep.base.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT 
> = 500;
>   public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = 
> "dfs.http.client.failover.sleep.max.millis";
>   public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 
> 15000;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9843) Document distcp options required for copying between encrypted locations

2016-02-24 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166339#comment-15166339
 ] 

Xiaoyu Yao commented on HDFS-9843:
--

Thank you, [~cnauroth] for reviewing and committing the patch!

> Document distcp options required for copying between encrypted locations
> 
>
> Key: HDFS-9843
> URL: https://issues.apache.org/jira/browse/HDFS-9843
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, documentation, encryption
>Affects Versions: 2.6.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Fix For: 2.8.0
>
> Attachments: HDFS-9843.00.patch, HDFS-9843.01.patch, 
> HDFS-9843.02.patch
>
>
> In TransparentEncryption.md#Distcp_considerations document section, we have 
> "Copying_between_encrypted_and_unencrypted_locations" which requires 
> -skipcrccheck and -update. 
> These options should be documented as required for "Copying between encrypted 
> locations" use cases as well because this involves decrypting source file and 
> encrypting destination file with a different EDEK, resulting in different 
> checksum at the destination. Distcp will fail at the CRC check if -skipcrccheck 
> is not specified.
> This ticket is opened to document the required options for "Copying between 
> encrypted locations" use cases when using distcp with HDFS encryption. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9667) StorageType: SSD precede over DISK

2016-01-20 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109184#comment-15109184
 ] 

Xiaoyu Yao commented on HDFS-9667:
--

Thanks [~aderen] for reporting this; it seems to be a duplicate of HDFS-8361.

> StorageType: SSD precede over DISK
> --
>
> Key: HDFS-9667
> URL: https://issues.apache.org/jira/browse/HDFS-9667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: ade
>Assignee: ade
> Fix For: 2.6.0
>
> Attachments: HDFS-9667.0.patch
>
>
> We enabled heterogeneous storage in our cluster, and only ~50% of the datanode & 
> regionserver hosts have SSD. We set hfiles to the ONE_SSD storage policy, but we 
> often found all replicas of a block on DISK even when the local host has SSD 
> storage. The block placement does not choose SSD first when placing replicas.
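For context, a minimal sketch of how the reporter's setup can be reproduced 
programmatically (the path below is hypothetical); with the reported behavior, 
replicas still land on DISK even though the policy asks for one SSD replica:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class OneSsdPolicySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes fs.defaultFS points at an HDFS cluster with heterogeneous storage configured.
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    // Ask for ONE_SSD on a (hypothetical) hfile directory; the reported bug is that
    // block placement does not prefer SSD even when the local node has SSD storage.
    dfs.setStoragePolicy(new Path("/hbase/data"), "ONE_SSD");
  }
}
{code}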



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9667) StorageType: SSD precede over DISK

2016-01-20 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao resolved HDFS-9667.
--
Resolution: Fixed

> StorageType: SSD precede over DISK
> --
>
> Key: HDFS-9667
> URL: https://issues.apache.org/jira/browse/HDFS-9667
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: ade
>Assignee: ade
> Fix For: 2.6.0
>
> Attachments: HDFS-9667.0.patch
>
>
> We enabled heterogeneous storage in our cluster, and only ~50% of the datanode & 
> regionserver hosts have SSD. We set hfiles to the ONE_SSD storage policy, but we 
> often found all replicas of a block on DISK even when the local host has SSD 
> storage. The block placement does not choose SSD first when placing replicas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9799) Reimplement getCurrentTrashDir to remove incompatibility

2016-02-15 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148193#comment-15148193
 ] 

Xiaoyu Yao commented on HDFS-9799:
--

Thanks [~zhz] for reporting the issue and working on the fix. 

bq. The source of the IOException is from getEZForPath. So when getEZForPath 
gets an exception – meaning that the EZ of the given path cannot be determined 
at the time of calling, we should just return the Trash dir of the user's home. 
Even if the path does belong to an EZ, this will just mean the rm will fail 
later.

Can you elaborate on when getEZForPath gets an IOException? Based on 
EncryptionZoneManager#getEZINodeForPath, getEZForPath() just returns null 
instead of throwing an IOException when a given path cannot be determined to be 
inside an EZ. This makes DFS#getTrashRoots() include just the trash dir of 
the user's home in the returned result for non-EZ paths. Should we just fix the 
annotations?
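To make the question concrete, a simplified sketch of the null-returning behavior 
described above (not the actual DistributedFileSystem code; method and variable 
names here are illustrative only):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.EncryptionZone;

public class TrashDirSketch {
  // Illustrative only: resolve the trash dir for a path, falling back to the
  // user's home trash when getEZForPath() returns null for non-EZ paths.
  static Path currentTrashDir(DistributedFileSystem dfs, Path path, String user)
      throws IOException {
    EncryptionZone ez = dfs.getEZForPath(path);
    if (ez == null) {
      return new Path(dfs.getHomeDirectory(), FileSystem.TRASH_PREFIX);
    }
    return new Path(new Path(ez.getPath(), FileSystem.TRASH_PREFIX), user);
  }
}
{code}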

> Reimplement getCurrentTrashDir to remove incompatibility
> 
>
> Key: HDFS-9799
> URL: https://issues.apache.org/jira/browse/HDFS-9799
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Blocker
> Attachments: HDFS-9799.00.patch, HDFS-9799.01.patch, 
> HDFS-9799.02.patch, HDFS-9799.03.patch, HDFS-9799.04.patch
>
>
> HDFS-8831 changed the signature of {{TrashPolicy#getCurrentTrashDir}} by 
> adding an IOException. This breaks other applications using this public API. 
> This JIRA aims to reimplement the logic to safely handle the IOException 
> within HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9799) Reimplement getCurrentTrashDir to remove incompatibility

2016-02-17 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150698#comment-15150698
 ] 

Xiaoyu Yao commented on HDFS-9799:
--

Thanks [~zhz] for the explanation. I agree with your changes at the 
getTrashRoot()/getTrashRoots() level. For the change in getTrashRoots(), can we add 
some indication at the API level that partial results are being returned, in 
addition to the log?

> Reimplement getCurrentTrashDir to remove incompatibility
> 
>
> Key: HDFS-9799
> URL: https://issues.apache.org/jira/browse/HDFS-9799
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
>Priority: Blocker
> Attachments: HDFS-9799.00.patch, HDFS-9799.01.patch, 
> HDFS-9799.02.patch, HDFS-9799.03.patch, HDFS-9799.04.patch
>
>
> HDFS-8831 changed the signature of {{TrashPolicy#getCurrentTrashDir}} by 
> adding an IOException. This breaks other applications using this public API. 
> This JIRA aims to reimplement the logic to safely handle the IOException 
> within HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9711) Integrate CSRF prevention filter in WebHDFS.

2016-02-17 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151128#comment-15151128
 ] 

Xiaoyu Yao commented on HDFS-9711:
--

Thanks [~cnauroth] for working on this. The patch looks good to me, +1. 
One nit: can we move WebHdfsFileSystem#getTrimmedStringList() with default 
string support to StringUtils so that it can be used by configuration keys similar 
to this one?

> Integrate CSRF prevention filter in WebHDFS.
> 
>
> Key: HDFS-9711
> URL: https://issues.apache.org/jira/browse/HDFS-9711
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode, webhdfs
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-9711.001.patch, HDFS-9711.002.patch, 
> HDFS-9711.003.patch, HDFS-9711.004.patch, HDFS-9711.005.patch
>
>
> HADOOP-12691 introduced a filter in Hadoop Common to help REST APIs guard 
> against cross-site request forgery attacks.  This issue tracks integration of 
> that filter in WebHDFS.
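For readers following the feature, a hedged sketch of opting in once this lands; 
the key names below are assumptions based on the attached patches and are not 
confirmed in this thread:

{code}
import org.apache.hadoop.conf.Configuration;

public class WebHdfsCsrfConfigSketch {
  public static Configuration build() {
    Configuration conf = new Configuration();
    // Assumed key names (not confirmed here): enable the CSRF prevention filter for
    // WebHDFS and require a custom header on state-changing requests.
    conf.setBoolean("dfs.webhdfs.rest-csrf.enabled", true);
    conf.set("dfs.webhdfs.rest-csrf.custom-header", "X-XSRF-HEADER");
    conf.set("dfs.webhdfs.rest-csrf.methods-to-ignore", "GET,OPTIONS,HEAD,TRACE");
    return conf;
  }
}
{code}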



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9711) Integrate CSRF prevention filter in WebHDFS.

2016-02-17 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151258#comment-15151258
 ] 

Xiaoyu Yao commented on HDFS-9711:
--

LGTM, Thanks [~cnauroth] for the clarification!

> Integrate CSRF prevention filter in WebHDFS.
> 
>
> Key: HDFS-9711
> URL: https://issues.apache.org/jira/browse/HDFS-9711
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode, webhdfs
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-9711.001.patch, HDFS-9711.002.patch, 
> HDFS-9711.003.patch, HDFS-9711.004.patch, HDFS-9711.005.patch
>
>
> HADOOP-12691 introduced a filter in Hadoop Common to help REST APIs guard 
> against cross-site request forgery attacks.  This issue tracks integration of 
> that filter in WebHDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122

2016-02-18 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-9831:


 Summary: Document webhdfs retry configuration keys introduced by 
HDFS-5219/HDFS-5122 
 Key: HDFS-9831
 URL: https://issues.apache.org/jira/browse/HDFS-9831
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation, webhdfs
Affects Versions: 2.6.0
Reporter: Xiaoyu Yao


This ticket is opened to document the configuration keys introduced by 
HDFS-5219/HDFS-5122 for WebHdfs Retry.  Both hdfs-default.xml and webhdfs.md 
should be updated with the usage of these keys.

{code}
// WebHDFS retry policy
  public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = 
"dfs.http.client.retry.policy.enabled";
  public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = 
false;
  public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = 
"dfs.http.client.retry.policy.spec";
  public static final String  DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = 
"1,6,6,10"; //t1,n1,t2,n2,...
  public static final String  DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = 
"dfs.http.client.failover.max.attempts";
  public static final int    DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 15;
  public static final String  DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = 
"dfs.http.client.retry.max.attempts";
  public static final int    DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10;
  public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = 
"dfs.http.client.failover.sleep.base.millis";
  public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT = 
500;
  public static final String  DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = 
"dfs.http.client.failover.sleep.max.millis";
  public static final int    DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 
15000;
{code}
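As a usage illustration for the keys above, a minimal client-side sketch (the 
namenode host and the values are hypothetical; only the key names come from this 
ticket):

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class WebHdfsRetrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Opt in to the WebHDFS retry policy and bound retry/failover attempts.
    conf.setBoolean("dfs.http.client.retry.policy.enabled", true);
    conf.setInt("dfs.http.client.retry.max.attempts", 10);
    conf.setInt("dfs.http.client.failover.max.attempts", 15);
    conf.setInt("dfs.http.client.failover.sleep.base.millis", 500);
    conf.setInt("dfs.http.client.failover.sleep.max.millis", 15000);
    // Hypothetical namenode HTTP endpoint.
    FileSystem fs = FileSystem.get(URI.create("webhdfs://namenode.example.com:50070"), conf);
    System.out.println("Connected to " + fs.getUri());
  }
}
{code}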



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9839) Reduce verbosity of processReport logging

2016-02-20 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155858#comment-15155858
 ] 

Xiaoyu Yao commented on HDFS-9839:
--

Patch LGTM. +1.

> Reduce verbosity of processReport logging
> -
>
> Key: HDFS-9839
> URL: https://issues.apache.org/jira/browse/HDFS-9839
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-9839.01.patch
>
>
> {{BlockManager#processReport}} logs one line for each invalidated block at 
> INFO. HDFS-7503 moved this logging outside the NameSystem write lock but we 
> still see the NameNode being slowed down when the number of block 
> invalidations is very large e.g. just after a large amount of data is deleted.
> {code}
>   for (Block b : invalidatedBlocks) {
> blockLog.info("BLOCK* processReport: {} on node {} size {} does not " 
> +
> "belong to any file", b, node, b.getNumBytes());
>   }
> {code}
> We can change this statement to DEBUG and just log the number of block 
> invalidations at INFO.
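A minimal sketch of the suggested change (this is not the attached patch, just an 
illustration of the intent): per-block details drop to DEBUG and a single INFO 
line summarizes the count.

{code}
// Sketch only; variable names follow the snippet quoted above.
if (blockLog.isDebugEnabled()) {
  for (Block b : invalidatedBlocks) {
    blockLog.debug("BLOCK* processReport: {} on node {} size {} does not "
        + "belong to any file", b, node, b.getNumBytes());
  }
}
blockLog.info("BLOCK* processReport: {} block(s) on node {} do not belong "
    + "to any file and will be invalidated", invalidatedBlocks.size(), node);
{code}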



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174487#comment-15174487
 ] 

Xiaoyu Yao commented on HDFS-9881:
--

Thanks [~andrew.wang]. Patch LGTM, +1.  

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so it ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-03-24 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10207:
-

 Summary: Support enable Hadoop IPC backoff without namenode restart
 Key: HDFS-10207
 URL: https://issues.apache.org/jira/browse/HDFS-10207
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Xiaoyu Yao
Assignee: Xiaobing Zhou


It will be useful to allow changing {{ipc.8020.backoff.enable}} without a 
namenode restart to protect the namenode from being overloaded.
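Today the flag is only read at startup; a minimal sketch of the static setting as 
it works now (8020 is the hypothetical namenode RPC port), which this ticket 
proposes to make changeable at runtime:

{code}
import org.apache.hadoop.conf.Configuration;

public class BackoffConfigSketch {
  public static void main(String[] args) {
    // Static, restart-required setting as of today; the goal of this ticket is to
    // allow flipping it without restarting the namenode.
    Configuration conf = new Configuration();
    conf.setBoolean("ipc.8020.backoff.enable", true);
    System.out.println(conf.getBoolean("ipc.8020.backoff.enable", false));
  }
}
{code}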



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode

2016-03-24 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10209:
-

 Summary: Support enable caller context in HDFS namenode audit log 
without restart namenode
 Key: HDFS-10209
 URL: https://issues.apache.org/jira/browse/HDFS-10209
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Xiaoyu Yao
Assignee: Xiaobing Zhou


RPC caller context is a useful feature for tracking down the origin of a call, 
which helps identify "bad" jobs that overload the namenode. This ticket is 
opened to allow enabling the caller context without a namenode restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart

2016-03-24 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10207:
--
Description: It will be useful to allow changing 
{{ipc.#port#.backoff.enable}} without a namenode restart to protect the namenode 
from being overloaded.  (was: It will be useful to allow changing 
{{ipc.8020.backoff.enable}} without a namenode restart to protect the namenode from 
being overloaded.)

> Support enable Hadoop IPC backoff without namenode restart
> --
>
> Key: HDFS-10207
> URL: https://issues.apache.org/jira/browse/HDFS-10207
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
>
> It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a 
> namenode restart to protect the namenode from being overloaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode

2016-03-24 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210805#comment-15210805
 ] 

Xiaoyu Yao commented on HDFS-10209:
---

The configuration key is {{hadoop.caller.context.enabled}}, which is {{false}} by 
default.
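For illustration, a hedged sketch of both sides (the context string is 
hypothetical): the flag above must be set on the namenode for the context to show 
up in the audit log, and a client/framework attaches the context to its RPCs.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.CallerContext;

public class CallerContextSketch {
  public static void main(String[] args) {
    // Server side: hadoop.caller.context.enabled must be true (false by default).
    Configuration conf = new Configuration();
    conf.setBoolean("hadoop.caller.context.enabled", true);

    // Client side: a framework such as Tez/Hive tags its RPCs; the value is hypothetical.
    CallerContext.setCurrent(new CallerContext.Builder("hive_query_id:query_12345").build());
  }
}
{code}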

> Support enable caller context in HDFS namenode audit log without restart 
> namenode
> -
>
> Key: HDFS-10209
> URL: https://issues.apache.org/jira/browse/HDFS-10209
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
>
> RPC caller context is a useful feature for tracking down the origin of a call, 
> which helps identify "bad" jobs that overload the namenode. This ticket is 
> opened to allow enabling the caller context without a namenode restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-02 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9887:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Thanks [~and1000] and [~chris.douglas] for the contribution. I've committed the 
patch to trunk, branch-2, and branch-2.8.

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.
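A hedged sketch of raising the defaults for a slow archive backend; the key names 
below are assumptions based on the attached patches (they are not spelled out in 
this thread), and the duration values are hypothetical:

{code}
import org.apache.hadoop.conf.Configuration;

public class WebHdfsTimeoutSketch {
  public static Configuration build() {
    Configuration conf = new Configuration();
    // Assumed key names, not confirmed here; values use a duration suffix.
    conf.set("dfs.webhdfs.socket.connect-timeout", "60s");
    conf.set("dfs.webhdfs.socket.read-timeout", "30m"); // allow very slow archive reads
    return conf;
  }
}
{code}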



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-02 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao moved HADOOP-12827 to HDFS-9887:
---

Target Version/s: 2.8.0  (was: 2.9.0)
 Component/s: (was: fs)
  webhdfs
  fs
 Key: HDFS-9887  (was: HADOOP-12827)
 Project: Hadoop HDFS  (was: Hadoop Common)

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183301#comment-15183301
 ] 

Xiaoyu Yao commented on HDFS-9887:
--

Thanks [~jojochuang] for reporting this. Further reading found that the 
webhdfs-specific read/connect timeout implemented by HDFS-9887 should not affect 
other callers of {{URLConnectionFactory.newSslConnConfigurator()}} such as 
{{QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
TransferFsImage()}}. I will file a separate ticket to fix it. 

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao reassigned HDFS-9914:


Assignee: Xiaoyu Yao

> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> WebHDFS-specific read/connect timeouts were added by HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9914:
-
Description: 
WebHDFS-specific read/connect timeouts were added by HDFS-9887. This ticket is opened 
to fix the following issues in the current implementation:

1. The webhdfs read/connect timeout should not affect connections for other 
callers of URLConnectionFactory.newSslConnConfigurator() such as 
QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage()

2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
connect/read timeout even if any exception is thrown during customized SSL 
configuration. 
 
3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.

  was:
WebHDFS-specific read/connect timeouts were added by HDFS-9887. This ticket is opened 
to fix the following issues in the current implementation:

1. The webhdfs read/connect timeout should not affect connections for other 
callers of URLConnectionFactory.newSslConnConfigurator() such as 
QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage()

2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
connect/read timeout even if any exception is thrown during customized SSL 
configuration. 
 


> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> WebHDFS-specific read/connect timeouts were added by HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  
> 3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9914:
-
Status: Patch Available  (was: Open)

> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9914.001.patch
>
>
> WebHDFS-specific read/connect timeouts were added by HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  
> 3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9914:
-
Attachment: HDFS-9914.001.patch

> Fix configurable WebhDFS connect/read timeout
> -
>
> Key: HDFS-9914
> URL: https://issues.apache.org/jira/browse/HDFS-9914
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
> Attachments: HDFS-9914.001.patch
>
>
> WebHDFS-specific read/connect timeouts were added by HDFS-9887. This ticket is 
> opened to fix the following issues in the current implementation:
> 1. The webhdfs read/connect timeout should not affect connections for other 
> callers of URLConnectionFactory.newSslConnConfigurator() such as 
> QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and 
> TransferFsImage()
> 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
> connect/read timeout even if any exception is thrown during customized SSL 
> configuration. 
>  
> 3.  OAuth2 webhdfs connection should honor the webhdfs connect/read timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9914) Fix configurable WebhDFS connect/read timeout

2016-03-07 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-9914:


 Summary: Fix configurable WebhDFS connect/read timeout
 Key: HDFS-9914
 URL: https://issues.apache.org/jira/browse/HDFS-9914
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiaoyu Yao


WebHDFS-specific read/connect timeouts were added by HDFS-9887. This ticket is opened 
to fix the following issues in the current implementation:

1. The webhdfs read/connect timeout should not affect connections for other 
callers of URLConnectionFactory.newSslConnConfigurator() such as 
QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage()

2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs 
connect/read timeout even if any exception is thrown during customized SSL 
configuration. 
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183322#comment-15183322
 ] 

Xiaoyu Yao commented on HDFS-9887:
--

Filed HDFS-9914 for the fix. 

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable

2016-03-07 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183241#comment-15183241
 ] 

Xiaoyu Yao commented on HDFS-9887:
--

Agree, this is a bug. WebHDFS with an SSL configuration exception will not honor 
the configurable WebHDFS connect/read timeout; it will always use 
{{DEFAULT_TIMEOUT_CONN_CONFIGURATOR}}, the default value (1 min). 

> WebHdfs socket timeouts should be configurable
> --
>
> Key: HDFS-9887
> URL: https://issues.apache.org/jira/browse/HDFS-9887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, webhdfs
> Environment: all
>Reporter: Austin Donnelly
>Assignee: Austin Donnelly
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, 
> HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, 
> HADOOP-12827.004.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> WebHdfs client connections use sockets with fixed timeouts of 60 seconds to 
> connect, and 60 seconds for reads.
> This is a problem because I am trying to use WebHdfs to access an archive 
> storage system which can take minutes to hours to return the requested data 
> over WebHdfs.
> The fix is to add new configuration file options to allow these 60s defaults 
> to be customised in hdfs-site.xml.
> If the new configuration options are not present, the behavior is unchanged 
> from before.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9723) Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext

2016-03-07 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-9723:
-
Description: 
HDFS namenode handles RPC requests from DFS clients and internal processing 
from datanodes. It has been a recurring pain that some bad jobs overwhelm the 
namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 
is one of the existing efforts added since Hadoop 2.4 to address this 
issue. 

In current FCQ implementation, incoming RPC calls are scheduled based on the 
number of recent RPC calls of different users with a time-decayed scheduler. 
This works well when there is a clear mapping between users and their RPC calls 
from different jobs. However, this may not work effectively when it is hard to 
track calls to a specific caller in a chain of operations from the workflow 
(e.g. Oozie -> Hive -> Yarn). It is not feasible for operators/administrators to 
throttle all the hive jobs because of one “bad” query.

This JIRA proposes to leverage the RPC caller context information (such as 
callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative 
to the existing UGI-based (or user-name-based, when a delegation token is not 
available) IdentityProvider, to improve the effectiveness of the Hadoop RPC Fair 
Call Queue (HADOOP-9640) for better namenode throttling in multi-tenant cluster 
deployments.  

  was:
HDFS namenode handles RPC requests from DFS clients and internal processing 
from datanodes. It has been a recurring pain that some bad jobs overwhelm the 
namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 
is one of the existing efforts added since Hadoop 2.4 to address this 
issue. 

In current FCQ implementation, incoming RPC calls are scheduled based on the 
number of recent RPC calls (1000) of different users with a time-decayed 
scheduler. This works well when there is a clear mapping between users and 
their RPC calls from different jobs. However, this may not work effectively 
when it is hard to track calls to a specific caller in a chain of operations 
from the workflow (e.g.Oozie -> Hive -> Yarn). It is not feasible for 
operators/administrators to throttle all the hive jobs because of one “bad” 
query.

This JIRA proposes to leverage the RPC caller context information (such as 
callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative 
to the existing UGI-based (or user-name-based, when a delegation token is not 
available) IdentityProvider, to improve the effectiveness of the Hadoop RPC Fair 
Call Queue (HADOOP-9640) for better namenode throttling in multi-tenant cluster 
deployments.  


> Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext
> ---
>
> Key: HDFS-9723
> URL: https://issues.apache.org/jira/browse/HDFS-9723
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>
> HDFS namenode handles RPC requests from DFS clients and internal processing 
> from datanodes. It has been a recurring pain that some bad jobs overwhelm the 
> namenode and bring the whole cluster down. FCQ (Fair Call Queue) by 
> HADOOP-9640 is the one of the existing efforts added since Hadoop 2.4 to 
> address this issue. 
> In current FCQ implementation, incoming RPC calls are scheduled based on the 
> number of recent RPC calls of different users with a time-decayed scheduler. 
> This works well when there is a clear mapping between users and their RPC 
> calls from different jobs. However, this may not work effectively when it is 
> hard to track calls to a specific caller in a chain of operations from the 
> workflow (e.g. Oozie -> Hive -> Yarn). It is not feasible for 
> operators/administrators to throttle all the hive jobs because of one “bad” 
> query.
> This JIRA proposes to leverage the RPC caller context information (such as 
> callerType: caller Id from TEZ-2851) available with HDFS-9184 as an 
> alternative to the existing UGI-based (or user-name-based, when a delegation 
> token is not available) IdentityProvider, to improve the effectiveness of the 
> Hadoop RPC Fair Call Queue (HADOOP-9640) for better namenode throttling in 
> multi-tenant cluster deployments.  
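A hedged sketch of how the proposal could be wired up: FairCallQueue is enabled on 
the namenode RPC port and a custom IdentityProvider (a hypothetical class, not 
part of Hadoop today) would key the scheduler on the caller context instead of 
only the user.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.IdentityProvider;
import org.apache.hadoop.ipc.Schedulable;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical provider sketched for this proposal; not part of Hadoop today.
public class CallerContextIdentityProvider implements IdentityProvider {
  @Override
  public String makeIdentity(Schedulable call) {
    // Today only the user is visible here; exposing the RPC caller context to the
    // scheduler is the part this JIRA proposes, so this sketch falls back to the user.
    UserGroupInformation ugi = call.getUserGroupInformation();
    return ugi == null ? null : ugi.getShortUserName();
  }

  // Example wiring on the namenode RPC port (8020 is hypothetical).
  public static Configuration exampleConf() {
    Configuration conf = new Configuration();
    conf.set("ipc.8020.callqueue.impl", "org.apache.hadoop.ipc.FairCallQueue");
    conf.set("ipc.8020.identity-provider.impl", CallerContextIdentityProvider.class.getName());
    return conf;
  }
}
{code}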



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10253) Fix TestRefreshCallQueue failure.

2016-04-02 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10253:
--
Attachment: HDFS-10253.00.patch

Thanks [~brahmareddy] for catching this. Attached a simple fix for reference in 
case you have not started working on it.

> Fix TestRefreshCallQueue failure.
> -
>
> Key: HDFS-10253
> URL: https://issues.apache.org/jira/browse/HDFS-10253
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10253.00.patch
>
>
>  *Jenkins link* 
> https://builds.apache.org/job/PreCommit-HDFS-Build/15041/testReport/
>  *Trace* 
> {noformat}
> java.lang.RuntimeException: 
> org.apache.hadoop.TestRefreshCallQueue$MockCallQueue could not be constructed.
>   at 
> org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:164)
>   at 
> org.apache.hadoop.ipc.CallQueueManager.(CallQueueManager.java:70)
>   at org.apache.hadoop.ipc.Server.(Server.java:2579)
>   at org.apache.hadoop.ipc.RPC$Server.(RPC.java:958)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:535)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:421)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:759)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:701)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:900)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:879)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1596)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
>   at 
> org.apache.hadoop.TestRefreshCallQueue.setUp(TestRefreshCallQueue.java:71)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10253) Fix TestRefreshCallQueue failure.

2016-04-02 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-10253:
--
Status: Patch Available  (was: Open)

> Fix TestRefreshCallQueue failure.
> -
>
> Key: HDFS-10253
> URL: https://issues.apache.org/jira/browse/HDFS-10253
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-10253.00.patch
>
>
>  *Jenkins link* 
> https://builds.apache.org/job/PreCommit-HDFS-Build/15041/testReport/
>  *Trace* 
> {noformat}
> java.lang.RuntimeException: 
> org.apache.hadoop.TestRefreshCallQueue$MockCallQueue could not be constructed.
>   at 
> org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:164)
>   at 
> org.apache.hadoop.ipc.CallQueueManager.<init>(CallQueueManager.java:70)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2579)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:958)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:421)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:759)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:701)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:900)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:879)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1596)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441)
>   at 
> org.apache.hadoop.TestRefreshCallQueue.setUp(TestRefreshCallQueue.java:71)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode

2016-03-29 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217025#comment-15217025
 ] 

Xiaoyu Yao commented on HDFS-10209:
---

Thanks [~xiaobingo] for working on this. Patch looks good to me. +1 pending 
Jenkins.
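
One usage note, assuming {{hadoop.caller.context.enabled}} is the key this 
patch makes reconfigurable (host and port below are placeholders): the change 
should follow the usual dfsadmin reconfiguration flow.

{noformat}
# 1. Flip hadoop.caller.context.enabled in the NameNode's configuration file.
# 2. Ask the running NameNode to reload its reconfigurable properties:
hdfs dfsadmin -reconfig namenode nn1.example.com:8020 start
# 3. Poll until the reconfiguration task completes:
hdfs dfsadmin -reconfig namenode nn1.example.com:8020 status
{noformat}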

> Support enable caller context in HDFS namenode audit log without restart 
> namenode
> -
>
> Key: HDFS-10209
> URL: https://issues.apache.org/jira/browse/HDFS-10209
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10209-HDFS-9000.000.patch
>
>
> RPC caller context is a useful feature for tracking down the origin of a 
> call, which helps identify "bad" jobs that overload the namenode. This 
> ticket is opened to allow enabling the caller context without a namenode 
> restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10286) Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties

2016-04-13 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240631#comment-15240631
 ] 

Xiaoyu Yao commented on HDFS-10286:
---

Patch looks good to me. +1 pending Jenkins.

> Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties
> 
>
> Key: HDFS-10286
> URL: https://issues.apache.org/jira/browse/HDFS-10286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10286.000.patch
>
>
> HDFS-10209 introduced a new reconfigurable property, which requires an 
> update to the validation in 
> TestDFSAdmin#testNameNodeGetReconfigurableProperties. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode

2016-04-13 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240592#comment-15240592
 ] 

Xiaoyu Yao commented on HDFS-10209:
---

I opened HDFS-10286 and attached your patch to it. 

> Support enable caller context in HDFS namenode audit log without restart 
> namenode
> -
>
> Key: HDFS-10209
> URL: https://issues.apache.org/jira/browse/HDFS-10209
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-10209-HDFS-9000.000.patch, 
> HDFS-10209-HDFS-9000.001.patch, HDFS-10209-HDFS-9000.UT-fix.patch
>
>
> RPC caller context is a useful feature for tracking down the origin of a 
> call, which helps identify "bad" jobs that overload the namenode. This 
> ticket is opened to allow enabling the caller context without a namenode 
> restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode

2016-04-13 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240588#comment-15240588
 ] 

Xiaoyu Yao commented on HDFS-10209:
---

[~xiaobingo], please open a separate ticket for the unit test fix and link it 
to HDFS-10209. Thanks

> Support enable caller context in HDFS namenode audit log without restart 
> namenode
> -
>
> Key: HDFS-10209
> URL: https://issues.apache.org/jira/browse/HDFS-10209
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Xiaobing Zhou
> Fix For: 2.9.0
>
> Attachments: HDFS-10209-HDFS-9000.000.patch, 
> HDFS-10209-HDFS-9000.001.patch, HDFS-10209-HDFS-9000.UT-fix.patch
>
>
> RPC caller context is a useful feature for tracking down the origin of a 
> call, which helps identify "bad" jobs that overload the namenode. This 
> ticket is opened to allow enabling the caller context without a namenode 
> restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10286) Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties

2016-04-13 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-10286:
-

 Summary: Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties
 Key: HDFS-10286
 URL: https://issues.apache.org/jira/browse/HDFS-10286
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Xiaoyu Yao
Assignee: Xiaobing Zhou


HDFS-10209 introduced a new reconfigurable property, which requires an update 
to the validation in TestDFSAdmin#testNameNodeGetReconfigurableProperties. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit

2016-04-25 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256633#comment-15256633
 ] 

Xiaoyu Yao commented on HDFS-10324:
---

[~jojochuang], I mentioned #2 because Trash is a client-side feature that 
previously did not require file system operations like {{hdfs dfs -mkdir 
/ez/tmp; hdfs dfs -chmod 1777 /ez/tmp}} to work. Since this is specific to 
encryption zones, it might be easier to have a single cryptoadmin command 
handle it.
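
For the record, the manual pre-creation being discussed amounts to the 
following, run by an administrator ({{/ez}} is a placeholder for the zone 
root):

{noformat}
hdfs dfs -mkdir /ez/.Trash
hdfs dfs -chmod 1777 /ez/.Trash
{noformat}

A single crypto admin subcommand would essentially wrap these two steps so 
that every zone gets a sticky-bit .Trash directory at creation time.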

> Trash directory in an encryption zone should be pre-created with sticky bit
> ---
>
> Key: HDFS-10324
> URL: https://issues.apache.org/jira/browse/HDFS-10324
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.8.0
> Environment: CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch
>
>
> We encountered a bug in HDFS-8831:
> After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash 
> subdirectory within the encryption zone.
> However, if this .Trash subdirectory is not created beforehand, it will be 
> created and owned by the first user who deletes a file, with permission 
> drwx--. This creates a serious problem because any other non-privileged user 
> will not be able to delete files within the encryption zone, as they do not 
> have permission to move directories into the trash directory.
> We should fix this bug by pre-creating the .Trash directory with the sticky 
> bit set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

