[jira] [Commented] (HDFS-9501) TestBalancer#testBalancerWithPinnedBlocks fails in branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037997#comment-15037997 ]

Xiaoyu Yao commented on HDFS-9501:
----------------------------------

Thanks [~brahmareddy] for working on this. Patch LGTM and I've tested it locally with branch-2.7. +1

> TestBalancer#testBalancerWithPinnedBlocks fails in branch-2.7
> -------------------------------------------------------------
>
> Key: HDFS-9501
> URL: https://issues.apache.org/jira/browse/HDFS-9501
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Reporter: Brahma Reddy Battula
> Assignee: Brahma Reddy Battula
> Attachments: HDFS-9501-branch-2.7.patch
>
> As [~xyao] pointed out in HDFS-9083, this test is failing after HDFS-9083.
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
> Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.888 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithPinnedBlocks(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  Time elapsed: 12.748 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<-3> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.junit.Assert.assertEquals(Assert.java:542)
> 	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithPinnedBlocks(TestBalancer.java:362)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone
[ https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-8831:
-----------------------------
    Attachment: HDFS-8831.05.patch

Thanks [~arpitagarwal] for the review. Patch v05 addresses the latest review comments.

> Trash Support for deletion in HDFS encryption zone
> --------------------------------------------------
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: encryption
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, HDFS-8831.01.patch, HDFS-8831.02.patch, HDFS-8831.03.patch, HDFS-8831.04.patch, HDFS-8831.05.patch
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is deleted. If you delete files within the zone with the trash feature enabled, you will get an error similar to the following:
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by adding the .Trash folder of the file being deleted appropriately to the same encryption zone.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone
[ https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-8831:
-----------------------------
    Release Note: Trash is now supported for deletion of files within an encryption zone after HDFS-8831. Deleted encrypted files remain encrypted and are moved to the .Trash subdirectory under the root of the encryption zone, prefixed by $USER/current, with checkpoint and expunge working the same way as the existing Trash.

> Trash Support for deletion in HDFS encryption zone
> --------------------------------------------------
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: encryption
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Fix For: 2.8.0
>
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, HDFS-8831.01.patch, HDFS-8831.02.patch, HDFS-8831.03.patch, HDFS-8831.04.patch, HDFS-8831.05.patch
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is deleted. If you delete files within the zone with the trash feature enabled, you will get an error similar to the following:
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> This JIRA is proposed to support trash for deletion of files within an encryption zone.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
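To make the path layout in the release note concrete, here is a hedged illustration; the zone path /zone1, the file name, and the user alice are hypothetical examples, not taken from the JIRA:

```
# Before deletion: an encrypted file inside the zone
/zone1/data/report.csv

# After "hdfs dfs -rm /zone1/data/report.csv" with trash enabled, the file
# stays encrypted and stays inside the same zone, under its own trash root:
/zone1/.Trash/alice/Current/zone1/data/report.csv
```

Because the trash root lives under the zone itself, the rename never crosses an encryption zone boundary, which is what the old error message complained about.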
[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone
[ https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoyu Yao updated HDFS-8831:
-----------------------------
    Attachment: HDFS-8831.03.patch

Updated patch v03 based on [~arpit99]'s feedback. Please review, thanks!

bq. DistributedFileSystem.java:2326: We can skip the call to dfs.getEZForPath if isHDFSEncryptionEnabled is false to avoid extra RPC call when TDE is not enabled.

Good point. Fixed.

bq. FileSystem.java:2701: Can we define .Trash as a constant somewhere?

Added FileSystem#TRASH_PREFIX for ".Trash".

bq. Trash.java:98: Avoid extra RPC for log statement. Can we cache the currentTrashDir some time earlier?

Every path to be deleted may have a different currentTrashDir. Moved the INFO log to TrashPolicyDefault.java to avoid the extra RPC for logging.

bq. TrashPolicy.java:48: I don't think we should mark it as deprecated. While the TrashPolicyDefault no longer uses the home parameter other implementations may be passing a different value here in theory. TrashPolicy.java:57: Also we should have a default implementation of this routine else it will be a backward incompatible change (will break existing implementations of this public interface). TrashPolicy.java:83: Need default implementation. It can just throw UnsupportedOperationException which should be handled by the caller. TrashPolicy.java:92: Need default implementation. It can just throw UnsupportedOperationException which should be handled by the caller.

Agree and fixed.

bq. TrashPolicy.java:108: We should leave the old method in place to keep the public interface backwards compatible. Perhaps to be conservative we should respect the 'home' parameter if one is passed in instead of using Filesystem#getTrashRoot?

Agree and fixed.
> Trash Support for deletion in HDFS encryption zone
> --------------------------------------------------
>
> Key: HDFS-8831
> URL: https://issues.apache.org/jira/browse/HDFS-8831
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: encryption
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, HDFS-8831.01.patch, HDFS-8831.02.patch, HDFS-8831.03.patch
>
> Currently, "Soft Delete" is only supported if the whole encryption zone is deleted. If you delete files within the zone with the trash feature enabled, you will get an error similar to the following:
> {code}
> rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: /z1_1/startnn.sh can't be moved from an encryption zone.
> {code}
> With HDFS-8830, we can support "Soft Delete" by adding the .Trash folder of the file being deleted appropriately to the same encryption zone.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-9528) Cleanup namenode audit/log/exception messages
[ https://issues.apache.org/jira/browse/HDFS-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051723#comment-15051723 ]

Xiaoyu Yao commented on HDFS-9528:
----------------------------------

+1 for h9528_20151210.patch

> Cleanup namenode audit/log/exception messages
> ---------------------------------------------
>
> Key: HDFS-9528
> URL: https://issues.apache.org/jira/browse/HDFS-9528
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Priority: Minor
> Attachments: h9528_20151208.patch, h9528_20151210.patch
>
> - Cleanup unnecessary long methods for constructing message strings.
> - Avoid calling toString() methods.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-8326) Documentation about when checkpoints are run is out of date
[ https://issues.apache.org/jira/browse/HDFS-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052978#comment-15052978 ]

Xiaoyu Yao commented on HDFS-8326:
----------------------------------

Good catch, [~iwasakims]. I will cherry-pick the fix to branch-2.

> Documentation about when checkpoints are run is out of date
> -----------------------------------------------------------
>
> Key: HDFS-8326
> URL: https://issues.apache.org/jira/browse/HDFS-8326
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation
> Affects Versions: 2.3.0
> Reporter: Misty Stanley-Jones
> Assignee: Misty Stanley-Jones
> Fix For: 2.8.0
>
> Attachments: HDFS-8326.001.patch, HDFS-8326.002.patch, HDFS-8326.003.patch, HDFS-8326.004.patch, HDFS-8326.patch
>
> Apparently checkpointing by interval or transaction size are both supported in at least HDFS 2.3, but the documentation does not reflect this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-8785) TestDistributedFileSystem is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056249#comment-15056249 ]

Xiaoyu Yao commented on HDFS-8785:
----------------------------------

[~yzhangal], Thanks for committing this to branch-2/branch-2.8!

> TestDistributedFileSystem is failing in trunk
> ---------------------------------------------
>
> Key: HDFS-8785
> URL: https://issues.apache.org/jira/browse/HDFS-8785
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0, 2.8.0
> Reporter: Arpit Agarwal
> Assignee: Xiaoyu Yao
> Fix For: 2.8.0
>
> Attachments: HDFS-8785.00.patch, HDFS-8785.01.patch, HDFS-8785.02.patch
>
> A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk.
> e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-9530) huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049010#comment-15049010 ] Xiaoyu Yao commented on HDFS-9530: -- This looks like the symptom of HDFS-8072, where RBW reserved space is not released when the Datanode BlockReceiver encounters an IOException. The space won't be released until DN restart. The fix should be included in hadoop 2.6.2 and 2.7.1. Can you post the "hadoop version" command output? > huge Non-DFS Used in hadoop 2.6.2 & 2.7.1 > - > > Key: HDFS-9530 > URL: https://issues.apache.org/jira/browse/HDFS-9530 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Fei Hui > > I ran a Hive job, and the errors are as follows > === > Diagnostic Messages for this Task: > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"k":"1","v":1} > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row {"k":"1","v":1} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163) > ... 
8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.17_3 > could only be replicated to 0 nodes instead of minReplication (=1). There > are 3 datanode(s) running and no node(s) are excluded in this operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3245) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:663) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:787) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) > at > org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508) > ... 9 more > Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.17_3 > could only be replicated to 0 nodes instead of minReplication (=1). There > are 3 datanode(s) running and no node(s) are excluded in this operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562) > at >
[jira] [Commented] (HDFS-9625) set replication for empty file failed when set storage policy
[ https://issues.apache.org/jira/browse/HDFS-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087946#comment-15087946 ]

Xiaoyu Yao commented on HDFS-9625:
----------------------------------

[~Deng FEI], thanks for reporting the issue and attaching the fix. Can you rebase the patch against the latest trunk, as it no longer applies?

> set replication for empty file failed when set storage policy
> -------------------------------------------------------------
>
> Key: HDFS-9625
> URL: https://issues.apache.org/jira/browse/HDFS-9625
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.1
> Reporter: DENG FEI
> Assignee: DENG FEI
> Attachments: patch_HDFS-9625.20160107
>
> On setReplication, FSDirectory#updateCount needs to recalculate the related storage type quotas, but it checks that the file's consumed diskspace quota is positive.
> In practice, replication may be set right after a file is created, as in JobSplitWriter#createSplitFiles.
> It can also be reproduced from the command shell:
> 1. hdfs storagepolicies -setStoragePolicy -path /tmp -policy HOT
> 2. hdfs dfs -touchz /tmp/test
> 3. hdfs dfs -setrep 5 /tmp/test

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.
[ https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066913#comment-15066913 ]

Xiaoyu Yao commented on HDFS-9584:
----------------------------------

Thanks [~surendrasingh] for working on this. Patch LGTM. +1 pending Jenkins.

> NPE in distcp when ssl configuration file does not exist in class path.
> ------------------------------------------------------------------------
>
> Key: HDFS-9584
> URL: https://issues.apache.org/jira/browse/HDFS-9584
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: distcp
> Affects Versions: 2.7.1
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Attachments: HDFS-9584.patch
>
> {noformat}
> ./hadoop distcp -mapredSslConf ssl-distcp.xml hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history
> {noformat}
> If the {{ssl-distcp.xml}} file does not exist in the class path, distcp will throw a NullPointerException.
> {code}
> java.lang.NullPointerException
> 	at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266)
> 	at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250)
> 	at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175)
> 	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
> 	at org.apache.hadoop.tools.DistCp.run(DistCp.java:127)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.tools.DistCp.main(DistCp.java:431)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-8855) Webhdfs client leaks active NameNode connections
[ https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8855: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks [~xiaobingo] and [~cnauroth] for the contribution and all for the reviews. I've committed the patch to trunk and branch-2. > Webhdfs client leaks active NameNode connections > > > Key: HDFS-8855 > URL: https://issues.apache.org/jira/browse/HDFS-8855 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Bob Hansen >Assignee: Xiaobing Zhou > Fix For: 2.8.0 > > Attachments: HDFS-8855.005.patch, HDFS-8855.006.patch, > HDFS-8855.007.patch, HDFS-8855.008.patch, HDFS-8855.009.patch, > HDFS-8855.1.patch, HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, > HDFS_8855.prototype.patch > > > The attached script simulates a process opening ~50 files via webhdfs and > performing random reads. Note that there are at most 50 concurrent reads, > and all webhdfs sessions are kept open. Each read is ~64k at a random > position. > The script periodically (once per second) shells into the NameNode and > produces a summary of the socket states. For my test cluster with 5 nodes, > it took ~30 seconds for the NameNode to have ~25000 active connections and > fails. > It appears that each request to the webhdfs client is opening a new > connection to the NameNode and keeping it open after the request is complete. > If the process continues to run, eventually (~30-60 seconds), all of the > open connections are closed and the NameNode recovers. > This smells like SoftReference reaping. Are we using SoftReferences in the > webhdfs client to cache NameNode connections but never re-using them? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8512) storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
[ https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8512: - Attachment: HDFS-8512.01.patch Thanks [~szetszwo] for the review. Rebase the patch to trunk. > storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS > -- > > Key: HDFS-8512 > URL: https://issues.apache.org/jira/browse/HDFS-8512 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Sumana Sathish >Assignee: Xiaoyu Yao > Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch > > > Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS > {code} > $ curl -i > "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GETFILESTATUS; > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Wed, 27 May 2015 18:04:13 GMT > Date: Wed, 27 May 2015 18:04:13 GMT > Pragma: no-cache > Expires: Wed, 27 May 2015 18:04:13 GMT > Date: Wed, 27 May 2015 18:04:13 GMT > Pragma: no-cache > Content-Type: application/json > Set-Cookie: > hadoop.auth="u=xyao=xyao=simple=1432785853423=W4O5kKiYHmzzey4h7I9J9eL9EMY="; > Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly > Transfer-Encoding: chunked > Server: Jetty(6.1.26) > > {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}} > $ curl -i > "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GET_BLOCK_LOCATIONS=0=150318178; > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Wed, 27 May 2015 18:04:55 GMT > Date: Wed, 27 May 2015 18:04:55 GMT > Pragma: no-cache > Expires: Wed, 27 May 2015 18:04:55 GMT > Date: Wed, 27 May 2015 18:04:55 GMT > Pragma: no-cache > Content-Type: application/json > Set-Cookie: > hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA="; > Path=/; Expires=Thu, 
28-May-2015 04:04:55 GMT; HttpOnly > Transfer-Encoding: chunked > Server: Jetty(6.1.26) > > {"LocatedBlocks":{"fileLength":150318178,"isLastBlockComplete":true,"isUnderConstruction":false,"lastLocatedBlock":{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728},"locatedBlocks":[{"block":{"blockId":1073741846,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1022,"numBytes":134217728},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":0},{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoP
ort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728}]}} > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9210) Fix some misuse of %n in VolumeScanner#printStats
[ https://issues.apache.org/jira/browse/HDFS-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027437#comment-15027437 ]

Xiaoyu Yao commented on HDFS-9210:
----------------------------------

[~andrew.wang], [~templedf], can you help review patch v02, which fixes the issue [~templedf] pointed out?

> Fix some misuse of %n in VolumeScanner#printStats
> -------------------------------------------------
>
> Key: HDFS-9210
> URL: https://issues.apache.org/jira/browse/HDFS-9210
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.1
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9210.00.patch, HDFS-9210.01.patch
>
> Found 2 extra "%n" in the VolumeScanner report and lines not well formatted below. This JIRA is opened to fix the format issue.
> {code}
> Block scanner information for volume DS-93fb2503-de00-4f98-a8bc-c2bc13b8f0f7 with base path /hadoop/hdfs/data%nBytes verified in last hour : 136882014
> Blocks scanned in current period : 5
> Blocks scanned since restart : 5
> Block pool scans since restart: 0
> Block scan errors since restart : 0
> Hours until next block pool scan : 476.000
> Last block scanned: BP-1792969149-192.168.70.101-1444150984999:blk_1073742088_1274
> More blocks to scan in period : false
> %n
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
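As background on the bug class discussed above: Java's %n conversion is expanded into the platform line separator only when the string goes through a formatter such as String.format; a string that bypasses the formatter keeps "%n" as two literal characters, which is exactly what shows up in the scanner report. A minimal sketch of the difference (not the VolumeScanner code itself):

```java
public class PercentNDemo {
    public static void main(String[] args) {
        // %n is expanded by the Formatter into the platform line separator.
        String formatted = String.format("Blocks scanned in current period : %d%n", 5);

        // Plain string concatenation leaves "%n" verbatim -- the kind of
        // mistake the JIRA fixes in VolumeScanner#printStats.
        String raw = "Blocks scanned in current period : " + 5 + "%n";

        System.out.print(formatted); // ends with a real line break
        System.out.println(raw);     // the characters '%' and 'n' are printed literally
    }
}
```

Note that "\n" would also differ from "%n" on Windows, where the line separator is "\r\n"; %n is only meaningful inside a format string.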
[jira] [Commented] (HDFS-8485) Transparent Encryption Fails to work with Yarn/MapReduce
[ https://issues.apache.org/jira/browse/HDFS-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027466#comment-15027466 ] Xiaoyu Yao commented on HDFS-8485: -- [~PrasadAlle], Do you have *dfs.encryption.key.provider.uri* configured in your hdfs-site.xml? It should be something like: {code} On hdfs-site.xml: Key: dfs.encryption.key.provider.uri Value: kms://http@myhost.mydomain:16000/kms {code} > Transparent Encryption Fails to work with Yarn/MapReduce > > > Key: HDFS-8485 > URL: https://issues.apache.org/jira/browse/HDFS-8485 > Project: Hadoop HDFS > Issue Type: Bug > Environment: RHEL-7, Kerberos 5 >Reporter: Ambud Sharma >Priority: Critical > Attachments: core-site.xml, hdfs-site.xml, kms-site.xml, > mapred-site.xml, yarn-site.xml > > > Running a simple MapReduce job that writes to a path configured as an > encryption zone throws exception > 11:26:26,343 INFO [org.apache.hadoop.mapreduce.Job] (pool-14-thread-1) Task > Id : attempt_1432740034176_0001_m_00_2, Status : FAILED > 11:26:26,346 ERROR [stderr] (pool-14-thread-1) Error: java.io.IOException: > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:424) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:710) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1358) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > 
org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1457) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1442) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:400) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > 11:26:26,346 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > com.s3.ingestion.S3ImportMR$S3ImportMapper.map(S3ImportMR.java:112) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > com.s3.ingestion.S3ImportMR$S3ImportMapper.map(S3ImportMR.java:43) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > 
java.security.AccessController.doPrivileged(Native Method) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > javax.security.auth.Subject.doAs(Subject.java:422) > 11:26:26,347 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > 11:26:26,348 ERROR [stderr] (pool-14-thread-1)at > org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > 11:26:26,348 ERROR [stderr] (pool-14-thread-1) Caused by: > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt) >
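Written out as an hdfs-site.xml fragment, the setting suggested in the comment above would look roughly like this; the KMS host and port are the placeholder values from the comment, not a verified deployment:

```xml
<property>
  <name>dfs.encryption.key.provider.uri</name>
  <!-- Points the HDFS client and NameNode at the KMS; host/port are examples. -->
  <value>kms://http@myhost.mydomain:16000/kms</value>
</property>
```

Without this setting, DFSClient cannot reach the KMS to decrypt the data encryption key, which is one common cause of failures like the one reported here.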
[jira] [Updated] (HDFS-8512) storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS
[ https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8512: - Issue Type: Improvement (was: Bug) > storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS > -- > > Key: HDFS-8512 > URL: https://issues.apache.org/jira/browse/HDFS-8512 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Reporter: Sumana Sathish >Assignee: Xiaoyu Yao > Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch > > > Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS > {code} > $ curl -i > "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GETFILESTATUS; > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Wed, 27 May 2015 18:04:13 GMT > Date: Wed, 27 May 2015 18:04:13 GMT > Pragma: no-cache > Expires: Wed, 27 May 2015 18:04:13 GMT > Date: Wed, 27 May 2015 18:04:13 GMT > Pragma: no-cache > Content-Type: application/json > Set-Cookie: > hadoop.auth="u=xyao=xyao=simple=1432785853423=W4O5kKiYHmzzey4h7I9J9eL9EMY="; > Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly > Transfer-Encoding: chunked > Server: Jetty(6.1.26) > > {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}} > $ curl -i > "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao=GET_BLOCK_LOCATIONS=0=150318178; > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Wed, 27 May 2015 18:04:55 GMT > Date: Wed, 27 May 2015 18:04:55 GMT > Pragma: no-cache > Expires: Wed, 27 May 2015 18:04:55 GMT > Date: Wed, 27 May 2015 18:04:55 GMT > Pragma: no-cache > Content-Type: application/json > Set-Cookie: > hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA="; > Path=/; Expires=Thu, 28-May-2015 04:04:55 GMT; HttpOnly > Transfer-Encoding: 
[jira] [Updated] (HDFS-8512) WebHDFS : GETFILESTATUS should include storage type in LocatedBlock
[ https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8512: - Summary: WebHDFS : GETFILESTATUS should include storage type in LocatedBlock (was: storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS) > WebHDFS : GETFILESTATUS should include storage type in LocatedBlock > --- > > Key: HDFS-8512 > URL: https://issues.apache.org/jira/browse/HDFS-8512 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Reporter: Sumana Sathish >Assignee: Xiaoyu Yao > Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch > > > Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS > {code} > $ curl -i > "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao&op=GETFILESTATUS" > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Wed, 27 May 2015 18:04:13 GMT > Date: Wed, 27 May 2015 18:04:13 GMT > Pragma: no-cache > Expires: Wed, 27 May 2015 18:04:13 GMT > Date: Wed, 27 May 2015 18:04:13 GMT > Pragma: no-cache > Content-Type: application/json > Set-Cookie: > hadoop.auth="u=xyao&p=xyao&t=simple&e=1432785853423&s=W4O5kKiYHmzzey4h7I9J9eL9EMY="; > Path=/; Expires=Thu, 28-May-2015 04:04:13 GMT; HttpOnly > Transfer-Encoding: chunked > Server: Jetty(6.1.26) > > {"FileStatus":{"accessTime":1432683737985,"blockSize":134217728,"childrenNum":0,"fileId":16405,"group":"hadoop","length":150318178,"modificationTime":1432683738427,"owner":"xyao","pathSuffix":"","permission":"644","replication":1,"storagePolicy":7,"type":"FILE"}} > $ curl -i > "http://127.0.0.1:50070/webhdfs/v1/HOT/FILE1?user.name=xyao&op=GET_BLOCK_LOCATIONS&offset=0&length=150318178" > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Wed, 27 May 2015 18:04:55 GMT > Date: Wed, 27 May 2015 18:04:55 GMT > Pragma: no-cache > Expires: Wed, 27 May 2015 18:04:55 GMT > Date: Wed, 27 May 2015 18:04:55 GMT > Pragma: no-cache > Content-Type: application/json > Set-Cookie: >
hadoop.auth="u=xyao=xyao=simple=1432785895031=TUiaNsCrARAPKz6xrddoQ1eHOXA="; > Path=/; Expires=Thu, 28-May-2015 04:04:55 GMT; HttpOnly > Transfer-Encoding: chunked > Server: Jetty(6.1.26) > > {"LocatedBlocks":{"fileLength":150318178,"isLastBlockComplete":true,"isUnderConstruction":false,"lastLocatedBlock":{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728},"locatedBlocks":[{"block":{"blockId":1073741846,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1022,"numBytes":134217728},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCapacity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":0},{"block":{"blockId":1073741847,"blockPoolId":"BP-474445704-192.168.70.1-1432674221011","generationStamp":1023,"numBytes":16100450},"blockToken":{"urlString":"AA"},"cachedLocations":[],"isCorrupt":false,"locations":[{"adminState":"NORMAL","blockPoolUsed":300670976,"cacheCap
acity":0,"cacheUsed":0,"capacity":1996329943040,"dfsUsed":300670976,"hostName":"192.168.70.1","infoPort":50075,"infoSecurePort":0,"ipAddr":"192.168.70.1","ipcPort":50020,"lastUpdate":1432749892058,"lastUpdateMonotonic":1432749892058,"name":"192.168.70.1:50010","networkLocation":"/default-rack","remaining":782138327040,"storageID":"49a30d0f-99f8-4b87-b986-502fe926271a","xceiverCount":1,"xferPort":50010}],"startOffset":134217728}]}} > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8512) WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info
[ https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8512: - Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks [~szetszwo] for the review! I've committed the patch to trunk and branch-2. > WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info > - > > Key: HDFS-8512 > URL: https://issues.apache.org/jira/browse/HDFS-8512 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Reporter: Sumana Sathish >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch > > > Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9210) Fix some misuse of %n in VolumeScanner#printStats
[ https://issues.apache.org/jira/browse/HDFS-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9210: - Attachment: HDFS-9210.02.patch Thanks [~templedf] for the review! Attached a patch using System.lineSeparator(). > Fix some misuse of %n in VolumeScanner#printStats > - > > Key: HDFS-9210 > URL: https://issues.apache.org/jira/browse/HDFS-9210 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9210.00.patch, HDFS-9210.01.patch, > HDFS-9210.02.patch > > > Found 2 extra "%n" in the VolumeScanner report, and the lines below are not > well formatted. This JIRA is opened to fix the format issue. > {code} > Block scanner information for volume DS-93fb2503-de00-4f98-a8bc-c2bc13b8f0f7 > with base path /hadoop/hdfs/data%nBytes verified in last hour : > 136882014 > Blocks scanned in current period : > 5 > Blocks scanned since restart : > 5 > Block pool scans since restart: > 0 > Block scan errors since restart : > 0 > Hours until next block pool scan : > 476.000 > Last block scanned: > BP-1792969149-192.168.70.101-1444150984999:blk_1073742088_1274 > More blocks to scan in period : > false > %n > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
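For context on the {{%n}} misuse above: {{%n}} is a java.util.Formatter conversion, so it is only expanded by format-string APIs such as String.format or printf; when report text is built by plain concatenation, the literal two-character {{%n}} leaks into the output, exactly as seen in the scanner report. A minimal plain-Java illustration (not the actual VolumeScanner code):

```java
public class NewlineDemo {
    // %n is expanded only by Formatter-based APIs such as String.format;
    // plain concatenation leaves it as a literal two-character sequence.
    static String expand(String fmt) {
        return String.format(fmt);
    }

    public static void main(String[] args) {
        String raw = "base path /hadoop/hdfs/data%nBytes verified in last hour : 136882014";
        // Built by concatenation, the literal "%n" survives:
        System.out.println(raw.contains("%n"));          // true
        // Run through the formatter, it becomes the platform line separator:
        System.out.println(expand(raw).contains("%n"));  // false
        // The patch sidesteps the ambiguity by appending the separator directly:
        String fixed = "base path /hadoop/hdfs/data" + System.lineSeparator()
            + "Bytes verified in last hour : 136882014";
        System.out.println(fixed.contains("%n"));        // false
    }
}
```

Using System.lineSeparator() keeps the output identical whether or not the string later passes through a formatter.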
[jira] [Updated] (HDFS-8512) WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info
[ https://issues.apache.org/jira/browse/HDFS-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8512: - Summary: WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info (was: WebHDFS : GETFILESTATUS should include storage type in LocatedBlock) > WebHDFS : GETFILESTATUS should return LocatedBlock with storage type info > - > > Key: HDFS-8512 > URL: https://issues.apache.org/jira/browse/HDFS-8512 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Reporter: Sumana Sathish >Assignee: Xiaoyu Yao > Attachments: HDFS-8512.00.patch, HDFS-8512.01.patch > > > Storage type inside LocatedBlock object is not fully exposed for GETFILESTATUS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8855) Webhdfs client leaks active NameNode connections
[ https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023292#comment-15023292 ] Xiaoyu Yao commented on HDFS-8855: -- Thanks [~xiaobingo] for updating the patch. +1 for the latest patch. I will commit it shortly. > Webhdfs client leaks active NameNode connections > > > Key: HDFS-8855 > URL: https://issues.apache.org/jira/browse/HDFS-8855 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: Bob Hansen >Assignee: Xiaobing Zhou > Fix For: 2.8.0 > > Attachments: HDFS-8855.005.patch, HDFS-8855.006.patch, > HDFS-8855.007.patch, HDFS-8855.008.patch, HDFS-8855.009.patch, > HDFS-8855.1.patch, HDFS-8855.2.patch, HDFS-8855.3.patch, HDFS-8855.4.patch, > HDFS_8855.prototype.patch > > > The attached script simulates a process opening ~50 files via webhdfs and > performing random reads. Note that there are at most 50 concurrent reads, > and all webhdfs sessions are kept open. Each read is ~64k at a random > position. > The script periodically (once per second) shells into the NameNode and > produces a summary of the socket states. For my test cluster with 5 nodes, > it took ~30 seconds for the NameNode to have ~25000 active connections and > fails. > It appears that each request to the webhdfs client is opening a new > connection to the NameNode and keeping it open after the request is complete. > If the process continues to run, eventually (~30-60 seconds), all of the > open connections are closed and the NameNode recovers. > This smells like SoftReference reaping. Are we using SoftReferences in the > webhdfs client to cache NameNode connections but never re-using them? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
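On the SoftReference hypothesis in the description above: if a client caches connections behind SoftReferences but never re-uses a still-live entry, every request opens a fresh socket and the cached ones linger until the GC clears the references. A generic sketch of the re-use behavior one would expect from such a cache (stand-in types; this is not the actual webhdfs client code):

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ConnectionCache<C> {
    private final Map<String, SoftReference<C>> cache = new ConcurrentHashMap<>();
    private final Function<String, C> opener; // opens a fresh connection

    public ConnectionCache(Function<String, C> opener) {
        this.opener = opener;
    }

    // Re-use the cached connection while its SoftReference is still live;
    // open and cache a fresh one only when it is missing or was collected.
    public C getOrOpen(String address) {
        SoftReference<C> ref = cache.get(address);
        C conn = (ref == null) ? null : ref.get();
        if (conn == null) {
            conn = opener.apply(address);
            cache.put(address, new SoftReference<>(conn));
        }
        return conn;
    }
}
```

The leak described in the report corresponds to always taking the `opener.apply` path regardless of what the cache holds.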
[jira] [Updated] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.
[ https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9584: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks [~surendrasingh] for the contribution and all for the reviews. I've committed the change to trunk, branch-2 and branch-2.8. > NPE in distcp when ssl configuration file does not exist in class path. > --- > > Key: HDFS-9584 > URL: https://issues.apache.org/jira/browse/HDFS-9584 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-9584.001.patch, HDFS-9584.patch, HDFS-9584.patch > > > {noformat}./hadoop distcp -mapredSslConf ssl-distcp.xml > hftp://x.x.x.x:25003/history hdfs://x.x.x.X:25008/history{noformat} > if the {{ssl-distcp.xml}} file does not exist in the class path, distcp will throw a > NullPointerException. > {code} > java.lang.NullPointerException > at org.apache.hadoop.tools.DistCp.setupSSLConfig(DistCp.java:266) > at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:250) > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:175) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:127) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.tools.DistCp.main(DistCp.java:431) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
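The failure mode behind this NPE is generic: a classpath lookup returns null when the named file is absent, and the null URL is dereferenced later. A hedged plain-Java sketch of the defensive check (the real fix lives in {{DistCp#setupSSLConfig}}; the method names below are illustrative):

```java
import java.net.URL;

public class SslConfigCheck {
    // Look the file up the same way a classpath-based configuration would;
    // getResource returns null when the file is not on the classpath.
    static URL findOnClasspath(String name) {
        return Thread.currentThread().getContextClassLoader().getResource(name);
    }

    // Fail with an actionable message instead of letting the null URL
    // propagate into a NullPointerException deep inside job setup.
    static String validate(String sslConfName) {
        URL url = findOnClasspath(sslConfName);
        if (url == null) {
            throw new IllegalArgumentException(
                "SSL configuration file " + sslConfName + " not found on the classpath");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        try {
            validate("ssl-distcp.xml"); // absent in this sketch's classpath
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```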
[jira] [Commented] (HDFS-9244) Support nested encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093013#comment-15093013 ] Xiaoyu Yao commented on HDFS-9244: -- Thanks [~zhz] for working on this. Can we clarify the use cases (in addition to the original one mentioned in the description) before unblocking this, and how often they are used or requested in customer deployments? My concern is that this could bring up tricky cases such as upgrade/rollback, trash, etc. to document, support and maintain for nested zones. We don't want to introduce unnecessary complexity unless there are important use cases behind it. Thanks! > Support nested encryption zones > --- > > Key: HDFS-9244 > URL: https://issues.apache.org/jira/browse/HDFS-9244 > Project: Hadoop HDFS > Issue Type: New Feature > Components: encryption >Reporter: Xiaoyu Yao >Assignee: Zhe Zhang > Attachments: HDFS-9244.00.patch, HDFS-9244.01.patch > > > This JIRA is opened to track adding support for nested encryption zones based > on [~andrew.wang]'s [comment > |https://issues.apache.org/jira/browse/HDFS-8747?focusedCommentId=14654141=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14654141] > for certain use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9584) NPE in distcp when ssl configuration file does not exist in class path.
[ https://issues.apache.org/jira/browse/HDFS-9584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15093202#comment-15093202 ] Xiaoyu Yao commented on HDFS-9584: -- Thanks [~jojochuang]! I've corrected the commit message. > NPE in distcp when ssl configuration file does not exist in class path. > --- > > Key: HDFS-9584 > URL: https://issues.apache.org/jira/browse/HDFS-9584 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.7.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-9584.001.patch, HDFS-9584.patch, HDFS-9584.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8548) Minicluster throws NPE on shutdown
[ https://issues.apache.org/jira/browse/HDFS-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314688#comment-15314688 ] Xiaoyu Yao commented on HDFS-8548: -- Sounds good to me. I just cherry-picked it to branch-2.7. > Minicluster throws NPE on shutdown > -- > > Key: HDFS-8548 > URL: https://issues.apache.org/jira/browse/HDFS-8548 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mike Drob >Assignee: Surendra Singh Lilhore > Labels: reviewed > Fix For: 2.8.0 > > Attachments: HDFS-8548.patch > > > After running Solr tests, when we attempt to shut down the mini cluster > that we use for our unit tests, we get an NPE in the clean up thread. The > test still completes normally, but this generates a lot of extra noise. > {noformat} >[junit4] 2> java.lang.reflect.InvocationTargetException >[junit4] 2> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) >[junit4] 2> at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >[junit4] 2> at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >[junit4] 2> at java.lang.reflect.Method.invoke(Method.java:497) >[junit4] 2> at > org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111) >[junit4] 2> at > org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144) >[junit4] 2> at > org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:387) >[junit4] 2> at > org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:195) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151) >[junit4] 2> at > 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassName(DefaultMBeanServerInterceptor.java:1804) >[junit4] 2> at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.safeGetClassName(DefaultMBeanServerInterceptor.java:1595) >[junit4] 2> at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanPermission(DefaultMBeanServerInterceptor.java:1813) >[junit4] 2> at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:430) >[junit4] 2> at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415) >[junit4] 2> at > com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546) >[junit4] 2> at > org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:81) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stopMBeans(MetricsSourceAdapter.java:227) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stop(MetricsSourceAdapter.java:212) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSources(MetricsSystemImpl.java:461) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:212) >[junit4] 2> at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:592) >[junit4] 2> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72) >[junit4] 2> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68) >[junit4] 2> at > org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics.shutdown(NameNodeMetrics.java:145) >[junit4] 2> at > org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:822) >[junit4] 2> at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1720) >[junit4] 2> at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1699) >[junit4] 2> at > 
org.apache.solr.cloud.hdfs.HdfsTestUtil.teardownClass(HdfsTestUtil.java:197) >[junit4] 2> at > org.apache.solr.core.HdfsDirectoryFactoryTest.teardownClass(HdfsDirectoryFactoryTest.java:67) >[junit4] 2> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) >[junit4] 2> at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >[junit4] 2> at >
[jira] [Commented] (HDFS-10512) VolumeScanner may terminate due to NPE in DataNode.reportBadBlocks
[ https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325503#comment-15325503 ] Xiaoyu Yao commented on HDFS-10512: --- Thanks [~jojochuang] for reporting the issue and [~linyiqun] for posting the patch. There is a similar usage in {{DataNode#reportBadBlock}} that needs a null-volume check as well. For both cases, I would suggest we LOG an ERROR as follows. {code} if (volume != null) { bpos.reportBadBlocks( block, volume.getStorageID(), volume.getStorageType()); } else { LOG.error("Cannot find FsVolumeSpi to report bad block id: " + block.getBlockId() + " bpid: " + block.getBlockPoolId()); } {code} > VolumeScanner may terminate due to NPE in DataNode.reportBadBlocks > -- > > Key: HDFS-10512 > URL: https://issues.apache.org/jira/browse/HDFS-10512 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Yiqun Lin > Attachments: HDFS-10512.001.patch > > > VolumeScanner may terminate due to unexpected NullPointerException thrown in > {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190 > I observed this bug in a production CDH 5.5.1 cluster and the same bug still > persists in upstream trunk. 
> {noformat} > 2016-04-07 20:30:53,830 WARN > org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad > BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn > 2016-04-07 20:30:53,831 ERROR > org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, > DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621) > 2016-04-07 20:30:53,832 INFO > org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, > DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting. > {noformat} > I think the NPE comes from the volume variable in the following code snippet. > Somehow the volume scanner know the volume, but the datanode can not lookup > the volume using the block. > {code} > public void reportBadBlocks(ExtendedBlock block) throws IOException{ > BPOfferService bpos = getBPOSForBlock(block); > FsVolumeSpi volume = getFSDataset().getVolume(block); > bpos.reportBadBlocks( > block, volume.getStorageID(), volume.getStorageType()); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
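The guard suggested in the comment above, fleshed out with simplified stand-in types (the real signatures use Hadoop's {{ExtendedBlock}} and {{FsVolumeSpi}}; treat this as a sketch, not the committed patch):

```java
import java.util.logging.Logger;

public class BadBlockReporter {
    private static final Logger LOG = Logger.getLogger(BadBlockReporter.class.getName());

    // Simplified stand-ins for ExtendedBlock and FsVolumeSpi.
    record Block(long blockId, String blockPoolId) {}
    record Volume(String storageId, String storageType) {}

    // Returns what was done so the guard is easy to exercise: "reported"
    // when the volume lookup succeeded, "logged" when it came back null.
    static String reportBadBlock(Block block, Volume volume) {
        if (volume != null) {
            // Real code would call bpos.reportBadBlocks(block,
            // volume.getStorageID(), volume.getStorageType()) here.
            return "reported";
        }
        LOG.severe("Cannot find FsVolumeSpi to report bad block id: "
            + block.blockId() + " bpid: " + block.blockPoolId());
        return "logged";
    }

    public static void main(String[] args) {
        Block b = new Block(1170134484L, "BP-1800173197-10.204.68.5-125156296");
        System.out.println(reportBadBlock(b, new Volume("DS-89b72832", "DISK")));
        System.out.println(reportBadBlock(b, null)); // guarded: logs instead of NPE
    }
}
```

The point of the guard is that a failed volume lookup degrades to an ERROR log line rather than killing the VolumeScanner thread.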
[jira] [Assigned] (HDFS-9650) Problem is logging of "Redundant addStoredBlock request received"
[ https://issues.apache.org/jira/browse/HDFS-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-9650: Assignee: Xiaoyu Yao > Problem is logging of "Redundant addStoredBlock request received" > - > > Key: HDFS-9650 > URL: https://issues.apache.org/jira/browse/HDFS-9650 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Frode Halvorsen >Assignee: Xiaoyu Yao > > Description; > Hadoop 2.7.1. 2 namenodes in HA. 14 Datanodes. > Enough CPU, disk and RAM. > Just discovered that some datanodes must have been corrupted somehow. > When restarting a 'defect' datanode (it works without failure except when restarting), > the active namenode suddenly logs a lot of: "Redundant addStoredBlock > request received" > and finally the failover-controller takes the namenode down, fails over to the > other node. This node also starts logging the same, and as soon as the first > node is back online, the failover-controller again kills the active node, and > fails over. > This node now was started after the datanode, and doesn't log "Redundant > addStoredBlock request received" anymore, and restart of the second name-node > works fine. > If I again restart the datanode, the process repeats itself. > Problem is logging of "Redundant addStoredBlock request received" and why > does it happen? > The failover-controller acts the same way as it did on 2.5/6 when we had a > lot of 'block does not belong to any replica'-messages. Namenode is too busy > to respond to heartbeats, and is taken down... > To resolve this, I have to take down the datanode, delete all data from it, > and start it up. Then the cluster will reproduce the missing blocks, and the > failing datanode is working fine again... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9650) Problem is logging of "Redundant addStoredBlock request received"
[ https://issues.apache.org/jira/browse/HDFS-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331182#comment-15331182 ] Xiaoyu Yao commented on HDFS-9650: -- Thanks [~brahma] for the heads up. Yes, we do need to backport HDFS-9906 to branch-2.7. In our case, adding a dedicated serviceRPC port helps avoid the NN failover. > Problem is logging of "Redundant addStoredBlock request received" > - > > Key: HDFS-9650 > URL: https://issues.apache.org/jira/browse/HDFS-9650 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Frode Halvorsen >Assignee: Xiaoyu Yao > >
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
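For readers following along: the dedicated service RPC port mentioned in the comment above is the standard dfs.namenode.servicerpc-address setting, which gives DataNode and ZKFC traffic its own NameNode endpoint so that heavy client load cannot starve heartbeats and health checks. A minimal hdfs-site.xml sketch (host and port values are placeholders, not taken from this issue):

```xml
<!-- Separate service RPC endpoint for DataNodes and the ZKFC; the
     host:port value below is only an example. -->
<property>
  <name>dfs.namenode.servicerpc-address</name>
  <value>nn1.example.com:8040</value>
</property>
```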
[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted
[ https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331228#comment-15331228 ] Xiaoyu Yao commented on HDFS-9906: -- Cherry-picked to branch-2.7. > Remove spammy log spew when a datanode is restarted > --- > > Key: HDFS-9906 > URL: https://issues.apache.org/jira/browse/HDFS-9906 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.2 >Reporter: Elliott Clark >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-9906.patch > > > {code} > WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock > request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size > 268435456 > {code} > This happens way too much to add any useful information. We should either > move this to a different level or only warn once per machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
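The "only warn once per machine" option discussed in this issue can be sketched as a small filter placed in front of the logger. This is an illustrative sketch only; the class and method names are invented and this is not the actual HDFS-9906 patch:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the "warn once per machine" idea; class and
// method names are invented, this is not the HDFS-9906 patch itself.
class RedundantBlockWarnFilter {
  private final Set<String> warnedNodes = ConcurrentHashMap.newKeySet();

  // Returns true only the first time the given node is seen, so the
  // caller logs one WARN per datanode instead of one per block report.
  boolean shouldWarn(String node) {
    return warnedNodes.add(node);
  }
}
```

The set-based approach trades a small amount of memory (one entry per datanode) for a hard cap on log volume; clearing the set periodically would re-enable the warning after a quiet interval.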
[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing
[ https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10528: -- Status: Patch Available (was: Open) > Add logging to successful standby checkpointing > --- > > Key: HDFS-10528 > URL: https://issues.apache.org/jira/browse/HDFS-10528 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10528.00.patch > > > This ticket is opened to add INFO log for a successful standby checkpointing > in the code below for troubleshooting. > {code} > if (needCheckpoint) { > doCheckpoint(); > // reset needRollbackCheckpoint to false only when we finish a > ckpt > // for rollback image > if (needRollbackCheckpoint > && namesystem.getFSImage().hasRollbackFSImage()) { > namesystem.setCreatedRollbackImages(true); > namesystem.setNeedRollbackFsImage(false); > } > lastCheckpointTime = now; > } > } catch (SaveNamespaceCancelledException ce) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing
[ https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10528: -- Attachment: HDFS-10528.00.patch > Add logging to successful standby checkpointing > --- > > Key: HDFS-10528 > URL: https://issues.apache.org/jira/browse/HDFS-10528 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10528.00.patch > > > This ticket is opened to add INFO log for a successful standby checkpointing > in the code below for troubleshooting. > {code} > if (needCheckpoint) { > doCheckpoint(); > // reset needRollbackCheckpoint to false only when we finish a > ckpt > // for rollback image > if (needRollbackCheckpoint > && namesystem.getFSImage().hasRollbackFSImage()) { > namesystem.setCreatedRollbackImages(true); > namesystem.setNeedRollbackFsImage(false); > } > lastCheckpointTime = now; > } > } catch (SaveNamespaceCancelledException ce) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9906) Remove spammy log spew when a datanode is restarted
[ https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332102#comment-15332102 ] Xiaoyu Yao commented on HDFS-9906: -- Thanks [~ajisakaa], cherry-picked to 2.7.3 and updated the fix version to 2.7.3. cc: [~vinodkv]. > Remove spammy log spew when a datanode is restarted > --- > > Key: HDFS-9906 > URL: https://issues.apache.org/jira/browse/HDFS-9906 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.2 >Reporter: Elliott Clark >Assignee: Brahma Reddy Battula > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9906.patch > > > {code} > WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock > request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size > 268435456 > {code} > This happens way too much to add any useful information. We should either > move this to a different level or only warn once per machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9906) Remove spammy log spew when a datanode is restarted
[ https://issues.apache.org/jira/browse/HDFS-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9906: - Fix Version/s: (was: 2.7.4) 2.7.3 > Remove spammy log spew when a datanode is restarted > --- > > Key: HDFS-9906 > URL: https://issues.apache.org/jira/browse/HDFS-9906 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.2 >Reporter: Elliott Clark >Assignee: Brahma Reddy Battula > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9906.patch > > > {code} > WARN BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock > request received for blk_1109897077_36157149 on node 192.168.1.1:50010 size > 268435456 > {code} > This happens way too much to add any useful information. We should either > move this to a different level or only warn once per machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310837#comment-15310837 ] Xiaoyu Yao commented on HDFS-9924: -- [~daryn], thanks for the valuable feedback. @Kihwal Lee also mentioned a similar issue [here|https://issues.apache.org/jira/browse/HADOOP-12916?focusedCommentId=15277342=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15277342], but I wasn't able to get clarification on it. The FSN/FSD locking issue is a very good point. I tried to find some metrics/logs about it, but there weren't any. I will open a separate ticket to add more metrics and WARN/DEBUG logs for long locking operations on the namenode, similar to what we have for slow write/network WARN logs/metrics on the datanode. As you mentioned above, the priority level is assigned by the scheduler. As part of HADOOP-12916, we separated the scheduler from the call queue and made it pluggable so that priority assignment can be customized as appropriate for different workloads. For the mixed write-intensive and read workload example, I agree that the DecayedRpcScheduler, which uses call rate to determine priority, may not be a good choice. We have thought of adding a different scheduler that combines the weight of an RPC call and its rate, but it is tricky to assign weights. For example, getContentSummary on a directory with millions of files/dirs and on a directory with a few files/dirs won't have the same impact on the NN. Backoff based on response time allows all users to stop overloading the namenode when the high-priority RPC calls experience longer-than-normal end-to-end delay. User2/User3/User4 (low priority based on call rate) will have a much wider response-time threshold for backing off. In this case, User1 will be backed off first by breaking the relatively smaller response-time threshold, getting the namenode out of the state where other users cannot use it "fairly". 
We are also proposing a scheduler that offers better namenode resource management via YARN integration in HADOOP-13128. I would appreciate it if you could share your thoughts and comments on the proposal there as well. Thanks! > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
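The response-time-based backoff described in the comment above can be reduced to a toy decision rule: each scheduler queue carries its own response-time threshold, and the queue serving the heaviest caller gets the tightest threshold, so that caller is backed off first when end-to-end delay grows. The array layout and numbers below are illustrative assumptions, not the HADOOP-12916 implementation:

```java
// Toy model of response-time-based client backoff; the per-queue
// threshold layout and values are illustrative assumptions, not the
// HADOOP-12916 implementation.
class ResponseTimeBackoffSketch {
  private final long[] thresholdMillis; // one threshold per scheduler queue

  ResponseTimeBackoffSketch(long[] thresholdMillis) {
    this.thresholdMillis = thresholdMillis.clone();
  }

  // A queue's callers are asked to back off once the observed average
  // response time for that queue exceeds the queue's threshold.
  boolean shouldBackOff(int queue, long avgResponseMillis) {
    return avgResponseMillis > thresholdMillis[queue];
  }
}
```

With thresholds like {100, 500, 1000} ms, a 200 ms average delay trips only queue 0 (the heavy caller), which matches the "User1 is backed off first" behavior described above.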
[jira] [Comment Edited] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310837#comment-15310837 ] Xiaoyu Yao edited comment on HDFS-9924 at 6/1/16 6:31 PM: -- [~daryn], thanks for the valuable feedback. [~kihwal] also mentioned similar issue [here|https://issues.apache.org/jira/browse/HADOOP-12916?focusedCommentId=15277342=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15277342]. But I wasn't able to get clarification of it. The FSN/FSD locking issue is a very good point. I tried to find some metrics/logs about it but there was not any. I will open a separate ticket to add more metrics and WARN/DEBUG logs for long locking operations on namenode similar to what we have for slow write/network WARN/metrics on datanode. As you mentioned above, the priority level is assigned by scheduler. As part of HADOOP-12916, we separate scheduler from call queue and make it pluggable so that priority assignment can be customized as appropriate for different workloads. For the mixed write intensive and read workload example, I agree that the DecayedRpcScheduler that uses call rate to determine priority may not be the good choice. We have thought of adding a different scheduler that combines the weight of RPC call and its rate. But it is tricky to assign weight. For example, getContentSummary on a directory with millions of files/dirs and a directory with a few files/dirs won't have the same impact on NN. Backoff based on response time allows all users to stop overloading namenode when the high priority RPC calls experience longer than normal end to end delay. User2/User3/User4 (low priority based on call rate) will have much wider response time threshold for backing off. In this case, User 1 will be backed off first by breaking the relative smaller response time threshold and get namenode out of the state that other users can not use the namenode "fairly". 
We are also proposing to have a scheduler that offers better namenode resource management via YARN integration on HADOOP-13128. I would appreciate if you can share your thoughts and comments on the proposal there as well. Thanks! was (Author: xyao): [~daryn], thanks for the valuable feedback. @Kihwal Lee also mentioned similar issue [here|https://issues.apache.org/jira/browse/HADOOP-12916?focusedCommentId=15277342=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15277342]. But I wasn't able to get clarification of it. The FSN/FSD locking issue is a very good point. I tried to find some metrics/logs about it but there was not any. I will open a separate ticket to add more metrics and WARN/DEBUG logs for long locking operations on namenode similar to what we have for slow write/network WARN/metrics on datanode. As you mentioned above, the priority level is assigned by scheduler. As part of HADOOP-12916, we separate scheduler from call queue and make it pluggable so that priority assignment can be customized as appropriate for different workloads. For the mixed write intensive and read workload example, I agree that the DecayedRpcScheduler that uses call rate to determine priority may not be the good choice. We have thought of adding a different scheduler that combines the weight of RPC call and its rate. But it is tricky to assign weight. For example, getContentSummary on a directory with millions of files/dirs and a directory with a few files/dirs won't have the same impact on NN. Backoff based on response time allows all users to stop overloading namenode when the high priority RPC calls experience longer than normal end to end delay. User2/User3/User4 (low priority based on call rate) will have much wider response time threshold for backing off. In this case, User 1 will be backed off first by breaking the relative smaller response time threshold and get namenode out of the state that other users can not use the namenode "fairly". 
We are also proposing to have a scheduler that offers better namenode resource management via YARN integration on HADOOP-13128. I would appreciate if you can share your thoughts and comments on the proposal there as well. Thanks! > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to
[jira] [Created] (HDFS-10475) Adding metrics and warn/debug logs for long FSD lock
Xiaoyu Yao created HDFS-10475: - Summary: Adding metrics and warn/debug logs for long FSD lock Key: HDFS-10475 URL: https://issues.apache.org/jira/browse/HDFS-10475 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao This is a follow-up to the comments on HADOOP-12916 and [here|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15310837=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15310837] to add more metrics and WARN/DEBUG logs for long FSD/FSN locking operations on the namenode, similar to what we have for slow write/network WARN logs/metrics on the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
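A minimal sketch of the kind of instrumentation HDFS-10475 proposes: time how long the FSD/FSN lock is held and emit a WARN once a threshold is crossed. The 300 ms threshold, message wording, and names below are illustrative assumptions, not the eventual patch:

```java
// Sketch of long-lock-hold detection; the 300 ms threshold and the
// message format are illustrative assumptions, not the HDFS-10475 patch.
class LockHoldWarnSketch {
  static final long WARN_THRESHOLD_MS = 300L;

  // Returns the WARN message to log, or null when the hold time is fine.
  static String check(String opName, long heldMillis) {
    if (heldMillis <= WARN_THRESHOLD_MS) {
      return null;
    }
    return "FSD lock held by " + opName + " for " + heldMillis
        + " ms (threshold=" + WARN_THRESHOLD_MS + " ms)";
  }
}
```

In practice the caller would record a timestamp when the lock is acquired and run this check when it is released, mirroring the slow-write warnings the datanode already emits.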
[jira] [Created] (HDFS-10528) Add logging to successful standby checkpointing
Xiaoyu Yao created HDFS-10528: - Summary: Add logging to successful standby checkpointing Key: HDFS-10528 URL: https://issues.apache.org/jira/browse/HDFS-10528 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao This ticket is opened to add INFO log for a successful standby checkpointing in the code below for troubleshooting. {code} if (needCheckpoint) { doCheckpoint(); // reset needRollbackCheckpoint to false only when we finish a ckpt // for rollback image if (needRollbackCheckpoint && namesystem.getFSImage().hasRollbackFSImage()) { namesystem.setCreatedRollbackImages(true); namesystem.setNeedRollbackFsImage(false); } lastCheckpointTime = now; } } catch (SaveNamespaceCancelledException ce) { {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10528) Add logging to successful standby checkpointing
[ https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330846#comment-15330846 ] Xiaoyu Yao commented on HDFS-10528: --- Plan to add a log entry after {{ lastCheckpointTime = now;}}. > Add logging to successful standby checkpointing > --- > > Key: HDFS-10528 > URL: https://issues.apache.org/jira/browse/HDFS-10528 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > This ticket is opened to add INFO log for a successful standby checkpointing > in the code below for troubleshooting. > {code} > if (needCheckpoint) { > doCheckpoint(); > // reset needRollbackCheckpoint to false only when we finish a > ckpt > // for rollback image > if (needRollbackCheckpoint > && namesystem.getFSImage().hasRollbackFSImage()) { > namesystem.setCreatedRollbackImages(true); > namesystem.setNeedRollbackFsImage(false); > } > lastCheckpointTime = now; > } > } catch (SaveNamespaceCancelledException ce) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing
[ https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10528: -- Status: Open (was: Patch Available) > Add logging to successful standby checkpointing > --- > > Key: HDFS-10528 > URL: https://issues.apache.org/jira/browse/HDFS-10528 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10528.00.patch > > > This ticket is opened to add INFO log for a successful standby checkpointing > in the code below for troubleshooting. > {code} > if (needCheckpoint) { > doCheckpoint(); > // reset needRollbackCheckpoint to false only when we finish a > ckpt > // for rollback image > if (needRollbackCheckpoint > && namesystem.getFSImage().hasRollbackFSImage()) { > namesystem.setCreatedRollbackImages(true); > namesystem.setNeedRollbackFsImage(false); > } > lastCheckpointTime = now; > } > } catch (SaveNamespaceCancelledException ce) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10528) Add logging to successful standby checkpointing
[ https://issues.apache.org/jira/browse/HDFS-10528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10528: -- Status: Patch Available (was: Open) > Add logging to successful standby checkpointing > --- > > Key: HDFS-10528 > URL: https://issues.apache.org/jira/browse/HDFS-10528 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10528.00.patch > > > This ticket is opened to add INFO log for a successful standby checkpointing > in the code below for troubleshooting. > {code} > if (needCheckpoint) { > doCheckpoint(); > // reset needRollbackCheckpoint to false only when we finish a > ckpt > // for rollback image > if (needRollbackCheckpoint > && namesystem.getFSImage().hasRollbackFSImage()) { > namesystem.setCreatedRollbackImages(true); > namesystem.setNeedRollbackFsImage(false); > } > lastCheckpointTime = now; > } > } catch (SaveNamespaceCancelledException ce) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10539) DecayRpcScheduler MXBean should only report decayed CallVolumeSummary
Xiaoyu Yao created HDFS-10539: - Summary: DecayRpcScheduler MXBean should only report decayed CallVolumeSummary Key: HDFS-10539 URL: https://issues.apache.org/jira/browse/HDFS-10539 Project: Hadoop HDFS Issue Type: Bug Components: ipc Reporter: Namit Maheshwari Assignee: Xiaoyu Yao HADOOP-13197 added non-decayed call metrics in the metrics2 source for DecayedRpcScheduler. However, the CallVolumeSummary in the MXBean was unexpectedly changed to include both decayed and non-decayed call volume. The root cause is that the Jackson ObjectMapper simply serializes the entire content of the callCounts map, which contains both non-decayed and decayed counters after HADOOP-13197. This ticket is opened to fix the CallVolumeSummary in the MXBean to include only the decayed call volume for backward compatibility, and to add a unit test for the DecayRpcScheduler MXBean to catch this in the future. CallVolumeSummary JMX example before HADOOP-13197 {code} "CallVolumeSummary" : "{\"hbase\":1,\"mapred\":1}" {code} CallVolumeSummary JMX example after HADOOP-13197 {code} "CallVolumeSummary" : "{\"hrt_qa\":[1,2]}" {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
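The fix direction described above can be illustrated as a small projection: given per-user counters that now hold a pair of values (as in the second JMX example), keep only the decayed count before serializing the summary. Which index holds the decayed value is an assumption here, and the class and method names are invented; the real callCounts layout is whatever DecayRpcScheduler stores:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of restoring the pre-HADOOP-13197 summary shape: keep only the
// decayed count per user. Assumes the decayed value is the second element
// of each pair, which is an illustrative assumption about the layout.
class CallVolumeSummarySketch {
  static Map<String, Long> decayedOnly(Map<String, List<Long>> callCounts) {
    Map<String, Long> summary = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> e : callCounts.entrySet()) {
      summary.put(e.getKey(), e.getValue().get(1));
    }
    return summary;
  }
}
```

Serializing the projected map would reproduce the original single-value-per-user JSON shape, e.g. {"hrt_qa":2} instead of {"hrt_qa":[1,2]}.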
[jira] [Comment Edited] (HDFS-10469) Add number of active xceivers to datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342988#comment-15342988 ] Xiaoyu Yao edited comment on HDFS-10469 at 6/21/16 11:02 PM: - Thanks [~hanishakoneru] for updating the patch. The V3 patch looks good to me and the unit test failures don't seem related to this patch. +1, and I will rerun the failed tests and commit it if everything passes locally. was (Author: xyao): Thanks [~hanishakoneru] for updating the patch. The V4 patch looks good to me and the unit test failures don't seem related to this patch. +1, and I will rerun the failed tests and commit it if everything passes locally. > Add number of active xceivers to datanode metrics > - > > Key: HDFS-10469 > URL: https://issues.apache.org/jira/browse/HDFS-10469 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, > HDFS-10469.002.patch, HDFS-10469.003.patch > > > Number of active xceivers is exposed via jmx, but not in Datanode metrics. We > should add it to datanode metrics for monitoring the load on Datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10469) Add number of active xceivers to datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342988#comment-15342988 ] Xiaoyu Yao commented on HDFS-10469: --- Thanks [~hanishakoneru] for updating the patch. The V4 patch looks good to me and the unit test failures don't seem related to this patch. +1, and I will rerun the failed tests and commit it if everything passes locally. > Add number of active xceivers to datanode metrics > - > > Key: HDFS-10469 > URL: https://issues.apache.org/jira/browse/HDFS-10469 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, > HDFS-10469.002.patch, HDFS-10469.003.patch > > > Number of active xceivers is exposed via jmx, but not in Datanode metrics. We > should add it to datanode metrics for monitoring the load on Datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
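Exposing the active xceiver count as a datanode metric boils down to a gauge that is bumped when an xceiver thread starts and dropped when it exits. A minimal sketch (class and method names are invented for illustration, not taken from the HDFS-10469 patch):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal live-gauge sketch for active xceivers; the class and method
// names are invented for illustration, not taken from the patch.
class XceiverGaugeSketch {
  private final AtomicInteger active = new AtomicInteger();

  void onXceiverStart() { active.incrementAndGet(); }
  void onXceiverExit()  { active.decrementAndGet(); }
  int  activeCount()    { return active.get(); }
}
```

A metrics source would then publish activeCount() on each snapshot, giving monitoring systems the same number already visible via JMX.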
[jira] [Commented] (HDFS-10535) Rename AsyncDistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334156#comment-15334156 ] Xiaoyu Yao commented on HDFS-10535: --- Thanks [~szetszwo] for working on this. The patch looks good to me and just two unit test issues below. TestAsyncDFS.java {code} return cluster.getFileSystem().getAsyncDistributedFileSystem(); ==> return cluster.getFileSystem().getNonblockingCalls(); {code} TestAsyncHDFSWithHA.java {code} dfs.getAsyncDistributedFileSystem() ==> dfs.getNonblockingCalls() {code} > Rename AsyncDistributedFileSystem > - > > Key: HDFS-10535 > URL: https://issues.apache.org/jira/browse/HDFS-10535 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h10535_20160616.patch > > > Per discussion in HDFS-9924, AsyncDistributedFileSystem is not a good name > since we only support nonblocking calls for the moment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10469) Add number of active xceivers to datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347197#comment-15347197 ] Xiaoyu Yao commented on HDFS-10469: --- I finished testing this patch against the failed tests. Only TestOfflineEditsViewer#testGenerated can be consistently reproduced, whether or not the patch for HDFS-10469 is applied. I opened HDFS-10572 to track the fix for TestOfflineEditsViewer#testGenerated and will commit HDFS-10469 shortly. > Add number of active xceivers to datanode metrics > - > > Key: HDFS-10469 > URL: https://issues.apache.org/jira/browse/HDFS-10469 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, > HDFS-10469.002.patch, HDFS-10469.003.patch > > > Number of active xceivers is exposed via jmx, but not in Datanode metrics. We > should add it to datanode metrics for monitoring the load on Datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10469) Add number of active xceivers to datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10469: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks [~hanishakoneru] for the contribution. I've committed the patch to trunk. > Add number of active xceivers to datanode metrics > - > > Key: HDFS-10469 > URL: https://issues.apache.org/jira/browse/HDFS-10469 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Labels: datanode, metrics > Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, > HDFS-10469.002.patch, HDFS-10469.003.patch > > > Number of active xceivers is exposed via jmx, but not in Datanode metrics. We > should add it to datanode metrics for monitoring the load on Datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10469) Add number of active xceivers to datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10469: -- Labels: datanode metrics (was: ) > Add number of active xceivers to datanode metrics > - > > Key: HDFS-10469 > URL: https://issues.apache.org/jira/browse/HDFS-10469 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru > Labels: datanode, metrics > Attachments: HDFS-10469.000.patch, HDFS-10469.001.patch, > HDFS-10469.002.patch, HDFS-10469.003.patch > > > Number of active xceivers is exposed via jmx, but not in Datanode metrics. We > should add it to datanode metrics for monitoring the load on Datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10572) Fix TestOfflineEditsViewer#testGenerated
Xiaoyu Yao created HDFS-10572: - Summary: Fix TestOfflineEditsViewer#testGenerated Key: HDFS-10572 URL: https://issues.apache.org/jira/browse/HDFS-10572 Project: Hadoop HDFS Issue Type: Bug Components: newbie, test Reporter: Xiaoyu Yao The test has been failing consistently on trunk recently. This ticket is opened to fix the test and avoid false alarms on Jenkins. Figuring out which recent commit caused this failure would be a good start. {code} --- T E S T S --- Running org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.646 sec <<< FAILURE! - in org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer testGenerated(org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer) Time elapsed: 3.623 sec <<< FAILURE! java.lang.AssertionError: Generated edits and reparsed (bin to XML to bin) should be same at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer.testGenerated(TestOfflineEditsViewer.java:125) Results : Failed tests: TestOfflineEditsViewer.testGenerated:125 Generated edits and reparsed (bin to XML to bin) should be same Tests run: 5, Failures: 1, Errors: 0, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9759) Fix the typo in JvmPauseMonitor#getNumGcWarnThreadholdExceeded
[ https://issues.apache.org/jira/browse/HDFS-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134633#comment-15134633 ] Xiaoyu Yao commented on HDFS-9759: -- +1, I will commit it shortly. > Fix the typo in JvmPauseMonitor#getNumGcWarnThreadholdExceeded > -- > > Key: HDFS-9759 > URL: https://issues.apache.org/jira/browse/HDFS-9759 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou >Priority: Minor > Attachments: HDFS-9759.000.patch > > > There is typo in JvmPauseMonitor#getNumGcWarnThreadholdExceeded, which should > be Threshold. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8660) Slow write to packet mirror should log which mirror and which block
[ https://issues.apache.org/jira/browse/HDFS-8660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133369#comment-15133369 ] Xiaoyu Yao commented on HDFS-8660: -- [~hazem], thanks for working on this. Can you rebase the patch to trunk? > Slow write to packet mirror should log which mirror and which block > --- > > Key: HDFS-8660 > URL: https://issues.apache.org/jira/browse/HDFS-8660 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Hazem Mahmoud >Assignee: Hazem Mahmoud > Attachments: HDFS-8660.001.patch > > > Currently, log format states something similar to: > "Slow BlockReceiver write packet to mirror took 468ms (threshold=300ms)" > For troubleshooting purposes, it would be good to have it mention which block > ID it's writing as well as the mirror (DN) that it's writing it to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9688) Test the effect of nested encryption zones in HDFS downgrade
[ https://issues.apache.org/jira/browse/HDFS-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123865#comment-15123865 ] Xiaoyu Yao commented on HDFS-9688: -- bq. Also, renaming the root dir of a nested EZ won't be allowed, because the destination will be in an EZ (the parent EZ). I think this is the right behavior for nested EZ, but please see if you agree. [~zhz], Do you plan to block renaming the EZ root on 2.7 and later? I don't think we should block renaming the EZ root: that would be an incompatible change from 2.7 and can break Trash support for HDFS encryption. If I understand nested EZs correctly, the renamed nested EZ will still be encrypted with its own zone key regardless of whether the destination is encrypted with a different key or not. > Test the effect of nested encryption zones in HDFS downgrade > > > Key: HDFS-9688 > URL: https://issues.apache.org/jira/browse/HDFS-9688 > Project: Hadoop HDFS > Issue Type: Test > Components: encryption, test >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9688-branch-2.6.00.patch, > HDFS-9688-branch-2.6.01.patch, HDFS-9688-branch-2.8.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9210) Fix some misuse of %n in VolumeScanner#printStats
[ https://issues.apache.org/jira/browse/HDFS-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9210: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks all for the reviews. I've committed the patch to trunk, branch-2 and branch-2.8. > Fix some misuse of %n in VolumeScanner#printStats > - > > Key: HDFS-9210 > URL: https://issues.apache.org/jira/browse/HDFS-9210 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Fix For: 2.8.0 > > Attachments: HDFS-9210.00.patch, HDFS-9210.01.patch, > HDFS-9210.02.patch > > > Found 2 extra "%n" in the VolumeScanner report and lines not well formatted > below. This JIRA is opened to fix the format issue. > {code} > Block scanner information for volume DS-93fb2503-de00-4f98-a8bc-c2bc13b8f0f7 > with base path /hadoop/hdfs/data%nBytes verified in last hour : > 136882014 > Blocks scanned in current period : > 5 > Blocks scanned since restart : > 5 > Block pool scans since restart: > 0 > Block scan errors since restart : > 0 > Hours until next block pool scan : > 476.000 > Last block scanned: > BP-1792969149-192.168.70.101-1444150984999:blk_1073742088_1274 > More blocks to scan in period : > false > %n > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
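The root cause generalizes beyond VolumeScanner: in Java, "%n" is only expanded to the platform line separator when it passes through a Formatter, so a literal "%n" concatenated into an already-built string survives verbatim, which is exactly the stray "%n" visible in the report above. A small sketch:

```java
public class PercentNDemo {
    public static void main(String[] args) {
        String eol = System.lineSeparator();
        // Interpreted: %n inside a format string becomes the line separator.
        String formatted = String.format("Blocks scanned in current period : %d%n", 5);
        // Not interpreted: a literal "%n" appended to a plain string stays as
        // the raw two characters and shows up in the output as-is.
        String concatenated = "More blocks to scan in period : false" + "%n";
        System.out.println("formatted ends with line separator: " + formatted.endsWith(eol));
        System.out.println("concatenated ends with literal %n:  " + concatenated.endsWith("%n"));
    }
}
```

So "%n" belongs in the format string handed to String.format/printf, never pasted into pre-built report text.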
[jira] [Resolved] (HDFS-9750) Document -source option for balancer
[ https://issues.apache.org/jira/browse/HDFS-9750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao resolved HDFS-9750. -- Resolution: Duplicate > Document -source option for balancer > > > Key: HDFS-9750 > URL: https://issues.apache.org/jira/browse/HDFS-9750 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > HDFS-8826 introduced -source option for balancer. This needs to be documented > in HDFSCommands.md for administrators to use it appropriately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8923) Add -source flag to balancer usage message
[ https://issues.apache.org/jira/browse/HDFS-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130969#comment-15130969 ] Xiaoyu Yao commented on HDFS-8923: -- [~ctrezzo], the patch v2 looks good to me. Can you rebase the patch to trunk? > Add -source flag to balancer usage message > -- > > Key: HDFS-8923 > URL: https://issues.apache.org/jira/browse/HDFS-8923 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Attachments: HDFS-8923-trunk-v1.patch, HDFS-8923-trunk-v2.patch > > > HDFS-8826 added a -source flag to the balancer, but the usage message still > needs to be updated. See current usage message in trunk: > {code} >private static final String USAGE = "Usage: hdfs balancer" >+ "\n\t[-policy <policy>]\tthe balancing policy: " >+ BalancingPolicy.Node.INSTANCE.getName() + " or " >+ BalancingPolicy.Pool.INSTANCE.getName() >+ "\n\t[-threshold <threshold>]\tPercentage of disk capacity" >+ "\n\t[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]" >+ "\tExcludes the specified datanodes." >+ "\n\t[-include [-f <hosts-file> | <comma-separated list of hosts>]]" >+ "\tIncludes only the specified datanodes." >+ "\n\t[-idleiterations <idleiterations>]" >+ "\tNumber of consecutive idle iterations (-1 for Infinite) before " >+ "exit." >+ "\n\t[-runDuringUpgrade]" >+ "\tWhether to run the balancer during an ongoing HDFS upgrade." >+ "This is usually not desired since it will not affect used space " >+ "on over-utilized machines."; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9750) Document -source option for balancer
Xiaoyu Yao created HDFS-9750: Summary: Document -source option for balancer Key: HDFS-9750 URL: https://issues.apache.org/jira/browse/HDFS-9750 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer & mover Affects Versions: 2.8.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao HDFS-8826 introduced -source option for balancer. This needs to be documented in HDFSCommands.md for administrators to use it appropriately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9723) Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext
Xiaoyu Yao created HDFS-9723: Summary: Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext Key: HDFS-9723 URL: https://issues.apache.org/jira/browse/HDFS-9723 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao HDFS namenode handles RPC requests from DFS clients and internal processing from datanodes. It has been a recurring pain that some bad jobs overwhelm the namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 is one of the existing efforts added since Hadoop 2.4 to address this issue. In the current FCQ implementation, incoming RPC calls are scheduled based on the number of recent RPC calls (1000) of different users with a time-decayed scheduler. This works well when there is a clear mapping between users and their RPC calls from different jobs. However, it may not work effectively when it is hard to attribute calls to a specific caller in a chain of operations from a workflow (e.g. Oozie -> Hive -> Yarn). It is not feasible for operators/administrators to throttle all the hive jobs because of one “bad” query. This JIRA proposes to leverage the RPC caller context information (such as callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative to the existing UGI (or user name when a delegation token is not available) based Identity Provider, to improve the effectiveness of the Hadoop RPC Fair Call Queue (HADOOP-9640) for better namenode throttling in multi-tenant cluster deployments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
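For context, the existing FCQ from HADOOP-9640 is wired in per RPC port through configuration; a sketch for a NameNode RPC port of 8020 (the port number here is an assumption, and the CallerContext-based identity provider this JIRA proposes would plug in alongside it) would look like:

```xml
<!-- core-site.xml sketch (NameNode RPC port 8020 assumed): replace the
     default FIFO call queue on that port with the FairCallQueue. -->
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
```

The proposal then amounts to changing which identity the decay scheduler keys on, from the UGI user name to a field of the RPC CallerContext, without altering this per-port wiring.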
[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations
[ https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9843: - Attachment: HDFS-9843.02.patch Thanks [~cnauroth]! Updated the patch with the fixed link. > Document distcp options required for copying between encrypted locations > > > Key: HDFS-9843 > URL: https://issues.apache.org/jira/browse/HDFS-9843 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, documentation, encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-9843.00.patch, HDFS-9843.01.patch, > HDFS-9843.02.patch > > > In TransparentEncryption.md#Distcp_considerations document section, we have > "Copying_between_encrypted_and_unencrypted_locations" which requires > -skipcrccheck and -update. > These options should be documented as required for "Copying between encrypted > locations" use cases as well because this involves decrypting source file and > encrypting destination file with a different EDEK, resulting in different > checksum at the destination. Distcp will fail at crc check if -skipcrccheck > if not specified. > This ticket is opened to document the required options for "Copying between > encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations
[ https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9843: - Attachment: HDFS-9843.01.patch Thanks [~cnauroth] for the review. Patch 01 fixes the anchor to the distcp command line options. > Document distcp options required for copying between encrypted locations > > > Key: HDFS-9843 > URL: https://issues.apache.org/jira/browse/HDFS-9843 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, documentation, encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-9843.00.patch, HDFS-9843.01.patch > > > In TransparentEncryption.md#Distcp_considerations document section, we have > "Copying_between_encrypted_and_unencrypted_locations" which requires > -skipcrccheck and -update. > These options should be documented as required for "Copying between encrypted > locations" use cases as well because this involves decrypting source file and > encrypting destination file with a different EDEK, resulting in different > checksum at the destination. Distcp will fail at crc check if -skipcrccheck > if not specified. > This ticket is opened to document the required options for "Copying between > encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations
[ https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9843: - Affects Version/s: 2.6.0 > Document distcp options required for copying between encrypted locations > > > Key: HDFS-9843 > URL: https://issues.apache.org/jira/browse/HDFS-9843 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, documentation, encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > In TransparentEncryption.md#Distcp_considerations document section, we have > "Copying_between_encrypted_and_unencrypted_locations" which requires > -skipcrccheck and -update. > These options should be documented as required for "Copying between encrypted > locations" use cases as well because this involves decrypting source file and > encrypting destination file with a different EDEK, resulting in different > checksum at the destination. Distcp will fail at crc check if -skipcrccheck > if not specified. > This ticket is opened to document the required options for "Copying between > encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations
[ https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9843: - Component/s: encryption documentation distcp > Document distcp options required for copying between encrypted locations > > > Key: HDFS-9843 > URL: https://issues.apache.org/jira/browse/HDFS-9843 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, documentation, encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > In TransparentEncryption.md#Distcp_considerations document section, we have > "Copying_between_encrypted_and_unencrypted_locations" which requires > -skipcrccheck and -update. > These options should be documented as required for "Copying between encrypted > locations" use cases as well because this involves decrypting source file and > encrypting destination file with a different EDEK, resulting in different > checksum at the destination. Distcp will fail at crc check if -skipcrccheck > if not specified. > This ticket is opened to document the required options for "Copying between > encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9843) Document distcp options required for copying between encrypted locations
Xiaoyu Yao created HDFS-9843: Summary: Document distcp options required for copying between encrypted locations Key: HDFS-9843 URL: https://issues.apache.org/jira/browse/HDFS-9843 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao In TransparentEncryption.md#Distcp_considerations document section, we have "Copying_between_encrypted_and_unencrypted_locations" which requires -skipcrccheck and -update. These options should be documented as required for "Copying between encrypted locations" use cases as well because this involves decrypting source file and encrypting destination file with a different EDEK, resulting in different checksum at the destination. Distcp will fail at crc check if -skipcrccheck if not specified. This ticket is opened to document the required options for "Copying between encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
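Concretely, the documented invocation for copying between two encrypted locations would combine both flags; the cluster addresses and paths below are illustrative:

```
# -update and -skipcrccheck together, since re-encryption with a different
# EDEK makes source and destination checksums differ by design.
hadoop distcp -update -skipcrccheck hdfs://nn1:8020/ez-src/data hdfs://nn2:8020/ez-dst/data
```

Without -skipcrccheck the copy succeeds but the post-copy CRC comparison fails, which is the confusing behavior this documentation change is meant to head off.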
[jira] [Updated] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122
[ https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9831: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Tags: webhdfs Status: Resolved (was: Patch Available) > Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 > > > Key: HDFS-9831 > URL: https://issues.apache.org/jira/browse/HDFS-9831 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.8.0 > > Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch, > HDFS-9831.002.patch, HDFS-9831.003.patch > > > This ticket is opened to document the configuration keys introduced by > HDFS-5219/HDFS-5122 for WebHdfs Retry. Both hdfs-default.xml and webhdfs.md > should be updated with the usage of these keys. > {code} > / WebHDFS retry policy > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = > "dfs.http.client.retry.policy.enabled"; > public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = > false; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = > "dfs.http.client.retry.policy.spec"; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = > "1,6,6,10"; //t1,n1,t2,n2,... 
> public static final String DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = > "dfs.http.client.failover.max.attempts"; > public static final int DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = > 15; > public static final String DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = > "dfs.http.client.retry.max.attempts"; > public static final int DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = > "dfs.http.client.failover.sleep.base.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT > = 500; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = > "dfs.http.client.failover.sleep.max.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = > 15000; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
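As a reference point for the documentation, enabling the retry policy with the defaults quoted above amounts to an hdfs-site.xml fragment like the following (the spec string is the quoted default "1,6,6,10", read as t1,n1,t2,n2 pairs of wait time and retry count):

```xml
<!-- Sketch: turn on the WebHDFS client retry policy with the default spec. -->
<property>
  <name>dfs.http.client.retry.policy.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.http.client.retry.policy.spec</name>
  <value>1,6,6,10</value>
</property>
```

These are the client-side keys the committed patch documents in hdfs-default.xml and webhdfs.md.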
[jira] [Updated] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122
[ https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9831: - Thanks [~xiaobingo] for the contribution. I've committed the patch to trunk, branch-2, and branch-2.8. > Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 > > > Key: HDFS-9831 > URL: https://issues.apache.org/jira/browse/HDFS-9831 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.8.0 > > Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch, > HDFS-9831.002.patch, HDFS-9831.003.patch > > > This ticket is opened to document the configuration keys introduced by > HDFS-5219/HDFS-5122 for WebHdfs Retry. Both hdfs-default.xml and webhdfs.md > should be updated with the usage of these keys. > {code} > / WebHDFS retry policy > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = > "dfs.http.client.retry.policy.enabled"; > public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = > false; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = > "dfs.http.client.retry.policy.spec"; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = > "1,6,6,10"; //t1,n1,t2,n2,... 
> public static final String DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = > "dfs.http.client.failover.max.attempts"; > public static final int DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = > 15; > public static final String DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = > "dfs.http.client.retry.max.attempts"; > public static final int DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = > "dfs.http.client.failover.sleep.base.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT > = 500; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = > "dfs.http.client.failover.sleep.max.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = > 15000; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122
[ https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167566#comment-15167566 ] Xiaoyu Yao commented on HDFS-9831: -- Thanks [~xiaobingo] for the update. One more issue found on the rendered webhdfs.html (changes in webhdfs.md) I don't think we should put "The following properties control WebHDFS retry and failover policy." and the retry keys under the "Cross-Site Request Forgery Prevention" section. Can you add this as a separate section like below? {code} WebHDFS Retry Policy - WebHDFS supports an optional, configurable retry policy for resilient copy of large files that could timeout, or copy file between HA clusters that could failover during the copy. The following properties control WebHDFS retry and failover policy. ... {code} > Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 > > > Key: HDFS-9831 > URL: https://issues.apache.org/jira/browse/HDFS-9831 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch > > > This ticket is opened to document the configuration keys introduced by > HDFS-5219/HDFS-5122 for WebHdfs Retry. Both hdfs-default.xml and webhdfs.md > should be updated with the usage of these keys. > {code} > / WebHDFS retry policy > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = > "dfs.http.client.retry.policy.enabled"; > public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = > false; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = > "dfs.http.client.retry.policy.spec"; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = > "1,6,6,10"; //t1,n1,t2,n2,... 
> public static final String DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = > "dfs.http.client.failover.max.attempts"; > public static final int DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = > 15; > public static final String DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = > "dfs.http.client.retry.max.attempts"; > public static final int DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = > "dfs.http.client.failover.sleep.base.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT > = 500; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = > "dfs.http.client.failover.sleep.max.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = > 15000; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations
[ https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9843: - Attachment: HDFS-9843.00.patch Attached an initial patch. > Document distcp options required for copying between encrypted locations > > > Key: HDFS-9843 > URL: https://issues.apache.org/jira/browse/HDFS-9843 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, documentation, encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-9843.00.patch > > > In TransparentEncryption.md#Distcp_considerations document section, we have > "Copying_between_encrypted_and_unencrypted_locations" which requires > -skipcrccheck and -update. > These options should be documented as required for "Copying between encrypted > locations" use cases as well because this involves decrypting source file and > encrypting destination file with a different EDEK, resulting in different > checksum at the destination. Distcp will fail at crc check if -skipcrccheck > if not specified. > This ticket is opened to document the required options for "Copying between > encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9843) Document distcp options required for copying between encrypted locations
[ https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9843: - Status: Patch Available (was: Open) > Document distcp options required for copying between encrypted locations > > > Key: HDFS-9843 > URL: https://issues.apache.org/jira/browse/HDFS-9843 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, documentation, encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-9843.00.patch > > > In TransparentEncryption.md#Distcp_considerations document section, we have > "Copying_between_encrypted_and_unencrypted_locations" which requires > -skipcrccheck and -update. > These options should be documented as required for "Copying between encrypted > locations" use cases as well because this involves decrypting source file and > encrypting destination file with a different EDEK, resulting in different > checksum at the destination. Distcp will fail at crc check if -skipcrccheck > if not specified. > This ticket is opened to document the required options for "Copying between > encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122
[ https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168107#comment-15168107 ] Xiaoyu Yao commented on HDFS-9831: -- [~xiaobingo], thanks for the update. We need to add an anchor for the new section in the top level directory. +1 after that being added. {code} ... * [Cross-Site Request Forgery Prevention](#Cross-Site_Request_Forgery_Prevention) * [WebHDFS Retry Policy](#WebHDFS_Retry_Policy) {code} > Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 > > > Key: HDFS-9831 > URL: https://issues.apache.org/jira/browse/HDFS-9831 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-9831.000.patch, HDFS-9831.001.patch, > HDFS-9831.002.patch > > > This ticket is opened to document the configuration keys introduced by > HDFS-5219/HDFS-5122 for WebHdfs Retry. Both hdfs-default.xml and webhdfs.md > should be updated with the usage of these keys. > {code} > / WebHDFS retry policy > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = > "dfs.http.client.retry.policy.enabled"; > public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = > false; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = > "dfs.http.client.retry.policy.spec"; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = > "1,6,6,10"; //t1,n1,t2,n2,... 
> public static final String DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = > "dfs.http.client.failover.max.attempts"; > public static final int DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = > 15; > public static final String DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = > "dfs.http.client.retry.max.attempts"; > public static final int DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = > "dfs.http.client.failover.sleep.base.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT > = 500; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = > "dfs.http.client.failover.sleep.max.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = > 15000; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122
[ https://issues.apache.org/jira/browse/HDFS-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166336#comment-15166336 ] Xiaoyu Yao commented on HDFS-9831: -- Thanks [~xiaobingo] for working on this. The patch looks good to me. One suggestion: can you add some description of the use cases that need to enable the WebHDFS retry policy in hdfs-site.xml. For example, If "true", enable the retry policy of WebHDFS client. This can be useful when using WebHDFS to - copy large files between clusters that could timeout or - copy files between HA clusters that could failover during the copy. {code}
<property>
  <name>dfs.http.client.retry.policy.enabled</name>
  <value>false</value>
  <description>
    If "true", enable the retry policy of WebHDFS client.
    If "false", retry policy is turned off.
  </description>
</property>
{code} > Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 > > > Key: HDFS-9831 > URL: https://issues.apache.org/jira/browse/HDFS-9831 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-9831.000.patch > > > This ticket is opened to document the configuration keys introduced by > HDFS-5219/HDFS-5122 for WebHdfs Retry. Both hdfs-default.xml and webhdfs.md > should be updated with the usage of these keys. > {code} > / WebHDFS retry policy > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = > "dfs.http.client.retry.policy.enabled"; > public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = > false; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = > "dfs.http.client.retry.policy.spec"; > public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = > "1,6,6,10"; //t1,n1,t2,n2,... 
> public static final String DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = > "dfs.http.client.failover.max.attempts"; > public static final int DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = > 15; > public static final String DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = > "dfs.http.client.retry.max.attempts"; > public static final int DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = > "dfs.http.client.failover.sleep.base.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT > = 500; > public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = > "dfs.http.client.failover.sleep.max.millis"; > public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = > 15000; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9843) Document distcp options required for copying between encrypted locations
[ https://issues.apache.org/jira/browse/HDFS-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166339#comment-15166339 ] Xiaoyu Yao commented on HDFS-9843: -- Thank you, [~cnauroth] for reviewing and committing the patch! > Document distcp options required for copying between encrypted locations > > > Key: HDFS-9843 > URL: https://issues.apache.org/jira/browse/HDFS-9843 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp, documentation, encryption >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Fix For: 2.8.0 > > Attachments: HDFS-9843.00.patch, HDFS-9843.01.patch, > HDFS-9843.02.patch > > > In TransparentEncryption.md#Distcp_considerations document section, we have > "Copying_between_encrypted_and_unencrypted_locations" which requires > -skipcrccheck and -update. > These options should be documented as required for "Copying between encrypted > locations" use cases as well because this involves decrypting source file and > encrypting destination file with a different EDEK, resulting in different > checksum at the destination. Distcp will fail at crc check if -skipcrccheck > if not specified. > This ticket is opened to document the required options for "Copying between > encrypted locations" use cases when using distcp with HDFS encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9667) StorageType: SSD precede over DISK
[ https://issues.apache.org/jira/browse/HDFS-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109184#comment-15109184 ] Xiaoyu Yao commented on HDFS-9667: -- Thanks [~aderen] for reporting this; it seems to be a dup of HDFS-8361. > StorageType: SSD precede over DISK > -- > > Key: HDFS-9667 > URL: https://issues.apache.org/jira/browse/HDFS-9667 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0, 2.7.0 >Reporter: ade >Assignee: ade > Fix For: 2.6.0 > > Attachments: HDFS-9667.0.patch > > > We enabled heterogeneous storage in our cluster, and only ~50% > of the datanode & regionserver hosts have SSD. We set hfiles with the ONE_SSD > storage policy, but we found that all replicas of a block are often on DISK, even when the local host > has SSD storage. The block placement does not choose SSD first to place > replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9667) StorageType: SSD precede over DISK
[ https://issues.apache.org/jira/browse/HDFS-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao resolved HDFS-9667. -- Resolution: Fixed > StorageType: SSD precede over DISK > -- > > Key: HDFS-9667 > URL: https://issues.apache.org/jira/browse/HDFS-9667 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0, 2.7.0 >Reporter: ade >Assignee: ade > Fix For: 2.6.0 > > Attachments: HDFS-9667.0.patch > > > We enabled heterogeneous storage in our cluster, and only ~50% > of the datanode & regionserver hosts have SSD. We set hfiles with the ONE_SSD > storage policy, but we found that all replicas of a block are often on DISK, even when the local host > has SSD storage. The block placement does not choose SSD first to place > replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9799) Reimplement getCurrentTrashDir to remove incompatibility
[ https://issues.apache.org/jira/browse/HDFS-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148193#comment-15148193 ] Xiaoyu Yao commented on HDFS-9799: -- Thanks [~zhz] for reporting the issue and working on the fix. bq. The source of the IOException is from getEZForPath. So when getEZForPath gets an exception – meaning that the EZ of the given path cannot be determined at the time of calling, we should just return the Trash dir of the user's home. Even if the path does belong to an EZ, this will just mean the rm will fail later. Can you elaborate on when getEZForPath gets an IOException? Based on EncryptionZoneManager#getEZINodeForPath, getEZForPath() just returns null instead of throwing IOException when a given path cannot be determined to be inside an EZ. This makes DFS#getTrashRoots() include just the Trash dir of the user's home in the returned result for non-EZ paths. Should we just fix the annotations? > Reimplement getCurrentTrashDir to remove incompatibility > > > Key: HDFS-9799 > URL: https://issues.apache.org/jira/browse/HDFS-9799 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Zhe Zhang >Assignee: Zhe Zhang >Priority: Blocker > Attachments: HDFS-9799.00.patch, HDFS-9799.01.patch, > HDFS-9799.02.patch, HDFS-9799.03.patch, HDFS-9799.04.patch > > > HDFS-8831 changed the signature of {{TrashPolicy#getCurrentTrashDir}} by > adding an IOException. This breaks other applications using this public API. > This JIRA aims to reimplement the logic to safely handle the IOException > within HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9799) Reimplement getCurrentTrashDir to remove incompatibility
[ https://issues.apache.org/jira/browse/HDFS-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150698#comment-15150698 ] Xiaoyu Yao commented on HDFS-9799: -- Thanks [~zhz] for the explanation. Agree with your changes on getTrashRoot()/getTrashRoots level. For the change in getTrashRoots, can we add some indication for partial results being returned at API level in addition to the log. > Reimplement getCurrentTrashDir to remove incompatibility > > > Key: HDFS-9799 > URL: https://issues.apache.org/jira/browse/HDFS-9799 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Zhe Zhang >Assignee: Zhe Zhang >Priority: Blocker > Attachments: HDFS-9799.00.patch, HDFS-9799.01.patch, > HDFS-9799.02.patch, HDFS-9799.03.patch, HDFS-9799.04.patch > > > HDFS-8831 changed the signature of {{TrashPolicy#getCurrentTrashDir}} by > adding an IOException. This breaks other applications using this public API. > This JIRA aims to reimplement the logic to safely handle the IOException > within HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9711) Integrate CSRF prevention filter in WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151128#comment-15151128 ] Xiaoyu Yao commented on HDFS-9711: -- Thanks [~cnauroth] for working on this. The patch looks good to me +1. One NIT: Can we move WebHdfsFileSystem#getTrimmedStringList() with default string support to StringUtils so that it can be used by configure keys similar to this? > Integrate CSRF prevention filter in WebHDFS. > > > Key: HDFS-9711 > URL: https://issues.apache.org/jira/browse/HDFS-9711 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode, webhdfs >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-9711.001.patch, HDFS-9711.002.patch, > HDFS-9711.003.patch, HDFS-9711.004.patch, HDFS-9711.005.patch > > > HADOOP-12691 introduced a filter in Hadoop Common to help REST APIs guard > against cross-site request forgery attacks. This issue tracks integration of > that filter in WebHDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9711) Integrate CSRF prevention filter in WebHDFS.
[ https://issues.apache.org/jira/browse/HDFS-9711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151258#comment-15151258 ] Xiaoyu Yao commented on HDFS-9711: -- LGTM, Thanks [~cnauroth] for the clarification! > Integrate CSRF prevention filter in WebHDFS. > > > Key: HDFS-9711 > URL: https://issues.apache.org/jira/browse/HDFS-9711 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode, webhdfs >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: HDFS-9711.001.patch, HDFS-9711.002.patch, > HDFS-9711.003.patch, HDFS-9711.004.patch, HDFS-9711.005.patch > > > HADOOP-12691 introduced a filter in Hadoop Common to help REST APIs guard > against cross-site request forgery attacks. This issue tracks integration of > that filter in WebHDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9831) Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122
Xiaoyu Yao created HDFS-9831: Summary: Document webhdfs retry configuration keys introduced by HDFS-5219/HDFS-5122 Key: HDFS-9831 URL: https://issues.apache.org/jira/browse/HDFS-9831 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, webhdfs Affects Versions: 2.6.0 Reporter: Xiaoyu Yao This ticket is opened to document the configuration keys introduced by HDFS-5219/HDFS-5122 for WebHDFS retry. Both hdfs-default.xml and webhdfs.md should be updated with the usage of these keys. {code} // WebHDFS retry policy public static final String DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_KEY = "dfs.http.client.retry.policy.enabled"; public static final boolean DFS_HTTP_CLIENT_RETRY_POLICY_ENABLED_DEFAULT = false; public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_KEY = "dfs.http.client.retry.policy.spec"; public static final String DFS_HTTP_CLIENT_RETRY_POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,... public static final String DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_KEY = "dfs.http.client.failover.max.attempts"; public static final int DFS_HTTP_CLIENT_FAILOVER_MAX_ATTEMPTS_DEFAULT = 15; public static final String DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_KEY = "dfs.http.client.retry.max.attempts"; public static final int DFS_HTTP_CLIENT_RETRY_MAX_ATTEMPTS_DEFAULT = 10; public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_KEY = "dfs.http.client.failover.sleep.base.millis"; public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_BASE_DEFAULT = 500; public static final String DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_KEY = "dfs.http.client.failover.sleep.max.millis"; public static final int DFS_HTTP_CLIENT_FAILOVER_SLEEPTIME_MAX_DEFAULT = 15000; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
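For hdfs-default.xml, the documentation could take roughly this shape. The key names and default values below are taken directly from the constants quoted above; the description text is illustrative, not quoted from any patch:

```xml
<!-- Sketch of possible hdfs-default.xml entries for the WebHDFS retry keys.
     Keys and defaults match the constants in the ticket; descriptions are
     illustrative. -->
<property>
  <name>dfs.http.client.retry.policy.enabled</name>
  <value>false</value>
  <description>Whether WebHDFS clients retry failed operations.</description>
</property>
<property>
  <name>dfs.http.client.retry.policy.spec</name>
  <value>1,6,6,10</value>
  <description>Retry policy spec as pairs t1,n1,t2,n2,...</description>
</property>
<property>
  <name>dfs.http.client.failover.max.attempts</name>
  <value>15</value>
  <description>Maximum failover attempts for WebHDFS clients.</description>
</property>
<property>
  <name>dfs.http.client.retry.max.attempts</name>
  <value>10</value>
  <description>Maximum retry attempts for WebHDFS clients.</description>
</property>
```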
[jira] [Commented] (HDFS-9839) Reduce verbosity of processReport logging
[ https://issues.apache.org/jira/browse/HDFS-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155858#comment-15155858 ] Xiaoyu Yao commented on HDFS-9839: -- Patch LGTM. +1. > Reduce verbosity of processReport logging > - > > Key: HDFS-9839 > URL: https://issues.apache.org/jira/browse/HDFS-9839 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9839.01.patch > > > {{BlockManager#processReport}} logs one line for each invalidated block at > INFO. HDFS-7503 moved this logging outside the NameSystem write lock but we > still see the NameNode being slowed down when the number of block > invalidations is very large e.g. just after a large amount of data is deleted. > {code} > for (Block b : invalidatedBlocks) { > blockLog.info("BLOCK* processReport: {} on node {} size {} does not " > + > "belong to any file", b, node, b.getNumBytes()); > } > {code} > We can change this statement to DEBUG and just log the number of block > invalidations at INFO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174487#comment-15174487 ] Xiaoyu Yao commented on HDFS-9881: -- Thanks [~andrew.wang]. Patch LGTM, +1. > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart
Xiaoyu Yao created HDFS-10207: - Summary: Support enable Hadoop IPC backoff without namenode restart Key: HDFS-10207 URL: https://issues.apache.org/jira/browse/HDFS-10207 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xiaoyu Yao Assignee: Xiaobing Zhou It will be useful to allow changing {{ipc.8020.backoff.enable}} without a namenode restart to protect namenode from being overloaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode
Xiaoyu Yao created HDFS-10209: - Summary: Support enable caller context in HDFS namenode audit log without restart namenode Key: HDFS-10209 URL: https://issues.apache.org/jira/browse/HDFS-10209 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xiaoyu Yao Assignee: Xiaobing Zhou RPC caller context is a useful feature to track down the origin of the caller, which can track down "bad" jobs that overload the namenode. This ticket is opened to allow enabling caller context without namenode restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10207) Support enable Hadoop IPC backoff without namenode restart
[ https://issues.apache.org/jira/browse/HDFS-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10207: -- Description: It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a namenode restart to protect namenode from being overloaded. (was: It will be useful to allow changing {{ipc.8020.backoff.enable}} without a namenode restart to protect namenode from being overloaded.) > Support enable Hadoop IPC backoff without namenode restart > -- > > Key: HDFS-10207 > URL: https://issues.apache.org/jira/browse/HDFS-10207 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > > It will be useful to allow changing {{ipc.#port#.backoff.enable}} without a > namenode restart to protect namenode from being overloaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
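Today the switch is static: it is set per RPC server port in the configuration and picked up only at startup. For a NameNode listening on the common RPC port 8020 (the port number is just an example), the entry this JIRA wants to make changeable at runtime looks like:

```xml
<!-- Static form of the backoff switch for the RPC server on port 8020;
     substitute your NameNode's actual RPC port for 8020. -->
<property>
  <name>ipc.8020.backoff.enable</name>
  <value>true</value>
</property>
```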
[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode
[ https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210805#comment-15210805 ] Xiaoyu Yao commented on HDFS-10209: --- The configuration key is {{hadoop.caller.context.enabled}}, which is {{false}} by default. > Support enable caller context in HDFS namenode audit log without restart > namenode > - > > Key: HDFS-10209 > URL: https://issues.apache.org/jira/browse/HDFS-10209 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > > RPC caller context is a useful feature to track down the origin of the > caller, which can track down "bad" jobs that overload the namenode. This > ticket is opened to allow enabling caller context without namenode restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
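For reference, enabling the feature statically today takes a core-site.xml entry like the following plus a NameNode restart; the proposal here is to make the same switch reconfigurable at runtime:

```xml
<!-- Static form; this JIRA proposes making it changeable without restart. -->
<property>
  <name>hadoop.caller.context.enabled</name>
  <value>true</value>
</property>
```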
[jira] [Updated] (HDFS-9887) WebHdfs socket timeouts should be configurable
[ https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9887: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks [~and1000] and [~chris.douglas] for the contribution. I've committed the patch to trunk, branch-2 and branch-2.8. > WebHdfs socket timeouts should be configurable > -- > > Key: HDFS-9887 > URL: https://issues.apache.org/jira/browse/HDFS-9887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs, webhdfs > Environment: all >Reporter: Austin Donnelly >Assignee: Austin Donnelly > Labels: easyfix, newbie > Fix For: 2.8.0 > > Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, > HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, > HADOOP-12827.004.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > WebHdfs client connections use sockets with fixed timeouts of 60 seconds to > connect, and 60 seconds for reads. > This is a problem because I am trying to use WebHdfs to access an archive > storage system which can take minutes to hours to return the requested data > over WebHdfs. > The fix is to add new configuration file options to allow these 60s defaults > to be customised in hdfs-site.xml. > If the new configuration options are not present, the behavior is unchanged > from before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
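The key names are not spelled out in this thread; my reading of the committed patch is that the hdfs-site.xml overrides look roughly like the following (treat the key names as an assumption to be verified against hdfs-default.xml of the release you run):

```xml
<!-- Assumed key names from the committed patch; verify against
     hdfs-default.xml. Values accept time units, e.g. 60s, 5m. -->
<property>
  <name>dfs.webhdfs.socket.connect-timeout</name>
  <value>60s</value>
</property>
<property>
  <name>dfs.webhdfs.socket.read-timeout</name>
  <value>5m</value>
</property>
```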
[jira] [Moved] (HDFS-9887) WebHdfs socket timeouts should be configurable
[ https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao moved HADOOP-12827 to HDFS-9887: --- Target Version/s: 2.8.0 (was: 2.9.0) Component/s: (was: fs) webhdfs fs Key: HDFS-9887 (was: HADOOP-12827) Project: Hadoop HDFS (was: Hadoop Common) > WebHdfs socket timeouts should be configurable > -- > > Key: HDFS-9887 > URL: https://issues.apache.org/jira/browse/HDFS-9887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs, webhdfs > Environment: all >Reporter: Austin Donnelly >Assignee: Austin Donnelly > Labels: easyfix, newbie > Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, > HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, > HADOOP-12827.004.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > WebHdfs client connections use sockets with fixed timeouts of 60 seconds to > connect, and 60 seconds for reads. > This is a problem because I am trying to use WebHdfs to access an archive > storage system which can take minutes to hours to return the requested data > over WebHdfs. > The fix is to add new configuration file options to allow these 60s defaults > to be customised in hdfs-site.xml. > If the new configuration options are not present, the behavior is unchanged > from before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable
[ https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183301#comment-15183301 ] Xiaoyu Yao commented on HDFS-9887: -- Thanks [~jojochuang] for reporting this. Further reading found that the webhdfs specific read/connect timeout implemented by HDFS-9887 should not affect other callers of {{URLConnectionFactory.newSslConnConfigurator()}} such as {{QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage()}}. I will file a separate ticket to fix it. > WebHdfs socket timeouts should be configurable > -- > > Key: HDFS-9887 > URL: https://issues.apache.org/jira/browse/HDFS-9887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs, webhdfs > Environment: all >Reporter: Austin Donnelly >Assignee: Austin Donnelly > Labels: easyfix, newbie > Fix For: 2.8.0 > > Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, > HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, > HADOOP-12827.004.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > WebHdfs client connections use sockets with fixed timeouts of 60 seconds to > connect, and 60 seconds for reads. > This is a problem because I am trying to use WebHdfs to access an archive > storage system which can take minutes to hours to return the requested data > over WebHdfs. > The fix is to add new configuration file options to allow these 60s defaults > to be customised in hdfs-site.xml. > If the new configuration options are not present, the behavior is unchanged > from before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9914) Fix configurable WebhDFS connect/read timeout
[ https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-9914: Assignee: Xiaoyu Yao > Fix configurable WebhDFS connect/read timeout > - > > Key: HDFS-9914 > URL: https://issues.apache.org/jira/browse/HDFS-9914 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > Webhdfs specific read/connect timeout as added HDFS-9887. This ticket is > opened to fix the following issues in current implementation: > 1. The webhdfs read/connect timeout should not affect connection for other > callers of URLConnectionFactory.newSslConnConfigurator() such as > QuorumJournalManager#QuorumJournalManger(), DFSck#DFSck() and > TransferFsImage() > 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs > connect/read timeout even if any exception is thrown during customized SSL > configuration. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout
[ https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9914: - Description: Webhdfs specific read/connect timeout as added HDFS-9887. This ticket is opened to fix the following issues in current implementation: 1. The webhdfs read/connect timeout should not affect connection for other callers of URLConnectionFactory.newSslConnConfigurator() such as QuorumJournalManager#QuorumJournalManger(), DFSck#DFSck() and TransferFsImage() 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs connect/read timeout even if any exception is thrown during customized SSL configuration. 3. OAuth2 webhdfs connection should honor the webhdfs connect/read timeout. was: Webhdfs specific read/connect timeout as added HDFS-9887. This ticket is opened to fix the following issues in current implementation: 1. The webhdfs read/connect timeout should not affect connection for other callers of URLConnectionFactory.newSslConnConfigurator() such as QuorumJournalManager#QuorumJournalManger(), DFSck#DFSck() and TransferFsImage() 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs connect/read timeout even if any exception is thrown during customized SSL configuration. > Fix configurable WebhDFS connect/read timeout > - > > Key: HDFS-9914 > URL: https://issues.apache.org/jira/browse/HDFS-9914 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > Webhdfs specific read/connect timeout as added HDFS-9887. This ticket is > opened to fix the following issues in current implementation: > 1. The webhdfs read/connect timeout should not affect connection for other > callers of URLConnectionFactory.newSslConnConfigurator() such as > QuorumJournalManager#QuorumJournalManger(), DFSck#DFSck() and > TransferFsImage() > 2. 
URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs > connect/read timeout even if any exception is thrown during customized SSL > configuration. > > 3. OAuth2 webhdfs connection should honor the webhdfs connect/read timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout
[ https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9914: - Status: Patch Available (was: Open) > Fix configurable WebhDFS connect/read timeout > - > > Key: HDFS-9914 > URL: https://issues.apache.org/jira/browse/HDFS-9914 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-9914.001.patch > > > Webhdfs specific read/connect timeout as added HDFS-9887. This ticket is > opened to fix the following issues in current implementation: > 1. The webhdfs read/connect timeout should not affect connection for other > callers of URLConnectionFactory.newSslConnConfigurator() such as > QuorumJournalManager#QuorumJournalManger(), DFSck#DFSck() and > TransferFsImage() > 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs > connect/read timeout even if any exception is thrown during customized SSL > configuration. > > 3. OAuth2 webhdfs connection should honor the webhdfs connect/read timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9914) Fix configurable WebhDFS connect/read timeout
[ https://issues.apache.org/jira/browse/HDFS-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9914: - Attachment: HDFS-9914.001.patch > Fix configurable WebhDFS connect/read timeout > - > > Key: HDFS-9914 > URL: https://issues.apache.org/jira/browse/HDFS-9914 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-9914.001.patch > > > Webhdfs specific read/connect timeout as added HDFS-9887. This ticket is > opened to fix the following issues in current implementation: > 1. The webhdfs read/connect timeout should not affect connection for other > callers of URLConnectionFactory.newSslConnConfigurator() such as > QuorumJournalManager#QuorumJournalManger(), DFSck#DFSck() and > TransferFsImage() > 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor webhdfs > connect/read timeout even if any exception is thrown during customized SSL > configuration. > > 3. OAuth2 webhdfs connection should honor the webhdfs connect/read timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9914) Fix configurable WebhDFS connect/read timeout
Xiaoyu Yao created HDFS-9914: Summary: Fix configurable WebhDFS connect/read timeout Key: HDFS-9914 URL: https://issues.apache.org/jira/browse/HDFS-9914 Project: Hadoop HDFS Issue Type: Bug Reporter: Xiaoyu Yao WebHDFS-specific read/connect timeouts were added in HDFS-9887. This ticket is opened to fix the following issues in the current implementation: 1. The webhdfs read/connect timeout should not affect connection for other callers of URLConnectionFactory.newSslConnConfigurator() such as QuorumJournalManager#QuorumJournalManager(), DFSck#DFSck() and TransferFsImage() 2. URLConnectionFactory#getSSLConnectionConfiguration() should honor the webhdfs connect/read timeout even if any exception is thrown during customized SSL configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable
[ https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183322#comment-15183322 ] Xiaoyu Yao commented on HDFS-9887: -- Filed HDFS-9914 for the fix. > WebHdfs socket timeouts should be configurable > -- > > Key: HDFS-9887 > URL: https://issues.apache.org/jira/browse/HDFS-9887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs, webhdfs > Environment: all >Reporter: Austin Donnelly >Assignee: Austin Donnelly > Labels: easyfix, newbie > Fix For: 2.8.0 > > Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, > HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, > HADOOP-12827.004.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > WebHdfs client connections use sockets with fixed timeouts of 60 seconds to > connect, and 60 seconds for reads. > This is a problem because I am trying to use WebHdfs to access an archive > storage system which can take minutes to hours to return the requested data > over WebHdfs. > The fix is to add new configuration file options to allow these 60s defaults > to be customised in hdfs-site.xml. > If the new configuration options are not present, the behavior is unchanged > from before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9887) WebHdfs socket timeouts should be configurable
[ https://issues.apache.org/jira/browse/HDFS-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183241#comment-15183241 ] Xiaoyu Yao commented on HDFS-9887: -- Agree, this is a bug. Webhdfs with ssl configuration exception will not honor the configurable webhdfs connect/read timeout. It will always be {{DEFAULT_TIMEOUT_CONN_CONFIGURATOR}} the default value (1 min). > WebHdfs socket timeouts should be configurable > -- > > Key: HDFS-9887 > URL: https://issues.apache.org/jira/browse/HDFS-9887 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs, webhdfs > Environment: all >Reporter: Austin Donnelly >Assignee: Austin Donnelly > Labels: easyfix, newbie > Fix For: 2.8.0 > > Attachments: HADOOP-12827.001.patch, HADOOP-12827.002.patch, > HADOOP-12827.002.patch, HADOOP-12827.002.patch, HADOOP-12827.003.patch, > HADOOP-12827.004.patch > > Original Estimate: 0h > Remaining Estimate: 0h > > WebHdfs client connections use sockets with fixed timeouts of 60 seconds to > connect, and 60 seconds for reads. > This is a problem because I am trying to use WebHdfs to access an archive > storage system which can take minutes to hours to return the requested data > over WebHdfs. > The fix is to add new configuration file options to allow these 60s defaults > to be customised in hdfs-site.xml. > If the new configuration options are not present, the behavior is unchanged > from before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9723) Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext
[ https://issues.apache.org/jira/browse/HDFS-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-9723: - Description: HDFS namenode handles RPC requests from DFS clients and internal processing from datanodes. It has been a recurring pain that some bad jobs overwhelm the namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 is the one of the existing efforts added since Hadoop 2.4 to address this issue. In current FCQ implementation, incoming RPC calls are scheduled based on the number of recent RPC calls of different users with a time-decayed scheduler. This works well when there is a clear mapping between users and their RPC calls from different jobs. However, this may not work effectively when it is hard to track calls to a specific caller in a chain of operations from the workflow (e.g.Oozie -> Hive -> Yarn). It is not feasible for operators/administrators to throttle all the hive jobs because of one “bad” query. This JIRA proposed to leverage RPC caller context information (such as callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative to existing UGI (or user name when delegation token is not available) based Identify Provider to improve effectiveness Hadoop RPC Fair Call Queue (HADOOP-9640) for better namenode throttling in multi-tenancy cluster deployment. was: HDFS namenode handles RPC requests from DFS clients and internal processing from datanodes. It has been a recurring pain that some bad jobs overwhelm the namenode and bring the whole cluster down. FCQ (Fair Call Queue) by HADOOP-9640 is the one of the existing efforts added since Hadoop 2.4 to address this issue. In current FCQ implementation, incoming RPC calls are scheduled based on the number of recent RPC calls (1000) of different users with a time-decayed scheduler. This works well when there is a clear mapping between users and their RPC calls from different jobs. 
However, this may not work effectively when it is hard to track calls to a specific caller in a chain of operations from the workflow (e.g. Oozie -> Hive -> Yarn). It is not feasible for operators/administrators to throttle all the hive jobs because of one “bad” query. This JIRA proposes to leverage RPC caller context information (such as callerType: caller Id from TEZ-2851) available with HDFS-9184 as an alternative to the existing UGI (or user name when delegation token is not available) based Identity Provider to improve the effectiveness of Hadoop RPC Fair Call Queue (HADOOP-9640) for better namenode throttling in multi-tenancy cluster deployment. > Improve Namenode Throttling Against Bad Jobs with FCQ and CallerContext > --- > > Key: HDFS-9723 > URL: https://issues.apache.org/jira/browse/HDFS-9723 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > HDFS namenode handles RPC requests from DFS clients and internal processing > from datanodes. It has been a recurring pain that some bad jobs overwhelm the > namenode and bring the whole cluster down. FCQ (Fair Call Queue) by > HADOOP-9640 is one of the existing efforts added since Hadoop 2.4 to > address this issue. > In current FCQ implementation, incoming RPC calls are scheduled based on the > number of recent RPC calls of different users with a time-decayed scheduler. > This works well when there is a clear mapping between users and their RPC > calls from different jobs. However, this may not work effectively when it is > hard to track calls to a specific caller in a chain of operations from the > workflow (e.g. Oozie -> Hive -> Yarn). It is not feasible for > operators/administrators to throttle all the hive jobs because of one “bad” > query. 
> This JIRA proposes to leverage RPC caller context information (such as > callerType: caller Id from TEZ-2851) available with HDFS-9184 as an > alternative to the existing UGI (or user name when delegation token is not > available) based Identity Provider to improve the effectiveness of Hadoop RPC Fair > Call Queue (HADOOP-9640) for better namenode throttling in multi-tenancy > cluster deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
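The time-decayed scheduling the description refers to can be sketched in a few lines of Python. This is a toy model, not Hadoop's DecayRpcScheduler: per-identity call counts decay by a fixed factor on each periodic sweep, and a caller whose share of recent calls is large is assigned a lower-priority queue. The proposal above would feed the caller context id, rather than the UGI user name, in as the identity:

```python
from collections import defaultdict

class DecayScheduler:
    """Toy time-decayed scheduler: heavier recent callers get lower priority."""

    def __init__(self, levels=4, decay=0.5):
        self.levels = levels              # number of priority queues
        self.decay = decay                # factor applied to counts per sweep
        self.counts = defaultdict(float)  # decayed call count per identity

    def sweep(self):
        # Periodically decay historical counts so old activity fades out.
        for ident in list(self.counts):
            self.counts[ident] *= self.decay

    def priority(self, identity):
        # Record the call, then map the caller's share of all recent calls
        # to a priority level: 0 is highest priority, levels-1 is lowest.
        self.counts[identity] += 1
        share = self.counts[identity] / sum(self.counts.values())
        # With 4 levels: share < 0.125 -> 0, < 0.25 -> 1, < 0.5 -> 2, else 3
        for level in range(self.levels - 1):
            if share < 0.5 ** (self.levels - 1 - level):
                return level
        return self.levels - 1

s = DecayScheduler()
for _ in range(97):
    s.priority("hive-bad-query")     # one hot caller dominates recent traffic
light = s.priority("alice")          # first call from a light user
heavy = s.priority("hive-bad-query")
```

Here `light` lands in the highest-priority level and `heavy` in the lowest, which is the throttling effect the JIRA is after: the "bad" job degrades without operators blocking all hive traffic.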
[jira] [Updated] (HDFS-10253) Fix TestRefreshCallQueue failure.
[ https://issues.apache.org/jira/browse/HDFS-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10253: -- Attachment: HDFS-10253.00.patch Thanks [~brahmareddy] for catching this. Attaching a simple fix for reference if you have not started working on it. > Fix TestRefreshCallQueue failure. > - > > Key: HDFS-10253 > URL: https://issues.apache.org/jira/browse/HDFS-10253 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-10253.00.patch > > > *Jenkins link* > https://builds.apache.org/job/PreCommit-HDFS-Build/15041/testReport/ > *Trace* > {noformat} > java.lang.RuntimeException: > org.apache.hadoop.TestRefreshCallQueue$MockCallQueue could not be constructed. > at > org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:164) > at > org.apache.hadoop.ipc.CallQueueManager.<init>(CallQueueManager.java:70) > at org.apache.hadoop.ipc.Server.<init>(Server.java:2579) > at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:958) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:421) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:759) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:701) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:900) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:879) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1596) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016) > at > 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823) > at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441) > at > org.apache.hadoop.TestRefreshCallQueue.setUp(TestRefreshCallQueue.java:71) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10253) Fix TestRefreshCallQueue failure.
[ https://issues.apache.org/jira/browse/HDFS-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10253: -- Status: Patch Available (was: Open) > Fix TestRefreshCallQueue failure. > - > > Key: HDFS-10253 > URL: https://issues.apache.org/jira/browse/HDFS-10253 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-10253.00.patch > > > *Jenkins link* > https://builds.apache.org/job/PreCommit-HDFS-Build/15041/testReport/ > *Trace* > {noformat} > java.lang.RuntimeException: > org.apache.hadoop.TestRefreshCallQueue$MockCallQueue could not be constructed. > at > org.apache.hadoop.ipc.CallQueueManager.createCallQueueInstance(CallQueueManager.java:164) > at > org.apache.hadoop.ipc.CallQueueManager.<init>(CallQueueManager.java:70) > at org.apache.hadoop.ipc.Server.<init>(Server.java:2579) > at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:958) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:535) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:510) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:421) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:759) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:701) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:900) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:879) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1596) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891) > at > 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823) > at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:482) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441) > at > org.apache.hadoop.TestRefreshCallQueue.setUp(TestRefreshCallQueue.java:71) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode
[ https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217025#comment-15217025 ] Xiaoyu Yao commented on HDFS-10209: --- Thanks [~xiaobingo] for working on this. Patch looks good to me. +1 pending Jenkins. > Support enable caller context in HDFS namenode audit log without restart > namenode > - > > Key: HDFS-10209 > URL: https://issues.apache.org/jira/browse/HDFS-10209 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-10209-HDFS-9000.000.patch > > > RPC caller context is a useful feature to track down the origin of the > caller, which can track down "bad" jobs that overload the namenode. This > ticket is opened to allow enabling caller context without namenode restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10286) Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties
[ https://issues.apache.org/jira/browse/HDFS-10286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240631#comment-15240631 ] Xiaoyu Yao commented on HDFS-10286: --- Patch looks good to me. +1 pending Jenkins. > Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties > > > Key: HDFS-10286 > URL: https://issues.apache.org/jira/browse/HDFS-10286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Attachments: HDFS-10286.000.patch > > > HDFS-10209 introduced a new reconfigurable property, which requires an > update to the validation in > TestDFSAdmin#testNameNodeGetReconfigurableProperties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode
[ https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240592#comment-15240592 ] Xiaoyu Yao commented on HDFS-10209: --- I opened HDFS-10286 and attached your patch to it. > Support enable caller context in HDFS namenode audit log without restart > namenode > - > > Key: HDFS-10209 > URL: https://issues.apache.org/jira/browse/HDFS-10209 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-10209-HDFS-9000.000.patch, > HDFS-10209-HDFS-9000.001.patch, HDFS-10209-HDFS-9000.UT-fix.patch > > > RPC caller context is a useful feature to track down the origin of the > caller, which can track down "bad" jobs that overload the namenode. This > ticket is opened to allow enabling caller context without namenode restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10209) Support enable caller context in HDFS namenode audit log without restart namenode
[ https://issues.apache.org/jira/browse/HDFS-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240588#comment-15240588 ] Xiaoyu Yao commented on HDFS-10209: --- [~xiaobingo], please open a separate ticket for the unit test fix and link it to HDFS-10209. Thanks > Support enable caller context in HDFS namenode audit log without restart > namenode > - > > Key: HDFS-10209 > URL: https://issues.apache.org/jira/browse/HDFS-10209 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Xiaobing Zhou > Fix For: 2.9.0 > > Attachments: HDFS-10209-HDFS-9000.000.patch, > HDFS-10209-HDFS-9000.001.patch, HDFS-10209-HDFS-9000.UT-fix.patch > > > RPC caller context is a useful feature to track down the origin of the > caller, which can track down "bad" jobs that overload the namenode. This > ticket is opened to allow enabling caller context without namenode restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10286) Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties
Xiaoyu Yao created HDFS-10286: - Summary: Fix TestDFSAdmin#testNameNodeGetReconfigurableProperties Key: HDFS-10286 URL: https://issues.apache.org/jira/browse/HDFS-10286 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xiaoyu Yao Assignee: Xiaobing Zhou HDFS-10209 introduced a new reconfigurable property, which requires an update to the validation in TestDFSAdmin#testNameNodeGetReconfigurableProperties. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit
[ https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256633#comment-15256633 ] Xiaoyu Yao commented on HDFS-10324: --- [~jojochuang], I mentioned #2 because Trash is a client-side feature that previously did not require file system operations like {{hdfs dfs -mkdir /ez/tmp; hdfs dfs -chmod 1777 /ez/tmp}} for its operation. Since this is encryption zone specific, it might be easier to have a single cryptoadmin command to handle it. > Trash directory in an encryption zone should be pre-created with sticky bit > --- > > Key: HDFS-10324 > URL: https://issues.apache.org/jira/browse/HDFS-10324 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.8.0 > Environment: CDH5.7.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch > > > We encountered a bug in HDFS-8831: > After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash > subdirectory within the encryption zone. > However, if this .Trash subdirectory is not created beforehand, it will be > created and owned by the first user who deleted a file, with permission > drwx------. This creates a serious bug because any other non-privileged user > will not be able to delete any files within the encryption zone, because they > do not have the permission to move directories to the trash directory. > We should fix this bug by pre-creating the .Trash directory with the sticky bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
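The manual workaround discussed in the comment above can be sketched as a short admin session. This is an illustrative sketch, not the fix proposed in the patch: it assumes an existing encryption zone rooted at /ez (the path is hypothetical) and pre-creates the per-zone trash directory with the sticky bit so that every user can move deleted files into it, mirroring the mode of /tmp.

```shell
# Sketch of the manual workaround (assumes an encryption zone at /ez; run as
# a superuser or the zone owner before any user deletes files in the zone).
hdfs dfs -mkdir /ez/.Trash

# Mode 1777 = world-writable plus the sticky bit, so users can create their
# own trash subdirectories but cannot remove or rename each other's.
hdfs dfs -chmod 1777 /ez/.Trash

# The directory listing should now show permissions drwxrwxrwt.
hdfs dfs -ls -d /ez/.Trash
```

Because the trash directory stays inside the zone, moved files remain encrypted with the zone's key, which is the behavior HDFS-8831 relies on.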