[jira] [Updated] (HDFS-6774) Make FsDataset and DataStore support removing volumes.
[ https://issues.apache.org/jira/browse/HDFS-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6774: Attachment: HDFS-6774.002.patch Hi, [~atm] Thanks for the review. bq. The change in BlockPoolSlice - was that just a separate bug? Or why was that necessary? It is indeed a separate bug. Should I open a new JIRA for this? bq. I see the code where we remove the replica info from the replica map, but do we not also need to do something similar in the event that the replica is currently referenced in the BlockScanner or DirectoryScanner data structures? It could be that we don't, but I wanted to check with you to see if you've considered this case. Thanks for finding this bug. {{BlockScanner}} needs to remove the blocks, and the corresponding patch is attached. I think that {{DirectoryScanner}} does not need to change, since it uses {{FsDatasetImpl}} as the input source for each scan and performs {{DirectoryScanner#isValid()}}, which verifies whether the volume is still available, when generating disk reports. This logic is the same as that for handling disk failures. Therefore, {{DirectoryScanner}}'s consistency can be provided by {{FsDatasetImpl}}. What is your opinion about this? Make FsDataset and DataStore support removing volumes. -- Key: HDFS-6774 URL: https://issues.apache.org/jira/browse/HDFS-6774 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6774.000.patch, HDFS-6774.001.patch, HDFS-6774.002.patch Managing volumes on a DataNode includes decommissioning an active volume without restarting the DataNode. This task adds support for removing volumes from {{DataStorage}} and {{BlockPoolSliceStorage}} dynamically. -- This message was sent by Atlassian JIRA (v6.2#6252)
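For illustration, here is a minimal, self-contained Java sketch of the removal flow described in the comment above; all class and method names are hypothetical stand-ins, not the actual FsDatasetImpl/BlockScanner APIs from the patch.
{code}
import java.util.*;

// Toy model of removing a volume: drop its replicas from the in-memory
// replica map and from the block scanner's queue (hypothetical names).
class VolumeRemovalSketch {
  // blockId -> volume path, standing in for the DataNode replica map.
  static final Map<Long, String> replicaMap = new HashMap<>();
  // Blocks queued for scanning, standing in for BlockScanner state.
  static final Set<Long> scanQueue = new HashSet<>();

  static void removeVolume(String volumePath) {
    Iterator<Map.Entry<Long, String>> it = replicaMap.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, String> e = it.next();
      if (e.getValue().equals(volumePath)) {
        it.remove();                  // remove the replica from the replica map
        scanQueue.remove(e.getKey()); // and tell the scanner to forget it
      }
    }
  }

  public static void main(String[] args) {
    replicaMap.put(1L, "/data/1");
    replicaMap.put(2L, "/data/2");
    scanQueue.add(1L);
    scanQueue.add(2L);
    removeVolume("/data/1");
    System.out.println(replicaMap + " " + scanQueue); // {2=/data/2} [2]
  }
}
{code}
As in the discussion above, a DirectoryScanner-style component that re-reads the dataset on every scan needs no such cleanup, since removed volumes simply stop appearing in its input.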
[jira] [Updated] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6776: Attachment: HDFS-6776.006.NullToken.patch Uploaded version 006, which introduces a NullToken exception; hopefully this is better than the earlier versions that used message parsing. distcp from insecure cluster (source) to secure cluster (destination) doesn't work -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch Issuing the distcp command on the secure cluster side to copy data from the insecure cluster to the secure cluster fails with the following problem: {code} hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:565) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at
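To show what replacing message parsing with a dedicated exception type might look like, here is a hypothetical, self-contained Java sketch; names such as NullTokenException are illustrative only, and the actual 006 patch should be consulted for the real implementation.
{code}
import java.io.IOException;

// Signal "the insecure source cluster issued no delegation token" with a
// dedicated exception type rather than matching on exception message text.
class NullTokenSketch {
  static class NullTokenException extends IOException {
    NullTokenException(String msg) { super(msg); }
  }

  // Stand-in for a WebHDFS getDelegationToken call against an insecure
  // cluster, which has no token to hand out.
  static String getDelegationToken() throws IOException {
    throw new NullTokenException("source cluster is insecure");
  }

  public static void main(String[] args) throws IOException {
    try {
      getDelegationToken();
    } catch (NullTokenException e) {
      // The caller can treat "no token" as "proceed without tokens"
      // instead of parsing a brittle exception message.
      System.out.println("no token from source, continuing: " + e.getMessage());
    }
  }
}
{code}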
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099900#comment-14099900 ] Hadoop QA commented on HDFS-6776: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662349/HDFS-6776.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHDFSServerPorts org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7652//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7652//console This message is automatically generated. 
distcp from insecure cluster (source) to secure cluster (destination) doesn't work -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch Issuing the distcp command on the secure cluster side to copy data from the insecure cluster to the secure cluster fails with the following problem: {code} hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at
[jira] [Updated] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-6621: -- Description: I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit its balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0 &&
      (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
was: I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. 
In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive =
[jira] [Commented] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099914#comment-14099914 ] Yi Liu commented on HDFS-3862: -- After talking with [~rakeshr], I will work on this issue. Thanks. QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-3862: - Attachment: HDFS-3862.001.patch The patch adds a new API to JournalManager: {{boolean isNativelySingleWriter();}} and handles the following cases: * if the shared storage has built-in single-writer semantics, then the user *is not forced* to specify a fencer. * if #1, but a fencer is configured, then ZKFC will do fencing as in the original logic; if fencing fails, it is ignored with a warning log and failover will continue. * if #1, but a bad fencer is configured, then the fencer is ignored with a warning log and failover will continue. * if the “forcefence” option is specified for failover when using DFSHAAdmin, then a fencer must be configured even if #1 holds. QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
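To make the proposed API concrete, here is a minimal, self-contained Java sketch of the new method and a failover-time check; the names and the placement of the check are assumptions for illustration, not the patch's exact code.
{code}
// Hypothetical slice of the JournalManager interface.
interface JournalManagerSketch {
  // True when the shared storage enforces single-writer semantics itself
  // (e.g. QJM), so an external fencer is not strictly required.
  boolean isNativelySingleWriter();
}

class FailoverCheckSketch {
  static void checkFencing(JournalManagerSketch jm, String configuredFencer) {
    if (configuredFencer == null && !jm.isNativelySingleWriter()) {
      // Preserve today's behavior for storage without built-in fencing.
      throw new IllegalStateException("a fencing method must be configured");
    }
    if (configuredFencer == null) {
      System.out.println("no fencer configured; relying on storage-level fencing");
    }
  }

  public static void main(String[] args) {
    checkFencing(() -> true, null); // QJM-like storage, no fencer: allowed
  }
}
{code}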
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099919#comment-14099919 ] Harsh J commented on HDFS-6621: --- [~ravwojdyla] - If you are still working on this, would you be posting your patch for the issue here for review soon? Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit its balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0 &&
      (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6648) Order of namenodes in ConfiguredFailoverProxyProvider is not defined by order in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099918#comment-14099918 ] Harsh J commented on HDFS-6648: --- I don't think it was a design goal to have them connected to in any particular configuration order; the provider is just built off a list from the configuration. Is this posing a Major problem for you though? The proxy provider is invoked only during initialization and when an instance runs into trouble, not at every request. Order of namenodes in ConfiguredFailoverProxyProvider is not defined by order in hdfs-site.xml -- Key: HDFS-6648 URL: https://issues.apache.org/jira/browse/HDFS-6648 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs-client Affects Versions: 2.2.0 Reporter: Rafal Wojdyla In org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider, in the constructor, there's a map nameservice : service-id : service-rpc-address (DFSUtil.getHaNnRpcAddresses). It's a LinkedHashMap of HashMaps. The order is kept for _nameservices_. Then, to find the active namenode for a nameservice, we get the HashMap of service-id : service-rpc-address for the requested nameservice (taken from the request URI), and for this HashMap we get the values - the order of this collection is not strictly defined! In the code: {code} Collection<InetSocketAddress> addressesOfNns = addressesInNN.values(); {code} And then we put these values (in undefined order) into the ArrayList of proxies, and then in getProxy we start from the first proxy in the list and fail over to the next if needed. It would make sense for ConfiguredFailoverProxyProvider to keep the order of proxies/namenodes defined in hdfs-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
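A tiny self-contained Java demonstration of the ordering issue discussed above: a plain HashMap's values() view has no defined order, whereas a LinkedHashMap preserves the insertion (i.e. configuration) order, which is the behavior the reporter is asking for. The addresses are hypothetical.
{code}
import java.util.*;

class ProxyOrderSketch {
  public static void main(String[] args) {
    Map<String, String> hashed = new HashMap<>();
    Map<String, String> linked = new LinkedHashMap<>();
    for (String nn : new String[] {"nn1", "nn2"}) {
      hashed.put(nn, nn + ".example.com:8020"); // hypothetical addresses
      linked.put(nn, nn + ".example.com:8020");
    }
    // The failover provider effectively iterates addresses.values();
    // only the LinkedHashMap guarantees nn1 is tried before nn2.
    System.out.println(new ArrayList<>(hashed.values())); // order unspecified
    System.out.println(new ArrayList<>(linked.values())); // nn1 first, nn2 second
  }
}
{code}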
[jira] [Updated] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-3862: - Attachment: (was: HDFS-3862.001.patch) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-3862: - Attachment: HDFS-3862.001.patch QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-3862: - Status: Patch Available (was: Open) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-3862: - Attachment: HDFS-3862.001.patch QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-3862: - Attachment: (was: HDFS-3862.001.patch) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099936#comment-14099936 ] Hadoop QA commented on HDFS-6776: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662355/HDFS-6776.006.NullToken.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithNodeGroup org.apache.hadoop.hdfs.TestHDFSServerPorts {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7654//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7654//console This message is automatically generated. 
distcp from insecure cluster (source) to secure cluster (destination) doesn't work -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch Issuing the distcp command on the secure cluster side to copy data from the insecure cluster to the secure cluster fails with the following problem: {code} hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at
[jira] [Commented] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099954#comment-14099954 ] Hadoop QA commented on HDFS-3862: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662359/HDFS-3862.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal: org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.tools.TestDFSHAAdmin The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal: org.apache.hadoop.hdfs.TestHDFSServerPorts org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7655//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7655//console This message is automatically generated. QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099964#comment-14099964 ] Hadoop QA commented on HDFS-3862: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662361/HDFS-3862.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal: org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverAndFencing org.apache.hadoop.hdfs.tools.TestDFSHAAdmin org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal: org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7656//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7656//console This message is automatically generated. QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6859) Allow dfs.data.transfer.protection default to hadoop.rpc.protection
Benoy Antony created HDFS-6859: -- Summary: Allow dfs.data.transfer.protection default to hadoop.rpc.protection Key: HDFS-6859 URL: https://issues.apache.org/jira/browse/HDFS-6859 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Currently, an administrator needs to configure both _dfs.data.transfer.protection_ and _hadoop.rpc.protection_ to specify the _QOP_ for the rpc and data transfer protocols. In some cases, the values for these two properties will be the same. In those cases, it may be easier to allow _dfs.data.transfer.protection_ to default to _hadoop.rpc.protection_. This also ensures that an admin will get the QOP as _Authentication_ if the admin does not specify either of those values. -- This message was sent by Atlassian JIRA (v6.2#6252)
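The proposed defaulting rule is a simple configuration fallback. Here is a self-contained Java sketch of the idea, using java.util.Properties as a stand-in for Hadoop's Configuration class; the helper name is hypothetical.
{code}
import java.util.Properties;

class QopDefaultSketch {
  static String dataTransferProtection(Properties conf) {
    String v = conf.getProperty("dfs.data.transfer.protection");
    if (v != null) {
      return v; // an explicit setting always wins
    }
    // Fall back to the RPC setting, then to plain authentication.
    return conf.getProperty("hadoop.rpc.protection", "authentication");
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("hadoop.rpc.protection", "privacy");
    System.out.println(dataTransferProtection(conf)); // privacy
  }
}
{code}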
[jira] [Updated] (HDFS-6859) Allow dfs.data.transfer.protection default to hadoop.rpc.protection
[ https://issues.apache.org/jira/browse/HDFS-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6859: --- Attachment: HDFS-6859.patch Attached is a patch that enables the default value. Allow dfs.data.transfer.protection default to hadoop.rpc.protection --- Key: HDFS-6859 URL: https://issues.apache.org/jira/browse/HDFS-6859 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Attachments: HDFS-6859.patch Currently, an administrator needs to configure both _dfs.data.transfer.protection_ and _hadoop.rpc.protection_ to specify the _QOP_ for the rpc and data transfer protocols. In some cases, the values for these two properties will be the same. In those cases, it may be easier to allow _dfs.data.transfer.protection_ to default to _hadoop.rpc.protection_. This also ensures that an admin will get the QOP as _Authentication_ if the admin does not specify either of those values. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6859) Allow dfs.data.transfer.protection default to hadoop.rpc.protection
[ https://issues.apache.org/jira/browse/HDFS-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6859: --- Status: Patch Available (was: Open) Allow dfs.data.transfer.protection default to hadoop.rpc.protection --- Key: HDFS-6859 URL: https://issues.apache.org/jira/browse/HDFS-6859 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Attachments: HDFS-6859.patch Currently, an administrator needs to configure both _dfs.data.transfer.protection_ and _hadoop.rpc.protection_ to specify the _QOP_ for the rpc and data transfer protocols. In some cases, the values for these two properties will be the same. In those cases, it may be easier to allow _dfs.data.transfer.protection_ to default to _hadoop.rpc.protection_. This also ensures that an admin will get the QOP as _Authentication_ if the admin does not specify either of those values. Separate jiras (HDFS-6858 and HDFS-6859) have been created for dfs.data.transfer.saslproperties.resolver.class and dfs.data.transfer.protection respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6859) Allow dfs.data.transfer.protection default to hadoop.rpc.protection
[ https://issues.apache.org/jira/browse/HDFS-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated HDFS-6859: --- Description: Currently, an administrator needs to configure both _dfs.data.transfer.protection_ and _hadoop.rpc.protection_ to specify the _QOP_ for the rpc and data transfer protocols. In some cases, the values for these two properties will be the same. In those cases, it may be easier to allow _dfs.data.transfer.protection_ to default to _hadoop.rpc.protection_. This also ensures that an admin will get the QOP as _Authentication_ if the admin does not specify either of those values. Separate jiras (HDFS-6858 and HDFS-6859) have been created for dfs.data.transfer.saslproperties.resolver.class and dfs.data.transfer.protection respectively. was: Currently administrator needs to configure both _dfs.data.transfer.protection_ and _hadoop.rpc.protection_ to specify _QOP_ for rpc and data transfer protocols. In some cases, the values for these two properties will be same. In those cases, it may be easier to allow dfs.data.transfer.protection default to hadoop.rpc.protection. This also ensures that an admin will get QOP as _Authentication_ if admin does not specify either of those values. Allow dfs.data.transfer.protection default to hadoop.rpc.protection --- Key: HDFS-6859 URL: https://issues.apache.org/jira/browse/HDFS-6859 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.5.0 Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Attachments: HDFS-6859.patch Currently, an administrator needs to configure both _dfs.data.transfer.protection_ and _hadoop.rpc.protection_ to specify the _QOP_ for the rpc and data transfer protocols. In some cases, the values for these two properties will be the same. In those cases, it may be easier to allow _dfs.data.transfer.protection_ to default to _hadoop.rpc.protection_. This also ensures that an admin will get the QOP as _Authentication_ if the admin does not specify either of those values. Separate jiras (HDFS-6858 and HDFS-6859) have been created for dfs.data.transfer.saslproperties.resolver.class and dfs.data.transfer.protection respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6648) Order of namenodes in ConfiguredFailoverProxyProvider is not defined by order in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099984#comment-14099984 ] Rafal Wojdyla commented on HDFS-6648: - Hi [~qwertymaniac], good to know that it wasn't a design goal - btw, what is the best/easiest way to check what the design goals were for a given class/component - is Jira the only good place for that? The javadoc for ConfiguredFailoverProxyProvider says:
{code}
/**
 * A FailoverProxyProvider implementation which allows one to configure two URIs
 * to connect to during fail-over. The first configured address is tried first,
 * and on a fail-over event the other address is tried.
 */
public class ConfiguredFailoverProxyProvider<T> extends
{code}
It says "The first configured address is tried first" - which is not true. This was a major issue for us due to other bugs, including but not limited to: * HDFS-5064 * HDFS-4858 So at the end of the day some clients were trying to connect to the Standby NameNode, which was sometimes very unresponsive, and it was killing the performance big time. Taking the order from the configuration file makes it more intuitive for the administrator, and makes it possible for the administrator to mitigate bugs like the ones above by explicitly defining the order of namenodes. Order of namenodes in ConfiguredFailoverProxyProvider is not defined by order in hdfs-site.xml -- Key: HDFS-6648 URL: https://issues.apache.org/jira/browse/HDFS-6648 Project: Hadoop HDFS Issue Type: Bug Components: ha, hdfs-client Affects Versions: 2.2.0 Reporter: Rafal Wojdyla In org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider, in the constructor, there's a map nameservice : service-id : service-rpc-address (DFSUtil.getHaNnRpcAddresses). It's a LinkedHashMap of HashMaps. The order is kept for _nameservices_. Then, to find the active namenode for a nameservice, we get the HashMap of service-id : service-rpc-address for the requested nameservice (taken from the request URI), and for this HashMap we get the values - the order of this collection is not strictly defined! In the code: {code} Collection<InetSocketAddress> addressesOfNns = addressesInNN.values(); {code} And then we put these values (in undefined order) into the ArrayList of proxies, and then in getProxy we start from the first proxy in the list and fail over to the next if needed. It would make sense for ConfiguredFailoverProxyProvider to keep the order of proxies/namenodes defined in hdfs-site.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6826) Plugin interface to enable delegation of HDFS authorization assertions
[ https://issues.apache.org/jira/browse/HDFS-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1412#comment-1412 ] Alejandro Abdelnur commented on HDFS-6826: -- [~apurtell], cell level is out of scope for this proposal. This proposal focuses on providing 'synchronized' authorization between data entities and the associated files for the use cases where the files fully belong to a single data entity. If a file contains data for multiple data entities (an Hbase cell, columns of a CSV file mapped to a HiveMetaStore table), it is not possible to map authorization to a file in a secure way (enforced by HDFS; you could enforce that at the client-lib level, but a modified client lib will give you access to the whole file). My take is that, in the case of authorization at cell level, this will always remain in HBase. Otherwise, we would require an authorization source with the scalability of HBase and with more performance than HBase. Plugin interface to enable delegation of HDFS authorization assertions -- Key: HDFS-6826 URL: https://issues.apache.org/jira/browse/HDFS-6826 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.4.1 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFS-6826-idea.patch, HDFS-6826-idea2.patch, HDFS-6826v3.patch, HDFSPluggableAuthorizationProposal-v2.pdf, HDFSPluggableAuthorizationProposal.pdf When Hbase data, HiveMetaStore data or Search data is accessed via services (Hbase region servers, HiveServer2, Impala, Solr) the services can enforce permissions on corresponding entities (databases, tables, views, columns, search collections, documents). It is desirable, when the data is accessed directly by users accessing the underlying data files (i.e. from a MapReduce job), that the permissions of the data files map to the permissions of the corresponding data entity (i.e. table, column family or search collection). To enable this we need to have the necessary hooks in place in the NameNode to delegate authorization to an external system that can map HDFS files/directories to data entities and resolve their permissions based on the data entities' permissions. I’ll be posting a design proposal in the next few days. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6188) An ip whitelist based implementation of TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100019#comment-14100019 ] Arpit Agarwal commented on HDFS-6188: - Hi Benoy, thanks for updating the patch. I think the package should be {{org.apache.hadoop.hdfs.protocol.datatransfer}} to match the file location. An ip whitelist based implementation of TrustedChannelResolver -- Key: HDFS-6188 URL: https://issues.apache.org/jira/browse/HDFS-6188 Project: Hadoop HDFS Issue Type: Improvement Components: security Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-6188.patch, HDFS-6188.patch HDFS-5910 added the ability to plug in custom logic to determine whether a channel is trusted or not, via TrustedChannelResolver. Attached is an implementation of TrustedChannelResolver based on an IP whitelist. This is dependent on HADOOP-10335, which has the support for IP whitelisting. -- This message was sent by Atlassian JIRA (v6.2#6252)
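As a rough illustration of the whitelist idea, here is a self-contained Java model of the trust decision; the class below is a hypothetical stand-in and does not show the actual TrustedChannelResolver plugin wiring or the HADOOP-10335 whitelist classes.
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

class WhitelistResolverSketch {
  private final Set<String> trusted = new HashSet<>();

  WhitelistResolverSketch(String... addrs) {
    for (String a : addrs) {
      trusted.add(a);
    }
  }

  // A trusted channel can skip SASL negotiation for data transfer.
  boolean isTrusted(InetAddress peer) {
    return trusted.contains(peer.getHostAddress());
  }

  public static void main(String[] args) throws UnknownHostException {
    WhitelistResolverSketch r = new WhitelistResolverSketch("127.0.0.1");
    System.out.println(r.isTrusted(InetAddress.getByName("127.0.0.1"))); // true
    System.out.println(r.isTrusted(InetAddress.getByName("10.0.0.5")));  // false
  }
}
{code}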
[jira] [Created] (HDFS-6860) BlockStateChange logs are too noisy
Arpit Agarwal created HDFS-6860: --- Summary: BlockStateChange logs are too noisy Key: HDFS-6860 URL: https://issues.apache.org/jira/browse/HDFS-6860 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Arpit Agarwal Block State Change logs are too noisy at the default INFO level and affect NN performance on busy clusters. Most of these state changes can be logged at debug level instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
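The change described above amounts to demoting per-block messages to a guarded debug level. A minimal Java sketch, using java.util.logging as a stand-in for the NameNode's actual block-state logger:
{code}
import java.util.logging.Level;
import java.util.logging.Logger;

class BlockLogSketch {
  static final Logger LOG = Logger.getLogger("BlockStateChange");

  static void logStateChange(String block, String detail) {
    // Cheap guard: skip the string concatenation entirely unless the
    // debug (FINE) level is enabled, so busy clusters pay almost nothing.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("BLOCK* " + block + ": " + detail);
    }
  }

  public static void main(String[] args) {
    logStateChange("blk_1073741825", "addStoredBlock"); // silent at default INFO level
  }
}
{code}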
[jira] [Commented] (HDFS-6188) An ip whitelist based implementation of TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100052#comment-14100052 ] Benoy Antony commented on HDFS-6188: Package name is _org.apache.hadoop.hdfs.protocol.datatransfer_ in the latest patch. An ip whitelist based implementation of TrustedChannelResolver -- Key: HDFS-6188 URL: https://issues.apache.org/jira/browse/HDFS-6188 Project: Hadoop HDFS Issue Type: Improvement Components: security Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-6188.patch, HDFS-6188.patch HDFS-5910 added the ability to plug in custom logic to determine whether a channel is trusted or not, via TrustedChannelResolver. Attached is an implementation of TrustedChannelResolver based on an IP whitelist. This is dependent on HADOOP-10335, which has the support for IP whitelisting. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6188) An ip whitelist based implementation of TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6188: Hadoop Flags: Reviewed Status: Patch Available (was: Open) Sorry, I think I looked at the wrong patch. +1 pending Jenkins. An ip whitelist based implementation of TrustedChannelResolver -- Key: HDFS-6188 URL: https://issues.apache.org/jira/browse/HDFS-6188 Project: Hadoop HDFS Issue Type: Improvement Components: security Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-6188.patch, HDFS-6188.patch HDFS-5910 added the ability to plug in custom logic to determine whether a channel is trusted or not, via TrustedChannelResolver. Attached is an implementation of TrustedChannelResolver based on an IP whitelist. This is dependent on HADOOP-10335, which has the support for IP whitelisting. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test
[ https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100148#comment-14100148 ] Zhe Zhang commented on HDFS-5135: - By internal methods do you mean nfsd.read, write, readdir, etc.? If those are to be avoided, I think we need to call 'mnt' and create a mount point (e.g., /mnt/test_hdfs/), and create regular FileInputStream/FileOutputStream objects out of it. Then we can test read, write, readdir, and client access privileges. Let me know if you think that's the right direction. Thanks. Create a test framework to enable NFS end to end unit test -- Key: HDFS-5135 URL: https://issues.apache.org/jira/browse/HDFS-5135 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Zhe Zhang Currently, we have to manually start portmap and nfs3 processes to test patches and new functionalities. This JIRA is to track the effort to introduce a test framework for NFS unit tests without starting standalone nfs3 processes. -- This message was sent by Atlassian JIRA (v6.2#6252)
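A self-contained Java sketch of the mount-based approach suggested above: exercise the NFS gateway through ordinary file I/O on the mount point instead of calling nfsd internals directly. The mount path below is an assumed example.
{code}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;

class NfsMountIoSketch {
  public static void main(String[] args) throws IOException {
    // Assumes HDFS is already NFS-mounted at this (hypothetical) path.
    File f = new File("/mnt/test_hdfs/hello.txt");
    try (FileOutputStream out = new FileOutputStream(f)) {
      out.write("hello via nfs\n".getBytes("UTF-8")); // write through the mount
    }
    try (BufferedReader in = new BufferedReader(new FileReader(f))) {
      System.out.println(in.readLine()); // read back through the mount
    }
  }
}
{code}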
[jira] [Commented] (HDFS-5135) Create a test framework to enable NFS end to end unit test
[ https://issues.apache.org/jira/browse/HDFS-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100173#comment-14100173 ] jay vyas commented on HDFS-5135: You also might be interested in the apache bigtop HCFS fuse mount tests (BIGTOP-1221), which attempt to test e2e operations for Hadoop's fuse mount... We could generalize this to support NFS mounts as well... Another alternative is to contribute the end-to-end tests as an extension to the bigtop tests. Since bigtop builds and packages and is designed to e2e test Hadoop on a running system, these tests are quite common. In fact we do have an end-to-end fuse mount test; maybe we can add an NFS one as well. https://github.com/apache/bigtop/blob/master/bigtop-tests/test-artifacts/hadoop/src/main/groovy/org/apache/bigtop/itest/hadoop/hcfs/TestFuseHCFS.groovy Create a test framework to enable NFS end to end unit test -- Key: HDFS-5135 URL: https://issues.apache.org/jira/browse/HDFS-5135 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Zhe Zhang Currently, we have to manually start portmap and nfs3 processes to test patches and new functionalities. This JIRA is to track the effort to introduce a test framework for NFS unit tests without starting standalone nfs3 processes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6188) An ip whitelist based implementation of TrustedChannelResolver
[ https://issues.apache.org/jira/browse/HDFS-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100182#comment-14100182 ] Hadoop QA commented on HDFS-6188: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662320/HDFS-6188.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster org.apache.hadoop.hdfs.TestHDFSServerPorts {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7658//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7658//console This message is automatically generated. An ip whitelist based implementation of TrustedChannelResolver -- Key: HDFS-6188 URL: https://issues.apache.org/jira/browse/HDFS-6188 Project: Hadoop HDFS Issue Type: Improvement Components: security Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-6188.patch, HDFS-6188.patch HDFS-5910 added the ability to plug in custom logic, via TrustedChannelResolver, to determine whether a channel is trusted or not. Attached is an implementation of TrustedChannelResolver based on an IP whitelist. This depends on HADOOP-10335, which provides the IP whitelist support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3862) QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics
[ https://issues.apache.org/jira/browse/HDFS-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-3862: - Attachment: HDFS-3862.002.patch fix test failure. QJM: don't require a fencer to be configured if shared storage has built-in single-writer semantics --- Key: HDFS-3862 URL: https://issues.apache.org/jira/browse/HDFS-3862 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Yi Liu Attachments: HDFS-3862.001.patch, HDFS-3862.002.patch Currently, NN HA requires that the administrator configure a fencing method to ensure that only a single NameNode may write to the shared storage at a time. Some shared edits storage implementations (like QJM) inherently enforce single-writer semantics at the storage level, and thus the user should not be forced to specify one. We should extend the JournalManager interface so that the HA code can operate without a configured fencer if the JM has such built-in fencing. -- This message was sent by Atlassian JIRA (v6.2#6252)
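To make the proposal concrete, one possible shape for the interface extension, with an invented method name (the attached patches contain the real change):
{code}
// Illustrative only: a made-up method name showing one way the
// JournalManager contract could advertise built-in fencing.
public interface JournalManagerFencingSketch {
  /**
   * Returns true if this journal enforces single-writer semantics on its
   * own (as QJM does through epoch numbers), in which case HA failover
   * could proceed without a separately configured fencing method.
   */
  boolean hasBuiltInFencing();
}
{code}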
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100234#comment-14100234 ] Arpit Agarwal commented on HDFS-6783: - It looks like this check-in introduced test failures in the build. {code} org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints org.apache.hadoop.hdfs.server.namenode.ha.TestHAMetrics org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings {code} TestValidateConfigurationSettings consistently fails for me, and 'git bisect' implicated this commit. After reverting it locally, the test passed. From a quick look at the patch I am not sure why it would result in these test failures. Any ideas? Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.patch In the monitor thread, {{needsRescan}} is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
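As a toy illustration of the race in the issue summary (not HDFS code): if the monitor clears {{needsRescan}} before scanning, a waiter can return while the scan it requested is still in flight. A counter-based wait, sketched below with invented names, avoids testing a boolean that goes stale:
{code}
// Toy sketch, not the HDFS-6783 patch: waiters track scan completions
// by count instead of polling a boolean that is cleared before the scan.
public class RescanRaceSketch {
  private long completedScanCount = 0;
  private long neededScanCount = 0;

  // Caller side: request a rescan and wait for a scan that finishes
  // strictly after the request was made.
  public synchronized void waitForRescan() throws InterruptedException {
    neededScanCount = completedScanCount + 1;
    notifyAll(); // wake the monitor thread
    while (completedScanCount < neededScanCount) {
      wait();
    }
  }

  // Monitor side: check whether a scan is owed...
  public synchronized boolean scanNeeded() {
    return completedScanCount < neededScanCount;
  }

  // ...and publish completion only after the scan work is done.
  public synchronized void markScanDone() {
    completedScanCount++;
    notifyAll();
  }
}
{code}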
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100245#comment-14100245 ] Yi Liu commented on HDFS-6783: -- Hi [~arpitagarwal], I found the reason; it's a bug. In rescan(), {{namesystem.writeLock();}} and {{namesystem.writeUnlock();}} should be in the same {{try...finally}}. This problem occurs when we call CacheReplicationMonitor#close(). Let me reopen this JIRA and fix it; please help review. Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.patch In the monitor thread, {{needsRescan}} is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
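The shape of the fix Yi describes, in a simplified stand-alone form (a {{ReentrantLock}} stands in for the namesystem lock; this is not the literal patch):
{code}
import java.util.concurrent.locks.ReentrantLock;

// Simplified stand-in for the pattern described above, not the patch itself.
public class RescanLockPattern {
  // Stand-in for the namesystem write lock.
  private final ReentrantLock writeLock = new ReentrantLock();

  public void rescan() {
    writeLock.lock();     // acquire immediately before the try...
    try {
      doRescanWork();     // hypothetical scan body
    } finally {
      writeLock.unlock(); // ...so every lock() is paired with an unlock(),
                          // even if close() interrupts the scan mid-way
    }
  }

  private void doRescanWork() {
    // scan work elided in this sketch
  }
}
{code}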
[jira] [Reopened] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu reopened HDFS-6783: -- Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.patch In the monitor thread, {{needsRescan}} is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-6783: - Attachment: HDFS-6783.006.fix-tests.patch Fix the test timeouts. Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.fix-tests.patch, HDFS-6783.006.patch In the monitor thread, {{needsRescan}} is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6783: Status: Patch Available (was: Reopened) Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.fix-tests.patch, HDFS-6783.006.patch In the monitor thread, {{needsRescan}} is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100258#comment-14100258 ] Arpit Agarwal commented on HDFS-6783: - Thanks for the prompt fix, [~hitliuyi]. +1 for the patch, pending Jenkins. Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.fix-tests.patch, HDFS-6783.006.patch In the monitor thread, {{needsRescan}} is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6783) Fix HDFS CacheReplicationMonitor rescan logic
[ https://issues.apache.org/jira/browse/HDFS-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100259#comment-14100259 ] Yi Liu commented on HDFS-6783: -- [~arpitagarwal], thanks a lot for the review. Verified in my local environment; let's wait for Jenkins. Fix HDFS CacheReplicationMonitor rescan logic - Key: HDFS-6783 URL: https://issues.apache.org/jira/browse/HDFS-6783 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6783.001.patch, HDFS-6783.002.patch, HDFS-6783.003.patch, HDFS-6783.004.patch, HDFS-6783.005.patch, HDFS-6783.006.fix-tests.patch, HDFS-6783.006.patch In the monitor thread, {{needsRescan}} is set to false before the real scan starts, so {{waitForRescanIfNeeded}} will return at the first condition: {code} if (!needsRescan) { return; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)