[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808843#comment-16808843 ]

Jim Brennan commented on HDFS-14394:

I am +1 on this (non-binding). With gcc 4.8.5 on rhel7 my compilation was failing. This patch fixes it.

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
> Issue Type: Task
> Components: hdfs-client, libhdfs, native
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version.
> As of today, the libhdfs build on Hadoop QA works, but when built on a
> machine with an outdated gcc / cc version where C89 is the default,
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
> error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
> note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that
> we can enforce C99 as the minimum required version.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905289#comment-16905289 ]

Jim Brennan commented on HDFS-12914:

[~jojochuang], the revert of the commit for branch-2 appears to have broken the build:
{noformat}
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /Users/jbrennan02/git/apache-hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java:[226,23] cannot find symbol
  symbol: method setBlockManagerForTesting(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
  location: class org.apache.hadoop.hdfs.server.namenode.FSNamesystem
[INFO] 1 error
{noformat}
When I revert this commit, I can build:
{noformat}
commit 585b6de63721f3ea8057677676038a6f8f2c33f5 (HEAD -> branch-2, apache-hadoop/branch-2)
Author: Wei-Chiu Chuang
Date: Fri Aug 9 16:59:27 2019 -0700

    Revert "HDFS-12914. Block report leases cause missing blocks until next report. Contributed by Santosh Marella, He Xiaoqiao."

    This reverts commit 567e1178d88ccfc258ce2ade4f8af66cc5a4daa7.
{noformat}

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch,
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch,
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch,
> HDFS-12914.009.patch, HDFS-12914.branch-2.000.patch,
> HDFS-12914.branch-2.001.patch, HDFS-12914.branch-2.002.patch,
> HDFS-12914.branch-2.8.001.patch, HDFS-12914.branch-2.8.002.patch,
> HDFS-12914.branch-2.patch, HDFS-12914.branch-3.0.patch,
> HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch,
> HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
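Editorial illustration of the failure mode described in the HDFS-12914 issue text above, where a rejected lease returns false instead of raising an error. This is a minimal sketch, not Hadoop code: the class {{LeaseCheckSketch}} and its methods are hypothetical names, and the "strict" variant is just one possible way to make a rejection visible, not necessarily what the attached patches do.

```java
class LeaseCheckSketch {
    /** Hypothetical stand-in for a lease check: true only when the ids match. */
    static boolean checkLease(long expectedLeaseId, long presentedLeaseId) {
        return expectedLeaseId == presentedLeaseId;
    }

    /**
     * Boolean-only handling: a rejected report returns false, so the caller
     * cannot tell "rejected" apart from any other reason for a false result.
     */
    static boolean reportSilently(long expected, long presented) {
        if (!checkLease(expected, presented)) {
            return false; // the rejection is invisible to the caller
        }
        return true; // report accepted in this toy model
    }

    /** Alternative: surface the rejection so it cannot be mistaken for a result. */
    static boolean reportStrictly(long expected, long presented) {
        if (!checkLease(expected, presented)) {
            throw new IllegalStateException(
                "block report lease rejected: " + presented);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(reportSilently(7L, 8L));
        try {
            reportStrictly(7L, 8L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the boolean-only path the sender gets no signal that its report was dropped, which matches the "active with _no blocks_" symptom in the description.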
[jira] [Commented] (HDFS-14726) Fix JN incompatibility issue in branch-2 due to backport of HDFS-10519
[ https://issues.apache.org/jira/browse/HDFS-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919868#comment-16919868 ] Jim Brennan commented on HDFS-14726: This should be marked as a blocker for 2.10 with the release-blocker label. cc: [~jhung] > Fix JN incompatibility issue in branch-2 due to backport of HDFS-10519 > -- > > Key: HDFS-14726 > URL: https://issues.apache.org/jira/browse/HDFS-14726 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 2.10.0 >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Blocker > Attachments: HDFS-14726-branch-2.001.patch, > HDFS-14726-branch-2.002.patch, HDFS-14726-branch-2.003.patch > > > HDFS-10519 has been backported to branch-2. However HDFS-10519 introduced an > incompatibility issue between NN and JN due to the new protobuf field > {{committedTxnId}} in {{HdfsServer.proto}}. This field was introduced as a > required field so if JN and NN are not on same version, it will run into > missing field exception. Although currently we can get around by making sure > JN always gets upgraded properly before NN, we can potentially fix this > incompatibility by changing the field to optional. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs
[ https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901417#comment-16901417 ] Jim Brennan commented on HDFS-12491: The code looks good, but you should declare all of the new functions static as they are only used in jni_helper.c LibHdfs.md should be updated to remove the warning about wildcards. > Support wildcard in CLASSPATH for libhdfs > - > > Key: HDFS-12491 > URL: https://issues.apache.org/jira/browse/HDFS-12491 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Muhammad Samir Khan >Priority: Major > Attachments: HDFS-12491.001.patch, testWildCard.sh > > > According to libhdfs doc, wildcard in CLASSPATH is not support: > bq. The most common problem is the CLASSPATH is not set properly when calling > a program that uses libhdfs. Make sure you set it to all the Hadoop jars > needed to run Hadoop itself as well as the right configuration directory > containing hdfs-site.xml. It is not valid to use wildcard syntax for > specifying multiple jars. It may be useful to run hadoop classpath --glob or > hadoop classpath --jar to generate the correct classpath for your > deployment. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs
[ https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901484#comment-16901484 ] Jim Brennan commented on HDFS-12491: [~samkhan], thanks for updating the patch! (nit) on patch 002 LibHdfs.md: I don't think you should put a newline between the sentences. That said, I'm not sure we need the new sentence: {{Wildcard entries in the `CLASSPATH` are now supported by libhdfs.}} I think just removing the statement that it is not supported is sufficient. I am +1 on the code changes (non-binding) > Support wildcard in CLASSPATH for libhdfs > - > > Key: HDFS-12491 > URL: https://issues.apache.org/jira/browse/HDFS-12491 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Muhammad Samir Khan >Priority: Major > Attachments: HDFS-12491.001.patch, HDFS-12491.002.patch, > testWildCard.sh > > > According to libhdfs doc, wildcard in CLASSPATH is not support: > bq. The most common problem is the CLASSPATH is not set properly when calling > a program that uses libhdfs. Make sure you set it to all the Hadoop jars > needed to run Hadoop itself as well as the right configuration directory > containing hdfs-site.xml. It is not valid to use wildcard syntax for > specifying multiple jars. It may be useful to run hadoop classpath --glob or > hadoop classpath --jar to generate the correct classpath for your > deployment. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs
[ https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902997#comment-16902997 ] Jim Brennan commented on HDFS-12491: [~kihwal] can you review this as well? > Support wildcard in CLASSPATH for libhdfs > - > > Key: HDFS-12491 > URL: https://issues.apache.org/jira/browse/HDFS-12491 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Muhammad Samir Khan >Priority: Major > Attachments: HDFS-12491.001.patch, HDFS-12491.002.patch, > testWildCard.sh > > > According to libhdfs doc, wildcard in CLASSPATH is not support: > bq. The most common problem is the CLASSPATH is not set properly when calling > a program that uses libhdfs. Make sure you set it to all the Hadoop jars > needed to run Hadoop itself as well as the right configuration directory > containing hdfs-site.xml. It is not valid to use wildcard syntax for > specifying multiple jars. It may be useful to run hadoop classpath --glob or > hadoop classpath --jar to generate the correct classpath for your > deployment. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode
[ https://issues.apache.org/jira/browse/HDFS-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942084#comment-16942084 ] Jim Brennan commented on HDFS-14858: Thanks [~vagarychen]! patch004 looks good to me. +1 (non-binding). > [SBN read] Allow configurably enable/disable AlignmentContext on NameNode > - > > Key: HDFS-14858 > URL: https://issues.apache.org/jira/browse/HDFS-14858 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14858.001.patch, HDFS-14858.002.patch, > HDFS-14858.003.patch, HDFS-14858.004.patch > > > As brought up under HDFS-14277, we should make sure SBN read has no > performance impact when it is not enabled. One potential overhead of SBN read > is maintaining and updating additional state status on NameNode. > Specifically, this is done by creating/updating/checking a > {{GlobalStateIdContext}} instance. Currently, even without enabling SBN read, > this logic is still be checked. We can make this configurable so that when > SBN read is not enabled, there is no such overhead and everything works as-is. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode
[ https://issues.apache.org/jira/browse/HDFS-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936863#comment-16936863 ] Jim Brennan commented on HDFS-14858: [~vagarychen], [~jojochuang] As a major new feature for branch-2, I think this should default to false (DFS_NAMENODE_STATE_CONTEXT_ENABLED_DEFAULT = false). > [SBN read] Allow configurably enable/disable AlignmentContext on NameNode > - > > Key: HDFS-14858 > URL: https://issues.apache.org/jira/browse/HDFS-14858 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14858.001.patch, HDFS-14858.002.patch > > > As brought up under HDFS-14277, we should make sure SBN read has no > performance impact when it is not enabled. One potential overhead of SBN read > is maintaining and updating additional state status on NameNode. > Specifically, this is done by creating/updating/checking a > {{GlobalStateIdContext}} instance. Currently, even without enabling SBN read, > this logic is still be checked. We can make this configurable so that when > SBN read is not enabled, there is no such overhead and everything works as-is. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
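Editorial illustration of the "default to false" suggestion in the comment above. This is a minimal sketch under stated assumptions: only the constant name {{DFS_NAMENODE_STATE_CONTEXT_ENABLED_DEFAULT}} comes from the thread; the key string, the class {{StateContextFlagSketch}}, and the use of {{java.util.Properties}} in place of Hadoop's {{Configuration}} are hypothetical.

```java
import java.util.Properties;

class StateContextFlagSketch {
    // Hypothetical key string, used only for this illustration.
    static final String STATE_CONTEXT_ENABLED_KEY =
        "example.namenode.state.context.enabled";
    // Off by default, as suggested in the comment for branch-2.
    static final boolean STATE_CONTEXT_ENABLED_DEFAULT = false;

    /** Reads the flag, falling back to the default when the key is unset. */
    static boolean isStateContextEnabled(Properties conf) {
        String value = conf.getProperty(STATE_CONTEXT_ENABLED_KEY);
        return value == null
            ? STATE_CONTEXT_ENABLED_DEFAULT
            : Boolean.parseBoolean(value);
    }

    /**
     * Returns a placeholder context object only when the feature is enabled;
     * null means no per-call state tracking is performed.
     */
    static Object createAlignmentContext(Properties conf) {
        return isStateContextEnabled(conf) ? new Object() : null;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(createAlignmentContext(conf) == null);
        conf.setProperty(STATE_CONTEXT_ENABLED_KEY, "true");
        System.out.println(createAlignmentContext(conf) == null);
    }
}
```

Gating creation of the context on the flag means a cluster that never sets the key pays none of the bookkeeping cost mentioned in the issue description.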
[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969409#comment-16969409 ] Jim Brennan commented on HDFS-14958: Thanks for the reviews [~ayushtkn] and [~inigoiri]! Can someone please commit this? I would ideally like it pulled back to branch-2.10 - that is where I found the problem - we have some internal changes to NetworkTopologyWithNodeGroup so this test was actually failing for us. cc: [~kihwal] > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14958.001.patch > > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969439#comment-16969439 ] Jim Brennan commented on HDFS-14958: Thanks [~ayushtkn]! > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1, 2.11.0 > > Attachments: HDFS-14958.001.patch > > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968653#comment-16968653 ]

Jim Brennan commented on HDFS-14958:

[~inigoiri] I agree. I think this test pre-dates DFSNetworkTopology, so perhaps it did work originally. I tried changing it to use NetworkTopology and one of the tests fails in that case, so it does differentiate between NetworkTopology and NetworkTopologyWithNodeGroup. They just all succeed with the default DFSNetworkTopology, which is why we missed this. Should we file another Jira to improve this test?

The unit tests that failed appear to be unrelated to this patch:
{noformat}
[ERROR] Failures:
[ERROR] TestMultipleNNPortQOP.testMultipleNNPortOverwriteDownStream:177 expected: but was:
[ERROR] TestRollingUpgrade.testRollback:354 Test resulted in an unexpected exit
[ERROR] TestBalancer.testMaxIterationTime:1649 Unexpected iteration runtime: 4008ms > 3.5s
[ERROR] TestRedudantBlocks.testProcessOverReplicatedAndRedudantBlock:138 expected:<5> but was:<4>
{noformat}

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.3
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true,
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the
> test actually uses the default {{DFSNetworkTopology}}.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14958: --- Attachment: HDFS-14958.001.patch Status: Patch Available (was: Open) Attaching patch that fixes the test. > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Priority: Minor > Attachments: HDFS-14958.001.patch > > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned HDFS-14958: -- Assignee: Jim Brennan > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14958.001.patch > > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
Jim Brennan created HDFS-14958: -- Summary: TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup Key: HDFS-14958 URL: https://issues.apache.org/jira/browse/HDFS-14958 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.1.3 Reporter: Jim Brennan TestBalancerWithNodeGroup is intended to test with {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14958: --- Priority: Minor (was: Major) > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Priority: Minor > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968458#comment-16968458 ]

Jim Brennan commented on HDFS-14958:

In the DatanodeManager constructor:
{noformat}
    this.useDfsNetworkTopology = conf.getBoolean(
        DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY,
        DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_DEFAULT);
    if (useDfsNetworkTopology) {
      networktopology = DFSNetworkTopology.getInstance(conf);
    } else {
      networktopology = NetworkTopology.getInstance(conf);
    }

And in DFSNetworkTopology.getInstance():

  public static DFSNetworkTopology getInstance(Configuration conf) {
    DFSNetworkTopology nt = ReflectionUtils.newInstance(conf.getClass(
        DFSConfigKeys.DFS_NET_TOPOLOGY_IMPL_KEY,
        DFSConfigKeys.DFS_NET_TOPOLOGY_IMPL_DEFAULT,
        DFSNetworkTopology.class), conf);
    return (DFSNetworkTopology) nt.init(DFSTopologyNodeImpl.FACTORY);
  }
{noformat}

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.3
> Reporter: Jim Brennan
> Priority: Minor
>
> TestBalancerWithNodeGroup is intended to test with
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true,
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the
> test actually uses the default {{DFSNetworkTopology}}.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
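Editorial illustration of the precedence described in the comment above: when the "use DFS topology" flag is left at its default of true, the topology implementation key is never consulted. This is a simplified sketch, not the DatanodeManager code. The class {{TopologySelectionSketch}}, the key strings, and the plain {{Map}} standing in for Hadoop's {{Configuration}} are assumptions, and the DFS branch is collapsed to a fixed class name rather than reading {{DFS_NET_TOPOLOGY_IMPL_KEY}}.

```java
import java.util.HashMap;
import java.util.Map;

class TopologySelectionSketch {
    // Hypothetical key strings; they only play the roles of
    // DFS_USE_DFS_NETWORK_TOPOLOGY_KEY and NET_TOPOLOGY_IMPL_KEY.
    static final String USE_DFS_TOPOLOGY_KEY = "example.use.dfs.network.topology";
    static final String NET_TOPOLOGY_IMPL_KEY = "example.net.topology.impl";

    /** Returns the name of the topology class this toy model would choose. */
    static String chooseTopology(Map<String, String> conf) {
        // The flag defaults to true when unset, as stated in the issue.
        boolean useDfsTopology = Boolean.parseBoolean(
            conf.getOrDefault(USE_DFS_TOPOLOGY_KEY, "true"));
        if (useDfsTopology) {
            // NET_TOPOLOGY_IMPL_KEY is never read on this path.
            return "DFSNetworkTopology";
        }
        return conf.getOrDefault(NET_TOPOLOGY_IMPL_KEY, "NetworkTopology");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(NET_TOPOLOGY_IMPL_KEY, "NetworkTopologyWithNodeGroup");
        // Flag unset: the configured implementation is ignored.
        System.out.println(chooseTopology(conf));
        // Flag explicitly false: the configured implementation is used.
        conf.put(USE_DFS_TOPOLOGY_KEY, "false");
        System.out.println(chooseTopology(conf));
    }
}
```

In this model a test that sets only the implementation key still runs against the default topology, which is the misconfiguration the issue reports.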
[jira] [Created] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
Jim Brennan created HDFS-14960:
--

Summary: TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
Key: HDFS-14960
URL: https://issues.apache.org/jira/browse/HDFS-14960
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.1.3
Reporter: Jim Brennan

As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even though it was using DFSNetworkTopology instead of NetworkTopologyWithNodeGroup.

[~inigoiri] rightly suggested that this indicates the test is not very good - it should fail when run without NetworkTopologyWithNodeGroup.

We should improve this test.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968765#comment-16968765 ] Jim Brennan commented on HDFS-14958: [~inigoiri] I filed [HDFS-14960] to improve the test. > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14958.001.patch > > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998526#comment-16998526 ] Jim Brennan commented on HDFS-15036: [~shv], [~jhung] was branch-2 actually deleted? I can still see it, and this commit is still there. > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15036.001.patch, HDFS-15036.002.patch, > HDFS-15036.003.patch > > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983598#comment-16983598 ]

Jim Brennan commented on HDFS-14960:

[~hemanthboyina] that does seem like a reasonable check to me, and likely would have caught the problem reported in HDFS-14958. I think the intent of this Jira is to improve the test so that it includes some test cases that are unique to NetworkTopologyWithNodeGroup. The fact that it was succeeding when it wasn't using the right class suggests that it could be improved.

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> -
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.1.3
> Reporter: Jim Brennan
> Priority: Minor
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even
> though it was using DFSNetworkTopology instead of
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good -
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983887#comment-16983887 ] Jim Brennan commented on HDFS-14960: [~ayushtkn] I think the intention was to add a test case that will succeed for NetworkTopologyWithNodeGroup but would fail for DFSNetworkTopology. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Priority: Minor > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode
[ https://issues.apache.org/jira/browse/HDFS-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937131#comment-16937131 ] Jim Brennan commented on HDFS-14858: Thanks [~vagarychen]. Patch 003 looks good to me. > [SBN read] Allow configurably enable/disable AlignmentContext on NameNode > - > > Key: HDFS-14858 > URL: https://issues.apache.org/jira/browse/HDFS-14858 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14858.001.patch, HDFS-14858.002.patch, > HDFS-14858.003.patch > > > As brought up under HDFS-14277, we should make sure SBN read has no > performance impact when it is not enabled. One potential overhead of SBN read > is maintaining and updating additional state status on NameNode. > Specifically, this is done by creating/updating/checking a > {{GlobalStateIdContext}} instance. Currently, even without enabling SBN read, > this logic is still be checked. We can make this configurable so that when > SBN read is not enabled, there is no such overhead and everything works as-is. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945992#comment-16945992 ] Jim Brennan commented on HDFS-14893: TestJournalNodeRespectsBindHostKeys unit test failure is unrelated to this change. Please review. > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor > Attachments: HDFS-14893-branch-2.001.patch > > > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing > on branch-2 > {noformat} > [INFO] Running > org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA > [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 > s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA > [ERROR] > testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA) > Time elapsed: 0.648 s <<< FAILURE! 
> java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat}
[jira] [Updated] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14893: --- Attachment: HDFS-14893-branch-2.001.patch Status: Patch Available (was: Open) Putting up a patch for branch-2 that just fixes the test. > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor > Attachments: HDFS-14893-branch-2.001.patch
[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947033#comment-16947033 ] Jim Brennan commented on HDFS-14893: [~jhung], [~xkrogen], [~vagarychen], can one of you please review this unit test fix? > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor > Attachments: HDFS-14893-branch-2.001.patch
[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947045#comment-16947045 ] Jim Brennan commented on HDFS-14893: [~vagarychen] that is fine with me. It was not clear to me how soon HDFS-14245 was going to be pulled back to branch-2, so I thought I should put up the unit test fix in the meantime. > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor > Attachments: HDFS-14893-branch-2.001.patch
[jira] [Commented] (HDFS-14667) Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943643#comment-16943643 ] Jim Brennan commented on HDFS-14667: [~xkrogen] thanks for working on this! What is the status of this backport to branch-2.10? > Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2 > > > Key: HDFS-14667 > URL: https://issues.apache.org/jira/browse/HDFS-14667 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-14403-branch-2.000.patch > > > We would like to target pulling HDFS-14403, an important operability > enhancement, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines.
[jira] [Created] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
Jim Brennan created HDFS-14893: -- Summary: TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2 Key: HDFS-14893 URL: https://issues.apache.org/jira/browse/HDFS-14893 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.10.0 Reporter: Jim Brennan TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing on branch-2 {noformat} [INFO] Running org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA [ERROR] testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA) Time elapsed: 0.648 s <<< FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {noformat}
[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944812#comment-16944812 ] Jim Brennan commented on HDFS-14893: This is failing on this line: {noformat} assertTrue(logCapture.getOutput().contains("Assuming Standby state")); {noformat} But there is no code that generates that string. Looks like this was caused by HDFS-14785, which changed the logging in getHAServiceState(). It appears to be fixed in trunk by HDFS-14245. [~xkrogen] I don't know if the correct fix is to pull back HDFS-14245 or to just fix this test in branch-2. cc: [~jhung] > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor
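To make the failure mode in the comment above concrete: a test that asserts on a literal log message starts failing as soon as the production code rewords that message, even though the behaviour under test is unchanged. The sketch below is a self-contained illustration and not the Hadoop code. The LOG list, probeStateOld, probeStateNew, and the reworded message are hypothetical; only the string "Assuming Standby state" comes from the failing assertion quoted above.

```java
import java.util.ArrayList;
import java.util.List;

class LogAssertionSketch {
    // Hypothetical stand-in for a log capturer: collects emitted messages.
    static final List<String> LOG = new ArrayList<>();

    /** Hypothetical "before" code path that logs the text the test expects. */
    static void probeStateOld() {
        LOG.add("Assuming Standby state");
    }

    /** Hypothetical "after" code path: same behaviour, reworded log message. */
    static void probeStateNew() {
        LOG.add("Failed to fetch service state; treating node as standby");
    }

    /** Mirrors the shape of logCapture.getOutput().contains(...). */
    static boolean logged(String needle) {
        return String.join("\n", LOG).contains(needle);
    }

    public static void main(String[] args) {
        probeStateOld();
        System.out.println("old code, assertion holds: " + logged("Assuming Standby state"));

        LOG.clear();
        probeStateNew();
        System.out.println("new code, assertion holds: " + logged("Assuming Standby state"));
    }
}
```

This is why either of the two fixes discussed in the comment would work: bring the test and the logging back in sync by pulling in the change that updated both, or adjust the expected string in the test alone.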
[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949722#comment-16949722 ] Jim Brennan commented on HDFS-14893: Now that HDFS-14245 is pulled back, I am no longer seeing this on branch-2. > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor > Attachments: HDFS-14893-branch-2.001.patch
[jira] [Updated] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14893: --- Fix Version/s: 2.10.0 Resolution: Invalid Status: Resolved (was: Patch Available) This fix is no longer needed because HDFS-14245 was pulled back to branch-2. > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor > Fix For: 2.10.0 > > Attachments: HDFS-14893-branch-2.001.patch
[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949724#comment-16949724 ] Jim Brennan commented on HDFS-14893: This Jira is fixed by HDFS-14245 > TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on > branch-2 > -- > > Key: HDFS-14893 > URL: https://issues.apache.org/jira/browse/HDFS-14893 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Priority: Minor > Attachments: HDFS-14893-branch-2.001.patch
[jira] [Assigned] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned HDFS-14960: -- Assignee: Jim Brennan > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor
[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020289#comment-17020289 ] Jim Brennan commented on HDFS-15125: Looking for someone who can review/commit this. Copying people associated with the Jiras I am pulling back. cc: [~linyiqun], [~inigoiri], [~ayushtkn] > Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10 > --- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky
[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020360#comment-17020360 ] Jim Brennan commented on HDFS-15125: Thanks [~kihwal]! > Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10 > --- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 2.10.1 > > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch
[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016078#comment-17016078 ] Jim Brennan commented on HDFS-13339: Thanks [~weichiu]! > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.2.0, 3.1.1, 3.0.4, 2.9.3, 2.10.1 > > Attachments: HDFS-13339-branch-2.10.001.patch, > HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, > HDFS-13339.003.patch, HDFS-13339.004.patch > > > When I execute the unit test > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication. Details are as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
> java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach > 2 replicas > at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016134#comment-17016134 ] Jim Brennan commented on HDFS-13339: [~weichiu] did you commit to branches 2.9 and 2.10? > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.2.0, 3.1.1, 3.0.4, 2.9.3, 2.10.1 > > Attachments: HDFS-13339-branch-2.10.001.patch, > HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, > HDFS-13339.003.patch, HDFS-13339.004.patch > > > When i execute Unit Test of > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication, detail information as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016240#comment-17016240 ] Jim Brennan commented on HDFS-13339: Thanks [~weichiu]! I think branch-2 is supposed to be deleted at some point... > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.2.0, 3.1.1, 3.0.4, 2.9.3, 2.10.1 > > Attachments: HDFS-13339-branch-2.10.001.patch, > HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, > HDFS-13339.003.patch, HDFS-13339.004.patch > > > When i execute Unit Test of > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication, detail information as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
[jira] [Created] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
Jim Brennan created HDFS-15125: -- Summary: Pull back fixes for DataNodeVolume* unit tests Key: HDFS-15125 URL: https://issues.apache.org/jira/browse/HDFS-15125 Project: Hadoop HDFS Issue Type: Test Components: hdfs Affects Versions: 2.10.0 Reporter: Jim Brennan Assignee: Jim Brennan I would like to pull back some fixes for the DataNodeVolume* tests to resolve some intermittent failures we are seeing on branch-2.10. The fixes are: HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing HDFS-13993 TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is flaky HDFS-14324 Fix TestDataNodeVolumeFailure HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15125: --- Attachment: HDFS-15125-branch-2.10.001.patch Status: Patch Available (was: Open) I am submitting a patch for branch-2.10 that pulls in all of these fixes. Let me know if it would be better to put up individual patches on each of those Jiras. > Pull back fixes for DataNodeVolume* unit tests > -- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016994#comment-17016994 ] Jim Brennan commented on HDFS-15125: The unit test failures are unrelated, but I am still seeing one of the tests fail locally with this patch. Once I resolve that, I will upload a new patch. > Pull back fixes for DataNodeVolume* unit tests > -- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky
[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018297#comment-17018297 ] Jim Brennan commented on HDFS-15125: Thanks for the review [~ahussein]! It's a good suggestion, but I don't think it is necessary. When the wait times out, it throws, and unless the caller catches the exception, the test fails with an error. For example, I temporarily reduced the timeout to force the timeout case, and got this in the output:
{noformat}
at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
at org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.waitForDiskError(DataNodeTestUtils.java:248)
at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testTolerateVolumeFailuresAfterAddingMoreVolumes(TestDataNodeVolumeFailure.java:395)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
I'm also reluctant to make additional changes when pulling back fixes from trunk.
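The behaviour described in this comment (a polling wait that throws on timeout, so an uncaught exception fails the test) can be illustrated with a minimal sketch. This is not the code of GenericTestUtils.waitFor; the class name WaitForSketch, the method signature, and the exception message are all hypothetical and chosen only for illustration.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
    /**
     * Hypothetical polling helper: re-checks the condition every intervalMs
     * milliseconds and throws TimeoutException once timeoutMs has elapsed.
     */
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                // Nothing is swallowed here: the caller sees the exception.
                throw new TimeoutException("Timed out waiting for condition");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // A condition that never becomes true forces the timeout path.
            waitFor(() -> false, 10, 50);
            System.out.println("no timeout");
        } catch (TimeoutException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In a JUnit test that declares `throws Exception` and does not catch the TimeoutException, the exception would propagate out of the test method and the run would be reported as an error, which is why an extra assertion after the wait adds little.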
> Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10 > --- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018297#comment-17018297 ] Jim Brennan edited comment on HDFS-15125 at 1/17/20 8:42 PM: Thanks for the review [~ahussein]! It's a good suggestion, but I don't think it is necessary. When the wait times out, it throws, and unless the caller catches the exception, the test fails with an error. For example, I temporarily reduced the timeout to force the timeout case, and got this in the output:
{noformat}
at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
at org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.waitForDiskError(DataNodeTestUtils.java:248)
at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testTolerateVolumeFailuresAfterAddingMoreVolumes(TestDataNodeVolumeFailure.java:395)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
I'm also reluctant to make additional changes when pulling back fixes from trunk.
[jira] [Reopened] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reopened HDFS-13339: Re-opening issue so I can put up a patch for branch-2.10. > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.2.0, 3.1.1, 3.0.4 > > Attachments: HDFS-13339.001.patch, HDFS-13339.002.patch, > HDFS-13339.003.patch, HDFS-13339.004.patch > > > When i execute Unit Test of > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication, detail information as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
[jira] [Updated] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-13339: --- Attachment: HDFS-13339-branch-2.10.001.patch Status: Patch Available (was: Reopened) We have been seeing intermittent test failures on branch-2.10 in TestBlockStatsMXBean. I applied the patch from this Jira and it does seem to resolve the intermittent failures. Can we please pull this back to branch-2.10? I am submitting a patch for it - only change from the original was replacing the lambda in the unit test. > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.0.4, 3.1.1, 3.2.0 > > Attachments: HDFS-13339-branch-2.10.001.patch, HDFS-13339.001.patch, > HDFS-13339.002.patch, HDFS-13339.003.patch, HDFS-13339.004.patch > > > When i execute Unit Test of > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication, detail information as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
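The comment above says the only change in the branch-2.10 patch was replacing the lambda in the unit test. The patch itself is not shown here, so the following is only a sketch of what such a rewrite typically looks like, under the assumption (not stated in the message) that the branch must compile at a pre-Java-8 source level. The class AnonymousClassSketch and the method reached are hypothetical names, and java.util.concurrent.Callable stands in for whatever interface the real test uses.

```java
import java.util.concurrent.Callable;

public class AnonymousClassSketch {
    /** Returns a check that becomes true once counter[0] reaches threshold. */
    public static Callable<Boolean> reached(final int[] counter, final int threshold) {
        // Java 8 lambda form, for comparison:
        //     return () -> counter[0] >= threshold;
        // Equivalent anonymous inner class, which older source levels accept:
        return new Callable<Boolean>() {
            @Override
            public Boolean call() {
                return counter[0] >= threshold;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        int[] counter = {0};
        Callable<Boolean> check = reached(counter, 2);
        System.out.println(check.call()); // condition not yet met
        counter[0] = 2;
        System.out.println(check.call()); // condition now met
    }
}
```

Captured variables are declared final because anonymous classes before Java 8 require it; the behaviour of the check is otherwise the same as the lambda form.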
[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015254#comment-17015254 ] Jim Brennan commented on HDFS-13339: I submitted another patch to fix the checkstyle issue. I don't believe the unit test failures are due to this Jira. TestJournalNodeRespectsBindHostKeys is failing in qbt builds for 2.10. TestFileCorruption is reported in HDFS-14816 > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.2.0, 3.1.1, 3.0.4 > > Attachments: HDFS-13339-branch-2.10.001.patch, > HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, > HDFS-13339.003.patch, HDFS-13339.004.patch > > > When i execute Unit Test of > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication, detail information as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
[jira] [Updated] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-13339: --- Attachment: HDFS-13339-branch-2.10.002.patch > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.2.0, 3.1.1, 3.0.4 > > Attachments: HDFS-13339-branch-2.10.001.patch, > HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, > HDFS-13339.003.patch, HDFS-13339.004.patch > > > When i execute Unit Test of > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication, detail information as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume
[ https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015419#comment-17015419 ] Jim Brennan commented on HDFS-13339: I don't think the unit test failures are related to this change. They are not failing for me with or without the patch. [~xiaochen], can you please review and if acceptable, commit this to branch-2.10? > Volume reference can't be released and may lead to deadlock when DataXceiver > does a check volume > > > Key: HDFS-13339 > URL: https://issues.apache.org/jira/browse/HDFS-13339 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Environment: os: Linux 2.6.32-358.el6.x86_64 > hadoop version: hadoop-3.2.0-SNAPSHOT > unit: mvn test -Pnative > -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart >Reporter: liaoyuxiangqin >Assignee: Zsolt Venczel >Priority: Critical > Labels: DataNode, volumes > Fix For: 3.2.0, 3.1.1, 3.0.4 > > Attachments: HDFS-13339-branch-2.10.001.patch, > HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, > HDFS-13339.003.patch, HDFS-13339.004.patch > > > When i execute Unit Test of > TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, > the process blocks on waitReplication, detail information as follows: > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 307.492 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting > [ERROR] > testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) > Time elapsed: 307.206 s <<< ERROR! 
[jira] [Updated] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15125: --- Attachment: HDFS-15125-branch-2.10.002.patch > Pull back fixes for DataNodeVolume* unit tests > -- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017464#comment-17017464 ] Jim Brennan commented on HDFS-15125: There was a problem with my back-port of HDFS-13945, so TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure was still failing for me intermittently. This is fixed in patch 002. > Pull back fixes for DataNodeVolume* unit tests > -- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15125: --- Summary: Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10 (was: Pull back fixes for DataNodeVolume* unit tests) > Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10 > --- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018120#comment-17018120 ] Jim Brennan commented on HDFS-15125: I don't believe that any of the failed unit tests are related to these changes, which are limited to different unit tests. I ran them all locally and they all pass for me. I believe this is ready for review. cc: [~kihwal], [~weichiu] > Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10 > --- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15095) Fix accidental comment in flaky test TestDecommissioningStatus
[ https://issues.apache.org/jira/browse/HDFS-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013048#comment-17013048 ] Jim Brennan commented on HDFS-15095: {quote} Can you please commit the patch? {quote} I cannot. You'll need a committer for that. cc: [~kihwal], [~jeagles], [~ebadger] > Fix accidental comment in flaky test TestDecommissioningStatus > -- > > Key: HDFS-15095 > URL: https://issues.apache.org/jira/browse/HDFS-15095 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-15095.001.patch, HDFS-15095.002.patch > > > There are some old Jiras suggesting that {{testDecommissionStatus}} is > flaky. > * HDFS-12188 > * HDFS-9599 > * HDFS-9950 > * HDFS-10755 > However, the HDFS-14854 fix accidentally commented out one of the checks in > {{TestDecommissioningStatus.testDecommissionStatus()}}. This Jira will > restore the commented-out code and add a blocking queue to make the test > case deterministic. > My intuition is that the monitor task launched by AdminManager may not have > enough time to act before we start verifying the status. I suggest forcing > the main thread to block until the node is added to the blocked node. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
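For readers following the HDFS-15095 description above, here is a minimal sketch of the general idea of making a test wait on a blocking queue until a background task has acted. It is not the actual patch: the class BlockingQueueHandoff, the methods startMonitor and awaitProcessed, and the node name "dn-1" are all hypothetical stand-ins, and no Hadoop classes are used. It is written without lambdas so it would also compile on Java 7.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingQueueHandoff {

  /**
   * Hypothetical stand-in for a background monitor task (not Hadoop code).
   * It does some work, then announces the node it handled on the queue.
   */
  public static Thread startMonitor(final BlockingQueue<String> processed,
                                    final String node) {
    Thread t = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          Thread.sleep(50);      // simulate the monitor taking a while to act
          processed.put(node);   // hand the result to the waiting test thread
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Blocks until the monitor reports a node, or fails after the timeout. */
  public static String awaitProcessed(BlockingQueue<String> processed,
                                      long timeoutMs)
      throws InterruptedException {
    String node = processed.poll(timeoutMs, TimeUnit.MILLISECONDS);
    if (node == null) {
      throw new IllegalStateException(
          "monitor did not act within " + timeoutMs + " ms");
    }
    return node;
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<String> processed = new ArrayBlockingQueue<String>(1);
    startMonitor(processed, "dn-1");
    // The test thread blocks here instead of sleeping for a guessed interval,
    // so verification cannot start before the monitor has run.
    String node = awaitProcessed(processed, 5000);
    System.out.println("verifying status after monitor handled " + node);
  }
}
```

The point of the queue over a fixed sleep is that the wait ends as soon as the monitor has acted, and a timeout turns a hang into a clear failure.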
[jira] [Assigned] (HDFS-14205) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned HDFS-14205: -- Assignee: Chao Sun (was: Jim Brennan) > Backport HDFS-6440 to branch-2 > -- > > Key: HDFS-14205 > URL: https://issues.apache.org/jira/browse/HDFS-14205 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Fix For: 2.10.0 > > Attachments: HDFS-14205-branch-2.001.patch, > HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, > HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, > HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, > HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch > > > Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. > This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 > (consistent read from standby) backport to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14205) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reassigned HDFS-14205: -- Assignee: Jim Brennan (was: Chao Sun) > Backport HDFS-6440 to branch-2 > -- > > Key: HDFS-14205 > URL: https://issues.apache.org/jira/browse/HDFS-14205 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chen Liang >Assignee: Jim Brennan >Priority: Major > Fix For: 2.10.0 > > Attachments: HDFS-14205-branch-2.001.patch, > HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, > HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, > HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, > HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch > > > Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. > This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 > (consistent read from standby) backport to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15095) Fix accidental comment in flaky test TestDecommissioningStatus
[ https://issues.apache.org/jira/browse/HDFS-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010110#comment-17010110 ] Jim Brennan commented on HDFS-15095: Thanks for the patch [~ahussein]! I am +1 (non-binding) on patch 002. Looks good to me! > Fix accidental comment in flaky test TestDecommissioningStatus > -- > > Key: HDFS-15095 > URL: https://issues.apache.org/jira/browse/HDFS-15095 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-15095.001.patch, HDFS-15095.002.patch > > > There are some old Jiras suggesting that {{testDecommissionStatus}} is > flaky. > * HDFS-12188 > * HDFS-9599 > * HDFS-9950 > * HDFS-10755 > However, the HDFS-14854 fix accidentally commented out one of the checks in > {{TestDecommissioningStatus.testDecommissionStatus()}}. This Jira will > restore the commented-out code and add a blocking queue to make the test > case deterministic. > My intuition is that the monitor task launched by AdminManager may not have > enough time to act before we start verifying the status. I suggest forcing > the main thread to block until the node is added to the blocked node. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15095) Fix accidental comment in flaky test TestDecommissioningStatus
[ https://issues.apache.org/jira/browse/HDFS-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010110#comment-17010110 ] Jim Brennan edited comment on HDFS-15095 at 1/7/20 9:41 PM: Thanks for the patch [~ahussein]! I am +1 (non-binding) on patch 002. Looks good to me! was (Author: jim_brennan): Thanks for the patch [~ahussein]! I am +1 (non-binding) on patch 002. Looks good to me! > Fix accidental comment in flaky test TestDecommissioningStatus > -- > > Key: HDFS-15095 > URL: https://issues.apache.org/jira/browse/HDFS-15095 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-15095.001.patch, HDFS-15095.002.patch > > > There are some old Jiras suggesting that {{testDecommissionStatus}} is > flaky. > * HDFS-12188 > * HDFS-9599 > * HDFS-9950 > * HDFS-10755 > However, the HDFS-14854 fix accidentally commented out one of the checks in > {{TestDecommissioningStatus.testDecommissionStatus()}}. This Jira will > restore the commented-out code and add a blocking queue to make the test > case deterministic. > My intuition is that the monitor task launched by AdminManager may not have > enough time to act before we start verifying the status. I suggest forcing > the main thread to block until the node is added to the blocked node. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057956#comment-17057956 ] Jim Brennan commented on HDFS-15077: Thanks [~iwasakims]! I apologize for not catching the lambda issue - we use java8 internally so it didn't come up when I tried it. I should have tested against apache branch-2.10 instead. > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15077-branch-2.10.patch > > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to race between test thread and LeaseRenewer thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
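As an illustration of the lambda issue mentioned in the HDFS-15077 comment above: a lambda compiles only on Java 8 or later, so a patch that must build with Java 7 can express the same condition as an anonymous inner class. The sketch below is self-contained and deliberately does not use Hadoop's own test utilities; the class Java7Condition, the interface Check, and the waitFor helper are hypothetical names invented for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Java7Condition {

  /** Hypothetical single-method interface; Java 7 has no java.util.function. */
  public interface Check {
    boolean ok();
  }

  /**
   * Polls the check every intervalMs until it passes or timeoutMs elapses.
   * Returns true if the check passed, false on timeout.
   */
  public static boolean waitFor(Check check, long intervalMs, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.ok()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    final AtomicInteger calls = new AtomicInteger();

    // Java 8+ only (does not compile with -source 1.7):
    //   waitFor(() -> calls.incrementAndGet() >= 3, 1, 1000);

    // Java 7 compatible equivalent: an anonymous inner class.
    boolean met = waitFor(new Check() {
      @Override
      public boolean ok() {
        return calls.incrementAndGet() >= 3;  // passes on the third poll
      }
    }, 1, 1000);

    System.out.println("condition met: " + met + ", polls: " + calls.get());
  }
}
```

Note that variables captured by the anonymous class must be declared final on Java 7, which is why calls is final here.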
[jira] [Commented] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
[ https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059445#comment-17059445 ] Jim Brennan commented on HDFS-11396: Thanks [~inigoiri]! > TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out > - > > Key: HDFS-11396 > URL: https://issues.apache.org/jira/browse/HDFS-11396 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: John Zhuge >Assignee: Ayush Saxena >Priority: Minor > Fix For: 3.2.0, 3.3.0, 2.10.1 > > Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, > patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > > > https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057423#comment-17057423 ] Jim Brennan commented on HDFS-15077: [~iwasakims], [~aajisaka] we have seen this failure (rarely) in our automated tests for our internal branch-2.10 build. I believe the patch applies cleanly. Could we get it pulled back to branch-2.10? > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to race between test thread and LeaseRenewer thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
[ https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-11396: --- Attachment: HDFS-11396-branch-2.10.001.patch > TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out > - > > Key: HDFS-11396 > URL: https://issues.apache.org/jira/browse/HDFS-11396 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: John Zhuge >Assignee: Ayush Saxena >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, > patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > > > https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reopened HDFS-10499: Re-opening for branch-2.10 patch supplied by [~ahussein] > TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails > Intermittently > - > > Key: HDFS-10499 > URL: https://issues.apache.org/jira/browse/HDFS-10499 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, test >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Yiqun Lin >Priority: Major > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, > HDFS-10499.002.patch > > > Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we > had the following failure. Local rerun is successful. > Stack Trace: > {panel} > java.lang.AssertionError: expected:<17> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113) > {panel} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059097#comment-17059097 ] Jim Brennan commented on HDFS-10499: Thanks for the patch [~ahussein]! I am +1 (non-binding) on this patch for branch-2.10. We have been seeing this test fail (rarely) in our automated builds for branch-2.10. [~linyiqun], [~brahmareddy], [~kihwal] we would appreciate it if someone could review/commit this patch. I will re-open so the precommit build will run. > TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails > Intermittently > - > > Key: HDFS-10499 > URL: https://issues.apache.org/jira/browse/HDFS-10499 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, test >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Yiqun Lin >Priority: Major > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, > HDFS-10499.002.patch > > > Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we > had the following failure. Local rerun is successful. > Stack Trace: > {panel} > java.lang.AssertionError: expected:<17> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113) > {panel} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-10499: --- Status: Patch Available (was: Reopened) > TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails > Intermittently > - > > Key: HDFS-10499 > URL: https://issues.apache.org/jira/browse/HDFS-10499 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, test >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Yiqun Lin >Priority: Major > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, > HDFS-10499.002.patch > > > Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we > had the following failure. Local rerun is successful. > Stack Trace: > {panel} > java.lang.AssertionError: expected:<17> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113) > {panel} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059102#comment-17059102 ] Jim Brennan commented on HDFS-10499: I also put up a patch for related Jira HDFS-11396. > TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails > Intermittently > - > > Key: HDFS-10499 > URL: https://issues.apache.org/jira/browse/HDFS-10499 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, test >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru >Assignee: Yiqun Lin >Priority: Major > Fix For: 3.0.0-alpha2 > > Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, > HDFS-10499.002.patch > > > Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we > had the following failure. Local rerun is successful. > Stack Trace: > {panel} > java.lang.AssertionError: expected:<17> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113) > {panel} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
[ https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan reopened HDFS-11396: Reopening to submit patch for branch-2.10. > TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out > - > > Key: HDFS-11396 > URL: https://issues.apache.org/jira/browse/HDFS-11396 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: John Zhuge >Assignee: Ayush Saxena >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, > patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > > > https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
[ https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059100#comment-17059100 ] Jim Brennan commented on HDFS-11396: [~jzhuge], [~goiri], [~kihwal] I have taken the liberty of uploading a patch for branch-2.10. Please let me know if it looks good. > TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out > - > > Key: HDFS-11396 > URL: https://issues.apache.org/jira/browse/HDFS-11396 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: John Zhuge >Assignee: Ayush Saxena >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, > patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > > > https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
[ https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-11396: --- Status: Patch Available (was: Reopened) > TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out > - > > Key: HDFS-11396 > URL: https://issues.apache.org/jira/browse/HDFS-11396 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, test >Reporter: John Zhuge >Assignee: Ayush Saxena >Priority: Minor > Fix For: 3.3.0, 3.2.0 > > Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, > patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > > > https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11439) testGenerationStampInFuture UT fails
[ https://issues.apache.org/jira/browse/HDFS-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059105#comment-17059105 ] Jim Brennan commented on HDFS-11439: I believe this is fixed by HDFS-11396? > testGenerationStampInFuture UT fails > > > Key: HDFS-11439 > URL: https://issues.apache.org/jira/browse/HDFS-11439 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yesha Vora >Priority: Major > Attachments: testGenerationStampInFuture.log > > > testGenerationStampInFuture UT fails as below. > {code} > Error Message > expected:<18> but was:<0> > Stacktrace > java.lang.AssertionError: expected:<18> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:125){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15038) TestFsck testFsckListCorruptSnapshotFiles is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106444#comment-17106444 ] Jim Brennan commented on HDFS-15038: [~ayushtkn], [~hemanthboyina], [~inigoiri], we are seeing this failure in the branch-2.10 qbt tests, and I have also been able to repro it in branch-2.10 by running the test in a loop. Here's a recent QBT report of this failure: [https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch-2.10-java7-linux-x86/684/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFsck/testFsckListCorruptSnapshotFiles/] Can we get this fix pulled back to branch-2.10? The existing patch doesn't work in 2.10 because of the lambda, so I have attached a new one that fixes that issue. > TestFsck testFsckListCorruptSnapshotFiles is failing in trunk > - > > Key: HDFS-15038 > URL: https://issues.apache.org/jira/browse/HDFS-15038 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15038-branch-2.10.001.patch, HDFS-15038.001.patch, > HDFS-15038.002.patch, HDFS-15038.003.patch > > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28481/testReport/] > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28482/testReport/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15038) TestFsck testFsckListCorruptSnapshotFiles is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15038: --- Attachment: HDFS-15038-branch-2.10.001.patch > TestFsck testFsckListCorruptSnapshotFiles is failing in trunk > - > > Key: HDFS-15038 > URL: https://issues.apache.org/jira/browse/HDFS-15038 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15038-branch-2.10.001.patch, HDFS-15038.001.patch, > HDFS-15038.002.patch, HDFS-15038.003.patch > > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28481/testReport/] > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28482/testReport/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108320#comment-17108320 ] Jim Brennan commented on HDFS-14960: Thanks [~ayushtkn]! I will remove the precondition in BlockPlacementPolicyWithNodeGroup and add some additional verification. Note that with the changes I've made, one of the test cases (testBalancerEndInNoMoveProgress) now achieves what we want. It fails if the balancer does not use NetworkTopologyWithNodeGroup. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108419#comment-17108419 ] Jim Brennan commented on HDFS-14960: Patch 002 removes the change in BlockPlacementPolicyWithNodeGroup and adds code to the test to verify block placement after balancing. I also added checks to verify the topology. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. >
[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14960: --- Attachment: HDFS-14960.002.patch > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14960: --- Attachment: HDFS-14960.003.patch > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108590#comment-17108590 ] Jim Brennan commented on HDFS-14960: I put up patch 003 to address the checkstyle issues. The unit test failure is unrelated. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110315#comment-17110315 ] Jim Brennan commented on HDFS-14960: I'm not sure what happened with the pre-commit build. Can it be restarted? > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14960: --- Attachment: (was: HDFS-14960.003.patch) > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110383#comment-17110383 ] Jim Brennan commented on HDFS-14960: Is there something wrong with trunk qbt builds? I went to check the latest, and the most recent build I see is from May 1: [https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/] > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14960: --- Attachment: HDFS-14960.003.patch > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110359#comment-17110359 ] Jim Brennan commented on HDFS-14960: I re-uploaded patch 003 to hopefully kick off the pre-commit build again. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107684#comment-17107684 ]

Jim Brennan commented on HDFS-14960:

On further investigation of this, I realized that the balancer does not pay any attention to {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}}.

Here are the config settings for TestBalancerWithNodeGroup:
{code:java}
static Configuration createConf() {
  Configuration conf = new HdfsConfiguration();
  TestBalancer.initConf(conf);
  conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, DEFAULT_BLOCK_SIZE);
  conf.setBoolean(DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY, false);
  conf.set(CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY,
      NetworkTopologyWithNodeGroup.class.getName());
  conf.set(DFSConfigKeys.DFS_BLOCK_REPLICATOR_CLASSNAME_KEY,
      BlockPlacementPolicyWithNodeGroup.class.getName());
  return conf;
}
{code}
Prior to HDFS-14958, we were not setting {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY = false}}, so BlockPlacementPolicyWithNodeGroup was being initialized with a clusterMap of type DFSNetworkTopology. This did not affect this test though, because the balancer ignores that flag. The Balancer only pays attention to {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} which was already set to NetworkTopologyWithNodeGroup. This is why the test never failed - it is specifically testing the results of the balancer.

The only reason I found the issue in HDFS-14958 was because we had some internal changes that caused it to fail. But the apache version never actually failed because of HDFS-14958.

Given this, I thought I should double-check that the test does fail if the Balancer doesn't use NetworkTopologyWithNodeGroup. So I set it to use NetworkTopology and the test passed!

Looking at it more closely, I was surprised in particular that testBalancerEndInNoMoveProgress() was succeeding in this case. I would expect that with NetworkTopology there would be some block moves. But the code to verify that it finishes with no moves seems to allow moves:
{code:java}
final int r = Balancer.run(namenodes, BalancerParameters.DEFAULT, conf);
Assert.assertTrue(r == ExitStatus.SUCCESS.getExitCode() ||
    (r == ExitStatus.NO_MOVE_PROGRESS.getExitCode()));
{code}
I don't understand why SUCCESS is a valid return for this case. Isn't the point of this test case to verify that no block moves were done? Sure enough, if I change that assert to be more restrictive:
{code:java}
Assert.assertTrue(r == ExitStatus.NO_MOVE_PROGRESS.getExitCode());
{code}
then testBalancerEndInNoMoveProgress() fails when the topology is not {{NetworkTopologyWithNodeGroup}}.

With this change in place, however, when I went back to using {{NetworkTopologyWithNodeGroup}} I ran into a new failure. testBalancerWithRackLocality() was failing on the modified assert. I don't see why this test case was using the runBalanceCanFinish() in the first place though. I changed it to just use runBalancer(), and it passes. This seems more correct to me, although I am definitely not an expert in this area of the code.

As suggested by [~hemanthboyina] and others, I also added a precondition check to BlockPlacementPolicyWithNodeGroup.initialize() to verify that clusterMap is an instance of NetworkTopologyWithNodeGroup. With this change, all of the test cases in this test fail immediately if you misconfigure it to use DFSNetworkTopology with BlockPlacementPolicyWithNodeGroup.

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.1.3
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even
> though it was using DFSNetworkTopology instead of
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
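The difference between the two assertions discussed in the HDFS-14960 comment above can be illustrated in isolation. The following is only a minimal sketch: the {{ExitStatus}} enum here is a hypothetical stand-in for the balancer's real one, the numeric codes are illustrative assumptions, and {{lenientCheck}} / {{strictCheck}} are names invented for this example. It shows that the original condition cannot tell a run that ended in SUCCESS apart from one that ended in NO_MOVE_PROGRESS, while the tightened condition can.

```java
final class ExitStatusAssertSketch {
    /** Hypothetical stand-in for the balancer's ExitStatus; the numeric codes are illustrative. */
    enum ExitStatus {
        SUCCESS(0),
        NO_MOVE_PROGRESS(-3);

        private final int code;

        ExitStatus(int code) {
            this.code = code;
        }

        int getExitCode() {
            return code;
        }
    }

    /** Mirrors the original assert condition: accepts either exit code. */
    static boolean lenientCheck(int r) {
        return r == ExitStatus.SUCCESS.getExitCode()
            || r == ExitStatus.NO_MOVE_PROGRESS.getExitCode();
    }

    /** Mirrors the tightened assert condition: accepts only NO_MOVE_PROGRESS. */
    static boolean strictCheck(int r) {
        return r == ExitStatus.NO_MOVE_PROGRESS.getExitCode();
    }

    public static void main(String[] args) {
        int success = ExitStatus.SUCCESS.getExitCode();
        int noProgress = ExitStatus.NO_MOVE_PROGRESS.getExitCode();
        // The lenient check passes for both outcomes; the strict check separates them.
        System.out.println(lenientCheck(success) + " " + strictCheck(success));
        System.out.println(lenientCheck(noProgress) + " " + strictCheck(noProgress));
    }
}
```

With the strict form, a test whose purpose is to confirm that the balancer stopped without making progress no longer passes when the run simply succeeded.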
[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14960: --- Attachment: HDFS-14960.001.patch Status: Patch Available (was: Open) > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15038) TestFsck testFsckListCorruptSnapshotFiles is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107322#comment-17107322 ] Jim Brennan commented on HDFS-15038: Thanks everyone. The unit test failures and findbugs are unrelated to this patch, so it should be good to go for branch-2.10. > TestFsck testFsckListCorruptSnapshotFiles is failing in trunk > - > > Key: HDFS-15038 > URL: https://issues.apache.org/jira/browse/HDFS-15038 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15038-branch-2.10.001.patch, HDFS-15038.001.patch, > HDFS-15038.002.patch, HDFS-15038.003.patch > > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28481/testReport/] > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28482/testReport/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112277#comment-17112277 ] Jim Brennan commented on HDFS-13183: Thanks [~hexiaoqiao]. It looks like there are still some failures. One other note: it's possible TestBalancer did not fail because it uses its own copy of doBalance() called runBalancer(). I don't know if it would have failed if it were using Balancer.run() instead. TestBalancerWithNodeGroup uses Balancer.run(), which is why it was affected. > Standby NameNode process getBlocks request to reduce Active load > > > Key: HDFS-13183 > URL: https://issues.apache.org/jira/browse/HDFS-13183 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer mover, namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, > HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, > HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch > > > The performance of the Active NameNode can be impacted when the {{Balancer}} requests > #getBlocks, since querying the blocks of overly full DNs is currently extremely > inefficient. The main reason is that {{NameNodeRpcServer#getBlocks}} > holds the read lock for a long time. In the extreme case, all handlers of the Active > NameNode RPC server are occupied by one reader > {{NameNodeRpcServer#getBlocks}} and other write operation calls, so the Active > NameNode enters a state of false death for a number of seconds or even minutes. > Similar performance concerns about the Balancer have been reported in HDFS-9412, > HDFS-7967, etc. > If the Standby NameNode can shoulder the heavy #getBlocks burden, it could speed up > balancing and reduce the performance impact on the Active NameNode.
[jira] [Commented] (HDFS-9376) TestSeveralNameNodes fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112223#comment-17112223 ] Jim Brennan commented on HDFS-9376: --- Thanks [~iwasakims]! I figured that was the case. > TestSeveralNameNodes fails occasionally > --- > > Key: HDFS-9376 > URL: https://issues.apache.org/jira/browse/HDFS-9376 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Masatake Iwasaki >Priority: Major > Fix For: 3.0.0-alpha1, 2.10.1 > > Attachments: HDFS-9376.001.patch, HDFS-9376.002.patch > > > TestSeveralNameNodes has been failing in precommit builds. It usually times > out on waiting for the last thread to finish writing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111215#comment-17111215 ] Jim Brennan commented on HDFS-14960: I will investigate and resolve the unit test failure and put up a new patch. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9376) TestSeveralNameNodes fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111269#comment-17111269 ] Jim Brennan commented on HDFS-9376: --- [~cnauroth], [~iwasakims], [~kihwal] I know this is a pretty old Jira, but we have seen this failure come up in our internal branch-2.10 builds. I downloaded the patch and verified that it applies cleanly to branch-2.10, builds and runs. Any chance we could get this pulled back to branch-2.10? > TestSeveralNameNodes fails occasionally > --- > > Key: HDFS-9376 > URL: https://issues.apache.org/jira/browse/HDFS-9376 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Masatake Iwasaki >Priority: Major > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-9376.001.patch, HDFS-9376.002.patch > > > TestSeveralNameNodes has been failing in precommit builds. It usually times > out on waiting for the last thread to finish writing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111578#comment-17111578 ] Jim Brennan commented on HDFS-13183: [~weichiu], [~hexiaoqiao], I believe this change is causing TestBalancerWithNodeGroup to fail: [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/146/testReport/junit/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerEndInNoMoveProgress/] The problem is that Balancer.doBalance() was changed to construct the NameNodeConnectors inside the iteration loop. The counter that tracks how many iterations we have gone without a move ({{notChangedIterations}}) lives in the NameNodeConnector, but it is intended to work across iterations. Since we are now creating new connectors on each iteration, the counter restarts from zero every time, so we will never exit the balancer with ExitStatus.NO_MOVE_PROGRESS. > Standby NameNode process getBlocks request to reduce Active load > > > Key: HDFS-13183 > URL: https://issues.apache.org/jira/browse/HDFS-13183 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer mover, namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, > HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, > HDFS-13183.006.patch, HDFS-13183.007.patch > > > The performance of the Active NameNode can be impacted when the {{Balancer}} requests > #getBlocks, since querying the blocks of overly full DNs is currently extremely > inefficient. The main reason is that {{NameNodeRpcServer#getBlocks}} > holds the read lock for a long time. In the extreme case, all handlers of the Active > NameNode RPC server are occupied by one reader > {{NameNodeRpcServer#getBlocks}} and other write operation calls, so the Active > NameNode enters a state of false death for a number of seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in HDFS-9412, > HDFS-7967, etc. > If the Standby NameNode can shoulder the heavy #getBlocks burden, it could speed up > balancing and reduce the performance impact on the Active NameNode.
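The failure mode described in the HDFS-13183 comment above (a no-progress counter that lives in an object recreated on every loop iteration) can be reproduced in a few lines. This is a minimal sketch, not the Hadoop code: {{Connector}}, {{shouldContinue}}, the threshold of 5, and both {{run...}} helpers are hypothetical names and values chosen for illustration. Each helper simulates a balancer that never moves any bytes and reports the iteration at which it gives up, or -1 if it never does.

```java
final class NoMoveProgressSketch {
    /** Assumed threshold of idle iterations before giving up (illustrative value). */
    static final int MAX_NO_MOVE_ITERATIONS = 5;

    /** Hypothetical stand-in for the connector that owns the idle-iteration counter. */
    static final class Connector {
        private int notChangedIterations = 0;

        /** Returns true while balancing should continue. */
        boolean shouldContinue(long bytesMoved) {
            if (bytesMoved > 0) {
                notChangedIterations = 0;
                return true;
            }
            notChangedIterations++;
            return notChangedIterations < MAX_NO_MOVE_ITERATIONS;
        }
    }

    /** Connector created once, outside the loop: the counter accumulates across iterations. */
    static int runWithSharedConnector(int maxIterations) {
        Connector connector = new Connector();
        for (int i = 1; i <= maxIterations; i++) {
            if (!connector.shouldContinue(0)) {
                return i; // gave up: no move progress
            }
        }
        return -1; // never gave up within maxIterations
    }

    /** Connector created inside the loop: the counter restarts on every iteration. */
    static int runWithConnectorPerIteration(int maxIterations) {
        for (int i = 1; i <= maxIterations; i++) {
            Connector connector = new Connector();
            if (!connector.shouldContinue(0)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(runWithSharedConnector(100));
        System.out.println(runWithConnectorPerIteration(100));
    }
}
```

In this sketch the shared-connector variant stops once the threshold is reached, whereas the per-iteration variant never does, which matches the observation that the balancer would keep reporting IN_PROGRESS instead of exiting with NO_MOVE_PROGRESS.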
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111582#comment-17111582 ] Jim Brennan commented on HDFS-14960: I believe this failure is actually due to HDFS-13183. I have added a comment to that Jira: https://issues.apache.org/jira/browse/HDFS-13183?focusedCommentId=17111578=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17111578 Would like to make sure that is resolved before fixing this one. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111586#comment-17111586 ] Jim Brennan commented on HDFS-13183: More importantly, because it will never return NO_MOVE_PROGRESS, it will loop forever returning IN_PROGRESS. > Standby NameNode process getBlocks request to reduce Active load > > > Key: HDFS-13183 > URL: https://issues.apache.org/jira/browse/HDFS-13183 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer mover, namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, > HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, > HDFS-13183.006.patch, HDFS-13183.007.patch > > > The performance of the Active NameNode can be impacted when the {{Balancer}} requests > #getBlocks, since querying the blocks of overly full DNs is currently extremely > inefficient. The main reason is that {{NameNodeRpcServer#getBlocks}} > holds the read lock for a long time. In the extreme case, all handlers of the Active > NameNode RPC server are occupied by one reader > {{NameNodeRpcServer#getBlocks}} and other write operation calls, so the Active > NameNode enters a state of false death for a number of seconds or even minutes. > Similar performance concerns about the Balancer have been reported in HDFS-9412, > HDFS-7967, etc. > If the Standby NameNode can shoulder the heavy #getBlocks burden, it could speed up > balancing and reduce the performance impact on the Active NameNode.
[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113467#comment-17113467 ] Jim Brennan commented on HDFS-13183: I am +1 (non-binding) on the second addendum patch. > Standby NameNode process getBlocks request to reduce Active load > > > Key: HDFS-13183 > URL: https://issues.apache.org/jira/browse/HDFS-13183 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer mover, namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, > HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, > HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch, > HDFS-13183.addendum.patch > > > The performance of the Active NameNode can be impacted when the {{Balancer}} requests > #getBlocks, since querying the blocks of overly full DNs is currently extremely > inefficient. The main reason is that {{NameNodeRpcServer#getBlocks}} > holds the read lock for a long time. In the extreme case, all handlers of the Active > NameNode RPC server are occupied by one reader > {{NameNodeRpcServer#getBlocks}} and other write operation calls, so the Active > NameNode enters a state of false death for a number of seconds or even minutes. > Similar performance concerns about the Balancer have been reported in HDFS-9412, > HDFS-7967, etc. > If the Standby NameNode can shoulder the heavy #getBlocks burden, it could speed up > balancing and reduce the performance impact on the Active NameNode.
[jira] [Commented] (HDFS-12548) HDFS Jenkins build is unstable on branch-2
[ https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190411#comment-17190411 ] Jim Brennan commented on HDFS-12548: I propose we close this issue or at least reduce the priority. It's three years old and I don't see any evidence that we've seen it again. Haven't we switched over to CloudBees as well? > HDFS Jenkins build is unstable on branch-2 > -- > > Key: HDFS-12548 > URL: https://issues.apache.org/jira/browse/HDFS-12548 > Project: Hadoop HDFS > Issue Type: Bug > Components: build >Affects Versions: 2.9.0 >Reporter: Rushabh Shah >Priority: Critical > > Feel free to move the ticket to another project (e.g. infra). > Recently I attached a branch-2 patch while working on a jira > [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676] > There were at least 100 failed and timed-out tests. I am sure they are not > related to my patch. > I also came across another jira which was just a javadoc-related change, and > there were around 100 failed tests. > Below are the details for pre-commits that failed in branch-2 > 1. [HDFS-12386 attempt > 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069] > {noformat} > Ran on slave: asf912.gq1.ygridcore.net/H12 > Failed with following error message: > Build timed out (after 300 minutes). Marking the build as aborted. > Build was aborted > Performing Post build task... > {noformat} > 2. 
[HDFS-12386 attempt > 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676] > {noformat} > Ran on slave: asf900.gq1.ygridcore.net > Failed with following error message: > FATAL: command execution failed > Command close created at > at hudson.remoting.Command.<init>(Command.java:60) > at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123) > at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121) > at hudson.remoting.Channel.close(Channel.java:1281) > at hudson.remoting.Channel.close(Channel.java:1263) > at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128) > Caused: hudson.remoting.Channel$OrderlyShutdown > at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129) > at hudson.remoting.Channel$1.handle(Channel.java:527) > at > hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83) > Caused: java.io.IOException: Backing channel 'H0' is disconnected. 
> at > hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192) > at > hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257) > at com.sun.proxy.$Proxy125.isAlive(Unknown Source) > at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043) > at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035) > at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155) > at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109) > at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66) > at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) > at > hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735) > at hudson.model.Build$BuildExecution.build(Build.java:206) > at hudson.model.Build$BuildExecution.doRun(Build.java:163) > at > hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490) > at hudson.model.Run.execute(Run.java:1735) > at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) > at hudson.model.ResourceController.execute(ResourceController.java:97) > at hudson.model.Executor.run(Executor.java:405) > {noformat} > 3. [HDFS-12531 attempt > 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493] > {noformat} > Ran on slave: asf911.gq1.ygridcore.net > Failed with following error message: > FATAL: command execution failed > Command close created at > at hudson.remoting.Command.<init>(Command.java:60) > at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123) > at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121) > at hudson.remoting.Channel.close(Channel.java:1281) > at hudson.remoting.Channel.close(Channel.java:1263) > at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128) > Caused: hudson.remoting.Channel$OrderlyShutdown >
[jira] [Commented] (HDFS-14277) [SBN read] Observer benchmark results
[ https://issues.apache.org/jira/browse/HDFS-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192445#comment-17192445 ] Jim Brennan commented on HDFS-14277: [~jhung], you removed the release-blocker label for 2.10.0, but the priority of this Jira is still set to Blocker. I believe the blocking issue was addressed in [HDFS-14822]. Can we change the priority for this Jira to something more appropriate? > [SBN read] Observer benchmark results > - > > Key: HDFS-14277 > URL: https://issues.apache.org/jira/browse/HDFS-14277 > Project: Hadoop HDFS > Issue Type: Task > Components: ha, namenode >Affects Versions: 2.10.0, 3.3.0 > Environment: Hardware: 4-node cluster, each node has 4 cores, Xeon > 2.5GHz, 25GB memory. > Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, > RPC encryption + Data Transfer Encryption, Cloudera Navigator. >Reporter: Wei-Chiu Chuang >Priority: Blocker > Attachments: Observer profiler.png, Screen Shot 2019-02-14 at > 11.50.37 AM.png, observer RPC queue processing time.png > > > Ran a few benchmarks and a profiler (VisualVM) today on an Observer-enabled > cluster. Would like to share the results with the community. The cluster has > 1 Observer node. > h2. NNThroughputBenchmark > Generate 1 million files and send fileStatus RPCs. > {code:java} > hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs > -op fileStatus -threads 100 -files 100 -useExisting > -keepResults > {code} > h3. Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled: > ||Node||fileStatus (Ops per sec)|| > |Active NameNode|4865| > |Observer|3996| > h3. Kerberos, SSL: > ||Node||fileStatus (Ops per sec)|| > |Active NameNode|7078| > |Observer|6459| > Observation: > * Due to the edit tailing overhead, the Observer node consumes 30% CPU > even when the cluster is idle. > * While the Active NN has less than 1ms RPC processing time, the Observer node has > > 5ms RPC processing time. 
I am still looking for the source of the longer > processing time. The longer RPC processing time may be the cause for the > performance degradation compared to that of Active NN. Note the cluster has > Cloudera Navigator installed which adds additional overhead to RPC processing > time. > * {{GlobalStateIdContext#isCoordinatedCall()}} pops up as one of the top > hotspots in the profiler.
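The two benchmark tables in the HDFS-14277 description can be compared directly by expressing Observer throughput as a percentage of Active NameNode throughput. The following is a minimal sketch of that arithmetic, not part of the original issue: the four ops/sec figures are copied from the tables, while the names RESULTS, observer_ratio and summarize are hypothetical and chosen only for illustration.

```python
# Throughput figures (fileStatus ops/sec) copied from the two tables in the
# HDFS-14277 description. All names below are illustrative assumptions.
RESULTS = {
    "Kerberos, SSL, RPC encryption, Data Transfer Encryption": {
        "active": 4865,
        "observer": 3996,
    },
    "Kerberos, SSL": {
        "active": 7078,
        "observer": 6459,
    },
}


def observer_ratio(active_ops, observer_ops):
    """Return Observer throughput as a fraction of Active NameNode throughput."""
    return observer_ops / active_ops


def summarize(results):
    """Map each configuration to the Observer's percentage of Active throughput."""
    return {
        config: round(100.0 * observer_ratio(r["active"], r["observer"]), 1)
        for config, r in results.items()
    }


if __name__ == "__main__":
    for config, pct in summarize(RESULTS).items():
        print(f"{config}: Observer reaches {pct}% of Active throughput")
```

Under these figures the Observer reaches roughly 82% of Active throughput with all encryption enabled and roughly 91% with only Kerberos and SSL. This only restates the reported numbers and says nothing about the cause of the gap.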
[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14960: --- Attachment: HDFS-14960.004.patch > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch, HDFS-14960.004.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test.
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118837#comment-17118837 ] Jim Brennan commented on HDFS-14960: Now that HDFS-13183 has been fixed, I uploaded patch 004 which is the same as patch 003, just rebased to the current trunk. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch, HDFS-14960.004.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test.
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119061#comment-17119061 ] Jim Brennan commented on HDFS-14960: Thanks for the review [~inigoiri]! I've addressed all of your comments in patch 005. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch, HDFS-14960.004.patch, HDFS-14960.005.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test.
[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-14960: --- Attachment: HDFS-14960.005.patch > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch, HDFS-14960.004.patch, HDFS-14960.005.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test.
[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118995#comment-17118995 ] Jim Brennan commented on HDFS-14960: The failed unit tests are unrelated to this change. > TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology > > > Key: HDFS-14960 > URL: https://issues.apache.org/jira/browse/HDFS-14960 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, > HDFS-14960.003.patch, HDFS-14960.004.patch > > > As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even > though it was using DFSNetworkTopology instead of > NetworkTopologyWithNodeGroup. > [~inigoiri] rightly suggested that this indicates the test is not very good - > it should fail when run without NetworkTopologyWithNodeGroup. > We should improve this test.