[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags

2019-04-03 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808843#comment-16808843
 ] 

Jim Brennan commented on HDFS-14394:


I am +1 on this (non-binding). With gcc 4.8.5 on RHEL 7, my compilation was 
failing. This patch fixes it.

> Add -std=c99 / -std=gnu99 to libhdfs compile flags
> --
>
> Key: HDFS-14394
> URL: https://issues.apache.org/jira/browse/HDFS-14394
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs-client, libhdfs, native
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-14394.001.patch
>
>
> libhdfs compilation currently does not enforce a minimum required C version. 
> As of today, the libhdfs build on Hadoop QA works, but when built on a 
> machine with an outdated gcc / cc version where C89 is the default, 
> compilation fails due to errors such as:
> {code}
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  error: ‘for’ loop initial declarations are only allowed in C99 mode
> for (int i = 0; i < numCachedClasses; i++) {
> ^
> /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5:
>  note: use option -std=c99 or -std=gnu99 to compile your code
> {code}
> We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that 
> we can enforce C99 as the minimum required version.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report

2019-08-12 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905289#comment-16905289
 ] 

Jim Brennan commented on HDFS-12914:


[~jojochuang], the revert of the commit for branch-2 appears to have broken the 
build:

{noformat}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/Users/jbrennan02/git/apache-hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java:[226,23]
 cannot find symbol
  symbol:   method 
setBlockManagerForTesting(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
  location: class org.apache.hadoop.hdfs.server.namenode.FSNamesystem
[INFO] 1 error
{noformat}
When I revert this commit, I can build:
{noformat}
commit 585b6de63721f3ea8057677676038a6f8f2c33f5 (HEAD -> branch-2, 
apache-hadoop/branch-2)
Author: Wei-Chiu Chuang 
Date:   Fri Aug 9 16:59:27 2019 -0700

Revert "HDFS-12914. Block report leases cause missing blocks until next 
report. Contributed by Santosh Marella, He Xiaoqiao."

This reverts commit 567e1178d88ccfc258ce2ade4f8af66cc5a4daa7.
{noformat}


> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.8.0, 2.9.2
>Reporter: Daryn Sharp
>Assignee: Santosh Marella
>Priority: Critical
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch, 
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, 
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, 
> HDFS-12914.009.patch, HDFS-12914.branch-2.000.patch, 
> HDFS-12914.branch-2.001.patch, HDFS-12914.branch-2.002.patch, 
> HDFS-12914.branch-2.8.001.patch, HDFS-12914.branch-2.8.002.patch, 
> HDFS-12914.branch-2.patch, HDFS-12914.branch-3.0.patch, 
> HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch, 
> HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for 
> conditions such as "unknown datanode", "not in pending set", "lease has 
> expired", wrong lease id, etc.  Lease rejection does not throw an exception.  
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and 
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected because of an invalid lease becomes 
> active with _no blocks_.  A replication storm ensues, possibly causing DNs to 
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on 
> re-registration.  The cluster will have many "missing blocks" until the DN's 
> next FBR is sent and/or forced.
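
To illustrate the conflation described above, here is a minimal sketch in Java. It is not the NameNode code and not the committed fix: the class, the enum, and both methods are hypothetical names made up for this example. It only shows how a single boolean return can hide a lease rejection, and how a distinct outcome value would keep the two cases apart.

```java
public class LeaseResultSketch {

  // Hypothetical outcomes; the real code paths use different types.
  enum Outcome { PROCESSED_NO_STALE, PROCESSED_HAS_STALE, LEASE_REJECTED }

  // Conflated form: false means either "lease rejected" or
  // "report processed, stale storages remain". The caller cannot tell.
  static boolean conflated(boolean leaseValid, boolean hasStaleStorages) {
    if (!leaseValid) {
      return false;
    }
    return !hasStaleStorages;
  }

  // Explicit form: a rejected lease is reported as its own outcome.
  static Outcome explicit(boolean leaseValid, boolean hasStaleStorages) {
    if (!leaseValid) {
      return Outcome.LEASE_REJECTED;
    }
    return hasStaleStorages
        ? Outcome.PROCESSED_HAS_STALE
        : Outcome.PROCESSED_NO_STALE;
  }

  public static void main(String[] args) {
    // Both calls return false even though only the first is a rejection.
    System.out.println(conflated(false, false));
    System.out.println(conflated(true, true));
    // The explicit form distinguishes them.
    System.out.println(explicit(false, false));
    System.out.println(explicit(true, true));
  }
}
```

With an explicit outcome, a caller could choose to retry or re-request a lease instead of treating the rejected report as processed.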






[jira] [Commented] (HDFS-14726) Fix JN incompatibility issue in branch-2 due to backport of HDFS-10519

2019-08-30 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919868#comment-16919868
 ] 

Jim Brennan commented on HDFS-14726:


This should be marked as a blocker for 2.10 with the release-blocker label.

cc: [~jhung]

> Fix JN incompatibility issue in branch-2 due to backport of HDFS-10519
> --
>
> Key: HDFS-14726
> URL: https://issues.apache.org/jira/browse/HDFS-14726
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: journal-node
>Affects Versions: 2.10.0
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Blocker
> Attachments: HDFS-14726-branch-2.001.patch, 
> HDFS-14726-branch-2.002.patch, HDFS-14726-branch-2.003.patch
>
>
> HDFS-10519 has been backported to branch-2. However, HDFS-10519 introduced an 
> incompatibility between NN and JN due to the new protobuf field 
> {{committedTxnId}} in {{HdfsServer.proto}}. This field was introduced as a 
> required field, so if JN and NN are not on the same version, they will run into 
> a missing field exception. Although we can currently work around this by making 
> sure the JN always gets upgraded before the NN, we can potentially fix this 
> incompatibility by changing the field to optional. 






[jira] [Commented] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs

2019-08-06 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901417#comment-16901417
 ] 

Jim Brennan commented on HDFS-12491:


The code looks good, but you should declare all of the new functions static as 
they are only used in jni_helper.c

LibHdfs.md should be updated to remove the warning about wildcards.

 

> Support wildcard in CLASSPATH for libhdfs
> -
>
> Key: HDFS-12491
> URL: https://issues.apache.org/jira/browse/HDFS-12491
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: HDFS-12491.001.patch, testWildCard.sh
>
>
> According to the libhdfs doc, wildcards in CLASSPATH are not supported:
> bq. The most common problem is the CLASSPATH is not set properly when calling 
> a program that uses libhdfs. Make sure you set it to all the Hadoop jars 
> needed to run Hadoop itself as well as the right configuration directory 
> containing hdfs-site.xml. It is not valid to use wildcard syntax for 
> specifying multiple jars. It may be useful to run hadoop classpath --glob or 
> hadoop classpath --jar  to generate the correct classpath for your 
> deployment.
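
For readers unfamiliar with wildcard classpath entries, the sketch below shows the kind of expansion involved. It is written in Java purely for illustration; the actual patch is C code in jni_helper.c and is not reproduced here. The class and method names are hypothetical. The sketch assumes the Java launcher's convention that an entry whose last component is {{*}} stands for every file ending in .jar or .JAR in that directory, without recursing into subdirectories.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathWildcardSketch {

  /** True for "*" or any entry whose last path component is exactly "*". */
  static boolean isWildcard(String entry) {
    if (!entry.endsWith("*")) {
      return false;
    }
    if (entry.length() == 1) {
      return true;
    }
    char before = entry.charAt(entry.length() - 2);
    return before == '/' || before == File.separatorChar;
  }

  /** Expands wildcard entries to jar paths; other entries pass through. */
  static List<String> expand(String classpath) {
    List<String> result = new ArrayList<>();
    for (String entry : classpath.split(File.pathSeparator)) {
      if (entry.isEmpty()) {
        continue;
      }
      if (!isWildcard(entry)) {
        result.add(entry);
        continue;
      }
      // Strip the trailing "*"; a bare "*" means the current directory.
      String dirPath =
          entry.length() == 1 ? "." : entry.substring(0, entry.length() - 1);
      File[] jars = new File(dirPath).listFiles(
          (dir, name) -> name.endsWith(".jar") || name.endsWith(".JAR"));
      if (jars == null) {
        continue; // directory missing or unreadable: contributes nothing
      }
      Arrays.sort(jars); // sorted only to make the sketch deterministic
      for (File jar : jars) {
        result.add(jar.getPath());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String cp = args.length > 0 ? args[0] : "*";
    System.out.println(String.join(File.pathSeparator, expand(cp)));
  }
}
```

Entries such as a configuration directory pass through unchanged, so a CLASSPATH that mixes a wildcard jar directory with a conf directory still resolves both.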






[jira] [Commented] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs

2019-08-06 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901484#comment-16901484
 ] 

Jim Brennan commented on HDFS-12491:


[~samkhan], thanks for updating the patch!

(nit) on patch 002 LibHdfs.md: I don't think you should put a newline between 
the sentences.  That said, I'm not sure we need the new sentence:
 {{Wildcard entries in the `CLASSPATH` are now supported by libhdfs.}}
 I think just removing the statement that it is not supported is sufficient.

I am +1 on the code changes (non-binding)

> Support wildcard in CLASSPATH for libhdfs
> -
>
> Key: HDFS-12491
> URL: https://issues.apache.org/jira/browse/HDFS-12491
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: HDFS-12491.001.patch, HDFS-12491.002.patch, 
> testWildCard.sh
>
>
> According to the libhdfs doc, wildcards in CLASSPATH are not supported:
> bq. The most common problem is the CLASSPATH is not set properly when calling 
> a program that uses libhdfs. Make sure you set it to all the Hadoop jars 
> needed to run Hadoop itself as well as the right configuration directory 
> containing hdfs-site.xml. It is not valid to use wildcard syntax for 
> specifying multiple jars. It may be useful to run hadoop classpath --glob or 
> hadoop classpath --jar  to generate the correct classpath for your 
> deployment.






[jira] [Commented] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs

2019-08-08 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902997#comment-16902997
 ] 

Jim Brennan commented on HDFS-12491:


[~kihwal] can you review this as well?

 

> Support wildcard in CLASSPATH for libhdfs
> -
>
> Key: HDFS-12491
> URL: https://issues.apache.org/jira/browse/HDFS-12491
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Affects Versions: 2.8.0
>Reporter: John Zhuge
>Assignee: Muhammad Samir Khan
>Priority: Major
> Attachments: HDFS-12491.001.patch, HDFS-12491.002.patch, 
> testWildCard.sh
>
>
> According to the libhdfs doc, wildcards in CLASSPATH are not supported:
> bq. The most common problem is the CLASSPATH is not set properly when calling 
> a program that uses libhdfs. Make sure you set it to all the Hadoop jars 
> needed to run Hadoop itself as well as the right configuration directory 
> containing hdfs-site.xml. It is not valid to use wildcard syntax for 
> specifying multiple jars. It may be useful to run hadoop classpath --glob or 
> hadoop classpath --jar  to generate the correct classpath for your 
> deployment.






[jira] [Commented] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode

2019-10-01 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942084#comment-16942084
 ] 

Jim Brennan commented on HDFS-14858:


Thanks [~vagarychen]! patch004 looks good to me.  +1 (non-binding).

 

> [SBN read] Allow configurably enable/disable AlignmentContext on NameNode
> -
>
> Key: HDFS-14858
> URL: https://issues.apache.org/jira/browse/HDFS-14858
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14858.001.patch, HDFS-14858.002.patch, 
> HDFS-14858.003.patch, HDFS-14858.004.patch
>
>
> As brought up under HDFS-14277, we should make sure SBN read has no 
> performance impact when it is not enabled. One potential overhead of SBN read 
> is maintaining and updating additional state status on NameNode. 
> Specifically, this is done by creating/updating/checking a 
> {{GlobalStateIdContext}} instance. Currently, even without enabling SBN read, 
> this logic is still checked.  We can make this configurable so that when 
> SBN read is not enabled, there is no such overhead and everything works as-is.






[jira] [Commented] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode

2019-09-24 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936863#comment-16936863
 ] 

Jim Brennan commented on HDFS-14858:


[~vagarychen], [~jojochuang] As a major new feature for branch-2, I think this 
should default to false (DFS_NAMENODE_STATE_CONTEXT_ENABLED_DEFAULT = false).
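
A minimal sketch of what a default-off switch looks like, assuming a plain java.util.Properties object in place of Hadoop's Configuration. The key string, class name, and methods are hypothetical placeholders, not the names used in the patch.

```java
import java.util.Properties;

public class StateContextFlagSketch {

  // Hypothetical key; the real key is defined in DFSConfigKeys.
  static final String ENABLED_KEY = "sketch.namenode.state.context.enabled";
  // Default off, so an unconfigured cluster pays no extra cost.
  static final boolean ENABLED_DEFAULT = false;

  static boolean isEnabled(Properties conf) {
    String value = conf.getProperty(ENABLED_KEY);
    return value == null ? ENABLED_DEFAULT : Boolean.parseBoolean(value);
  }

  /** Returns a context object only when enabled; null means "skip it". */
  static Object createContextIfEnabled(Properties conf) {
    return isEnabled(conf) ? new Object() : null;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(createContextIfEnabled(conf) != null); // unset
    conf.setProperty(ENABLED_KEY, "true");
    System.out.println(createContextIfEnabled(conf) != null); // opted in
  }
}
```

With the default at false, only deployments that explicitly opt in would create and update the state context.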


> [SBN read] Allow configurably enable/disable AlignmentContext on NameNode
> -
>
> Key: HDFS-14858
> URL: https://issues.apache.org/jira/browse/HDFS-14858
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14858.001.patch, HDFS-14858.002.patch
>
>
> As brought up under HDFS-14277, we should make sure SBN read has no 
> performance impact when it is not enabled. One potential overhead of SBN read 
> is maintaining and updating additional state status on NameNode. 
> Specifically, this is done by creating/updating/checking a 
> {{GlobalStateIdContext}} instance. Currently, even without enabling SBN read, 
> this logic is still checked.  We can make this configurable so that when 
> SBN read is not enabled, there is no such overhead and everything works as-is.






[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969409#comment-16969409
 ] 

Jim Brennan commented on HDFS-14958:


Thanks for the reviews [~ayushtkn] and [~inigoiri]!  Can someone please commit 
this?   I would ideally like it pulled back to branch-2.10 - that is where I 
found the problem - we have some internal changes to 
NetworkTopologyWithNodeGroup so this test was actually failing for us.

cc: [~kihwal]

 

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969439#comment-16969439
 ] 

Jim Brennan commented on HDFS-14958:


Thanks [~ayushtkn]!

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1, 2.11.0
>
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-06 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968653#comment-16968653
 ] 

Jim Brennan commented on HDFS-14958:


[~inigoiri] I agree.   I think this test pre-dates DFSNetworkTopology, so 
perhaps it did work originally.  I tried changing it to use NetworkTopology and 
one of the tests fails in that case, so it does differentiate between 
NetworkTopology and NetworkTopologyWithNodeGroup.  They just all succeed with 
the default DFSNetworkTopology, which is why we missed this.

Should we file another Jira to improve this test?

The unit tests that failed appear to be unrelated to this patch:
{noformat}
[ERROR] Failures: 
[ERROR]   TestMultipleNNPortQOP.testMultipleNNPortOverwriteDownStream:177 
expected: but was:
[ERROR]   TestRollingUpgrade.testRollback:354 Test resulted in an unexpected 
exit
[ERROR]   TestBalancer.testMaxIterationTime:1649 Unexpected iteration runtime: 
4008ms > 3.5s
[ERROR]   TestRedudantBlocks.testProcessOverReplicatedAndRedudantBlock:138 
expected:<5> but was:<4>
{noformat}

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Updated] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-06 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14958:
---
Attachment: HDFS-14958.001.patch
Status: Patch Available  (was: Open)

Attaching patch that fixes the test.

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Assigned] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-06 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reassigned HDFS-14958:
--

Assignee: Jim Brennan

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Created] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-06 Thread Jim Brennan (Jira)
Jim Brennan created HDFS-14958:
--

 Summary: TestBalancerWithNodeGroup is not using 
NetworkTopologyWithNodeGroup
 Key: HDFS-14958
 URL: https://issues.apache.org/jira/browse/HDFS-14958
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.3
Reporter: Jim Brennan


TestBalancerWithNodeGroup is intended to test with 
{{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  Because 
{{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
{{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the test 
actually uses the default {{DFSNetworkTopology}}.







[jira] [Updated] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-06 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14958:
---
Priority: Minor  (was: Major)

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Priority: Minor
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-06 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968458#comment-16968458
 ] 

Jim Brennan commented on HDFS-14958:


In the DatanodeManager constructor:
{noformat}
    this.useDfsNetworkTopology = conf.getBoolean(
        DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY,
        DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_DEFAULT);
    if (useDfsNetworkTopology) {
      networktopology = DFSNetworkTopology.getInstance(conf);
    } else {
      networktopology = NetworkTopology.getInstance(conf);
    }
{noformat}
And in DFSNetworkTopology.getInstance():
{noformat}
  public static DFSNetworkTopology getInstance(Configuration conf) {
    DFSNetworkTopology nt = ReflectionUtils.newInstance(conf.getClass(
        DFSConfigKeys.DFS_NET_TOPOLOGY_IMPL_KEY,
        DFSConfigKeys.DFS_NET_TOPOLOGY_IMPL_DEFAULT,
        DFSNetworkTopology.class), conf);
    return (DFSNetworkTopology) nt.init(DFSTopologyNodeImpl.FACTORY);
  }
{noformat}
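
The effect of the two snippets above can be reproduced with a small self-contained sketch. It models the configuration as a plain Map and returns class names as strings. The key strings and the class name TopologySelectionSketch are placeholders invented for this example, not the real DFSConfigKeys values.

```java
import java.util.HashMap;
import java.util.Map;

public class TopologySelectionSketch {

  // Placeholder keys standing in for the real configuration constants.
  static final String USE_DFS_TOPOLOGY_KEY = "sketch.use.dfs.network.topology";
  static final String NET_TOPOLOGY_IMPL_KEY = "sketch.net.topology.impl";
  static final String DFS_NET_TOPOLOGY_IMPL_KEY = "sketch.dfs.net.topology.impl";

  /** Returns the name of the topology class this sketch would instantiate. */
  static String chooseTopology(Map<String, String> conf) {
    boolean useDfs =
        Boolean.parseBoolean(conf.getOrDefault(USE_DFS_TOPOLOGY_KEY, "true"));
    if (useDfs) {
      // Only the DFS-specific impl key is consulted on this path, so a
      // value set under NET_TOPOLOGY_IMPL_KEY has no effect.
      return conf.getOrDefault(DFS_NET_TOPOLOGY_IMPL_KEY, "DFSNetworkTopology");
    }
    return conf.getOrDefault(NET_TOPOLOGY_IMPL_KEY, "NetworkTopology");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // What the test effectively did: set only the generic impl key.
    conf.put(NET_TOPOLOGY_IMPL_KEY, "NetworkTopologyWithNodeGroup");
    System.out.println(chooseTopology(conf)); // prints DFSNetworkTopology

    // Turning the DFS topology switch off makes the generic key take effect.
    conf.put(USE_DFS_TOPOLOGY_KEY, "false");
    System.out.println(chooseTopology(conf)); // prints NetworkTopologyWithNodeGroup
  }
}
```

In this model, a test that wants NetworkTopologyWithNodeGroup has to turn the DFS topology switch off as well as set the implementation key.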


> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Priority: Minor
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Created] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2019-11-06 Thread Jim Brennan (Jira)
Jim Brennan created HDFS-14960:
--

 Summary: TestBalancerWithNodeGroup should not succeed with 
DFSNetworkTopology
 Key: HDFS-14960
 URL: https://issues.apache.org/jira/browse/HDFS-14960
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 3.1.3
Reporter: Jim Brennan


As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even though 
it was using DFSNetworkTopology instead of NetworkTopologyWithNodeGroup.

[~inigoiri] rightly suggested that this indicates the test is not very good - 
it should fail when run without NetworkTopologyWithNodeGroup.

We should improve this test.

 






[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-11-06 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968765#comment-16968765
 ] 

Jim Brennan commented on HDFS-14958:


[~inigoiri] I filed [HDFS-14960] to improve the test.

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-17 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998526#comment-16998526
 ] 

Jim Brennan commented on HDFS-15036:


[~shv], [~jhung] was branch-2 actually deleted?    I can still see it, and this 
commit is still there.

 

 

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-15036.001.patch, HDFS-15036.002.patch, 
> HDFS-15036.003.patch
>
>
> Image transfer from Standby NameNode to Active silently fails on Active, 
> without any logging and without notifying the receiver side.






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2019-11-27 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983598#comment-16983598
 ] 

Jim Brennan commented on HDFS-14960:


[~hemanthboyina] that does seem like a reasonable check to me, and likely would 
have caught the problem reported in HDFS-14958.   I think the intent of this 
Jira is to improve the test so that it includes some test cases that are unique 
to NetworkTopologyWithNodeGroup.   The fact that it was succeeding when it 
wasn't using the right class suggests that it could be improved.
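
One plausible form of such a check, assuming it asserts on the concrete topology class, is sketched below. The three nested classes are empty stand-ins for the Hadoop classes of the same names, and requireNodeGroupTopology is a hypothetical helper; a real test would use the JUnit assertions already in the test suite.

```java
public class TopologyTypeCheckSketch {

  // Empty stand-ins for the real Hadoop classes.
  static class NetworkTopology {}
  static class DFSNetworkTopology extends NetworkTopology {}
  static class NetworkTopologyWithNodeGroup extends NetworkTopology {}

  /** Fails fast when the cluster is not using the node-group topology. */
  static void requireNodeGroupTopology(NetworkTopology topology) {
    if (!(topology instanceof NetworkTopologyWithNodeGroup)) {
      throw new AssertionError("expected NetworkTopologyWithNodeGroup but got "
          + topology.getClass().getSimpleName());
    }
  }

  public static void main(String[] args) {
    requireNodeGroupTopology(new NetworkTopologyWithNodeGroup()); // passes
    try {
      requireNodeGroupTopology(new DFSNetworkTopology());
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A type check like this catches the misconfiguration, but it does not replace test cases whose outcome depends on node-group behavior.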

 

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> -
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Priority: Minor
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2019-11-27 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983887#comment-16983887
 ] 

Jim Brennan commented on HDFS-14960:


[~ayushtkn] I think the intention was to add a test case that will succeed for 
NetworkTopologyWithNodeGroup but would fail for DFSNetworkTopology.

 

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Priority: Minor
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14858) [SBN read] Allow configurably enable/disable AlignmentContext on NameNode

2019-09-24 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937131#comment-16937131
 ] 

Jim Brennan commented on HDFS-14858:


Thanks [~vagarychen].  Patch 003 looks good to me.


> [SBN read] Allow configurably enable/disable AlignmentContext on NameNode
> -
>
> Key: HDFS-14858
> URL: https://issues.apache.org/jira/browse/HDFS-14858
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14858.001.patch, HDFS-14858.002.patch, 
> HDFS-14858.003.patch
>
>
> As brought up under HDFS-14277, we should make sure SBN read has no 
> performance impact when it is not enabled. One potential overhead of SBN read 
> is maintaining and updating additional state on the NameNode. 
> Specifically, this is done by creating/updating/checking a 
> {{GlobalStateIdContext}} instance. Currently, even without enabling SBN read, 
> this logic is still being checked.  We can make this configurable so that when 
> SBN read is not enabled, there is no such overhead and everything works as-is.
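
The configurable behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: StateContextSketch is a hypothetical stand-in for GlobalStateIdContext, GatedServerSketch is a hypothetical server, and the key "example.state.context.enabled" is invented for this example and is not a real Hadoop configuration key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for GlobalStateIdContext; not the Hadoop class.
final class StateContextSketch {
    private final AtomicLong lastSeenStateId = new AtomicLong(0L);

    // Keep the highest state id seen so far.
    void update(long stateId) {
        lastSeenStateId.accumulateAndGet(stateId, Math::max);
    }

    long lastSeen() {
        return lastSeenStateId.get();
    }
}

final class GatedServerSketch {
    // Hypothetical key name, chosen only for this example.
    static final String ENABLED_KEY = "example.state.context.enabled";

    // Null when the feature is off, so no state is created or updated.
    private final StateContextSketch context;

    GatedServerSketch(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
        this.context = enabled ? new StateContextSketch() : null;
    }

    boolean tracksState() {
        return context != null;
    }

    void handleCall(long clientStateId) {
        if (context != null) {
            context.update(clientStateId);
        }
    }

    // Returns -1 when state tracking is disabled.
    long lastSeen() {
        return context == null ? -1L : context.lastSeen();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        GatedServerSketch off = new GatedServerSketch(conf);
        off.handleCall(42L);
        System.out.println(off.tracksState() + " " + off.lastSeen());

        conf.put(ENABLED_KEY, "true");
        GatedServerSketch on = new GatedServerSketch(conf);
        on.handleCall(42L);
        System.out.println(on.tracksState() + " " + on.lastSeen());
    }
}
```

With the flag off, the per-call path reduces to a single null check, which is the "no such overhead" property the description asks for.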






[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945992#comment-16945992
 ] 

Jim Brennan commented on HDFS-14893:


The TestJournalNodeRespectsBindHostKeys unit test failure is unrelated to this 
change.  Please review.

 

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14893-branch-2.001.patch
>
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Updated] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-07 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14893:
---
Attachment: HDFS-14893-branch-2.001.patch
Status: Patch Available  (was: Open)

Putting up a patch for branch-2 that just fixes the test.

 

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14893-branch-2.001.patch
>
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-08 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947033#comment-16947033
 ] 

Jim Brennan commented on HDFS-14893:


[~jhung], [~xkrogen], [~vagarychen], can one of you please review this unit 
test fix?

 

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14893-branch-2.001.patch
>
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-08 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947045#comment-16947045
 ] 

Jim Brennan commented on HDFS-14893:


[~vagarychen] that is fine with me.  It was not clear to me how soon HDFS-14245 
was going to be pulled back to branch-2, so I thought I should put up the unit 
test fix in the meantime.

 

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14893-branch-2.001.patch
>
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Commented] (HDFS-14667) Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2

2019-10-03 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943643#comment-16943643
 ] 

Jim Brennan commented on HDFS-14667:


[~xkrogen] thanks for working on this! What is the status of this backport to 
branch-2.10?

 

> Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2
> 
>
> Key: HDFS-14667
> URL: https://issues.apache.org/jira/browse/HDFS-14667
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14403-branch-2.000.patch
>
>
> We would like to target pulling HDFS-14403, an important operability 
> enhancement, into branch-2.
> It's only present in trunk now so we also need to backport through the 3.x 
> lines.






[jira] [Created] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-04 Thread Jim Brennan (Jira)


Jim Brennan created HDFS-14893:
--

 Summary: 
TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
branch-2
 Key: HDFS-14893
 URL: https://issues.apache.org/jira/browse/HDFS-14893
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.10.0
Reporter: Jim Brennan


TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing on 
branch-2
{noformat}
[INFO] Running 
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 s 
<<< FAILURE! - in 
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
[ERROR] 
testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
  Time elapsed: 0.648 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {noformat}






[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-04 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944812#comment-16944812
 ] 

Jim Brennan commented on HDFS-14893:


This is failing on this line:
{noformat}
assertTrue(logCapture.getOutput().contains("Assuming Standby state"));
{noformat}
But there is no code that generates that string.  Looks like this was caused by 
HDFS-14785, which changed the logging in getHAServiceState().
It appears to be fixed in trunk by HDFS-14245.
[~xkrogen] I don't know if the correct fix is to pull back HDFS-14245 or to 
just fix this test in branch-2.
cc: [~jhung]
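
A minimal sketch of this failure mode, assuming a hypothetical in-memory log capture (CapturedLog is not Hadoop's log capturer). The old wording is taken from the failing assertion; the new wording is invented for illustration, since the actual message after HDFS-14785 is not quoted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a test log capturer.
final class CapturedLog {
    private final List<String> lines = new ArrayList<>();

    void log(String message) {
        lines.add(message);
    }

    String getOutput() {
        return String.join("\n", lines);
    }
}

final class LogAssertionSketch {
    // Wording the test still expects.
    static final String OLD_MESSAGE = "Assuming Standby state";
    // Invented replacement wording, for illustration only.
    static final String NEW_MESSAGE = "State lookup failed, treating NameNode as standby";

    // Mirrors the shape of the failing check: does the captured output contain the expected text?
    static boolean testPasses(String emitted, String expected) {
        CapturedLog capture = new CapturedLog();
        capture.log(emitted);
        return capture.getOutput().contains(expected);
    }

    public static void main(String[] args) {
        System.out.println(testPasses(OLD_MESSAGE, OLD_MESSAGE));
        System.out.println(testPasses(NEW_MESSAGE, OLD_MESSAGE));
    }
}
```

Once the production code stops emitting the old string, the substring check can only fail, so either the test or the logging has to change.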


> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-11 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949722#comment-16949722
 ] 

Jim Brennan commented on HDFS-14893:


Now that HDFS-14245 is pulled back, I am no longer seeing this on branch-2.

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14893-branch-2.001.patch
>
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Updated] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-11 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14893:
---
Fix Version/s: 2.10.0
   Resolution: Invalid
   Status: Resolved  (was: Patch Available)

This fix is no longer needed because HDFS-14245 was pulled back to branch-2.

 

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
> Fix For: 2.10.0
>
> Attachments: HDFS-14893-branch-2.001.patch
>
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Commented] (HDFS-14893) TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on branch-2

2019-10-11 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949724#comment-16949724
 ] 

Jim Brennan commented on HDFS-14893:


This Jira is fixed by HDFS-14245.

 

> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT failing on 
> branch-2
> --
>
> Key: HDFS-14893
> URL: https://issues.apache.org/jira/browse/HDFS-14893
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14893-branch-2.001.patch
>
>
> TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT() is failing 
> on branch-2
> {noformat}
> [INFO] Running 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.994 
> s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA
> [ERROR] 
> testObserverReadProxyProviderWithDT(org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA)
>   Time elapsed: 0.648 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:159)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
>  {noformat}






[jira] [Assigned] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-02-25 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reassigned HDFS-14960:
--

Assignee: Jim Brennan

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10

2020-01-21 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020289#comment-17020289
 ] 

Jim Brennan commented on HDFS-15125:


Looking for someone who can review/commit this.  Copying people associated with 
the Jiras I am pulling back.

cc: [~linyiqun], [~inigoiri], [~ayushtkn]

> Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
> ---
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10

2020-01-21 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020360#comment-17020360
 ] 

Jim Brennan commented on HDFS-15125:


Thanks [~kihwal]!

> Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
> ---
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 2.10.1
>
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016078#comment-17016078
 ] 

Jim Brennan commented on HDFS-13339:


Thanks [~weichiu]!

 

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.2.0, 3.1.1, 3.0.4, 2.9.3, 2.10.1
>
> Attachments: HDFS-13339-branch-2.10.001.patch, 
> HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, 
> HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I execute the unit test
>  TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, 
> the process blocks on waitReplication. Detailed information follows:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 307.492 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] 
> testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting)
>  Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 
> 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016134#comment-17016134
 ] 

Jim Brennan commented on HDFS-13339:


[~weichiu], did you commit this to branch-2.9 and branch-2.10?

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.2.0, 3.1.1, 3.0.4, 2.9.3, 2.10.1
>
> Attachments: HDFS-13339-branch-2.10.001.patch, 
> HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, 
> HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I execute the unit test
>  TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, 
> the process blocks on waitReplication. Detailed information follows:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 307.492 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] 
> testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting)
>  Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 
> 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016240#comment-17016240
 ] 

Jim Brennan commented on HDFS-13339:


Thanks [~weichiu]!  I think branch-2 is supposed to be deleted at some point...

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.2.0, 3.1.1, 3.0.4, 2.9.3, 2.10.1
>
> Attachments: HDFS-13339-branch-2.10.001.patch, 
> HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, 
> HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I execute the unit test
>  TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, 
> the process blocks on waitReplication. Detailed information follows:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 307.492 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] 
> testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting)
>  Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 
> 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Created] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-15 Thread Jim Brennan (Jira)
Jim Brennan created HDFS-15125:
--

 Summary: Pull back fixes for DataNodeVolume* unit tests
 Key: HDFS-15125
 URL: https://issues.apache.org/jira/browse/HDFS-15125
 Project: Hadoop HDFS
  Issue Type: Test
  Components: hdfs
Affects Versions: 2.10.0
Reporter: Jim Brennan
Assignee: Jim Brennan


I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
some intermittent failures we are seeing on branch-2.10.

The fixes are:

HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
HDFS-13993 
TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
flaky
HDFS-14324 Fix TestDataNodeVolumeFailure
HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Updated] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-15 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-15125:
---
Attachment: HDFS-15125-branch-2.10.001.patch
Status: Patch Available  (was: Open)

I am submitting a patch for branch-2.10 that pulls in all of these fixes.  Let 
me know if it would be better to put up individual patches on each of those 
Jiras.

> Pull back fixes for DataNodeVolume* unit tests
> --
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-16 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016994#comment-17016994
 ] 

Jim Brennan commented on HDFS-15125:


The unit test failures are unrelated, but I am still seeing one of the tests 
fail locally with this patch. Once I resolve that, I will upload a new patch.

> Pull back fixes for DataNodeVolume* unit tests
> --
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10

2020-01-17 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018297#comment-17018297
 ] 

Jim Brennan commented on HDFS-15125:


Thanks for the review [~ahussein]! It's a good suggestion, but I don't think 
it is necessary. When waitFor times out, it throws, and unless the caller 
catches the exception, the test fails with an error. For example, I temporarily 
reduced the timeout to force the timeout case, and got this in the output:
{noformat}
 at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
 at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
 at org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.waitForDiskError(DataNodeTestUtils.java:248)
 at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testTolerateVolumeFailuresAfterAddingMoreVolumes(TestDataNodeVolumeFailure.java:395)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
I'm also reluctant to make additional changes when pulling back fixes from 
trunk.
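
The point about the timeout can be illustrated with a small stand-alone sketch. 
This is not the Hadoop GenericTestUtils code: WaitForSketch and its waitFor 
method are hypothetical stand-ins, assumed to poll a condition and throw 
TimeoutException once the deadline passes. A caller that does not catch the 
exception lets it propagate, which a test runner would report as an error.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a polling helper; not the Hadoop implementation.
public class WaitForSketch {

    // Polls check every intervalMs until it returns true or timeoutMs elapses.
    public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                // No return value to inspect: a timeout always surfaces as an exception.
                throw new TimeoutException("Timed out waiting for condition");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Condition already true: returns without sleeping.
        waitFor(() -> true, 10, 100);

        // Condition never true: the exception propagates unless caught here.
        try {
            waitFor(() -> false, 10, 100);
        } catch (TimeoutException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Because the helper in this sketch signals a timeout only by throwing, an extra 
assertion after the call would never be reached in the timeout case.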

> Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
> ---
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Comment Edited] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10

2020-01-17 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018297#comment-17018297
 ] 

Jim Brennan edited comment on HDFS-15125 at 1/17/20 8:42 PM:
-

Thanks for the review [~ahussein]! It's a good suggestion, but I don't think 
it is necessary. When waitFor times out, it throws, and unless the caller 
catches the exception, the test fails with an error. For example, I temporarily 
reduced the timeout to force the timeout case, and got this in the output:
{noformat}
 at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
 at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373)
 at 
org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.waitForDiskError(DataNodeTestUtils.java:248)
 at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testTolerateVolumeFailuresAfterAddingMoreVolumes(TestDataNodeVolumeFailure.java:395)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
I'm also reluctant to make additional changes when pulling back fixes from 
trunk.


was (Author: jim_brennan):
Thanks for the review [~ahussein]!  It's a good suggestion, but I don't think 
it is necessary.  When it times out, it is going to throw and unless the caller 
catches it, the test will fail with an error.  For example, I temporarily 
reduced the timeout to force the timeout case, and got this in the output:
{noformat}
 at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373) 
at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:373) 
at 
org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.waitForDiskError(DataNodeTestUtils.java:248)
 at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testTolerateVolumeFailuresAfterAddingMoreVolumes(TestDataNodeVolumeFailure.java:395)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
I'm also reluctant to make additional changes when pulling back fixes from 
trunk.

> Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
> ---
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky




[jira] [Reopened] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-14 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reopened HDFS-13339:


Re-opening issue so I can put up a patch for branch-2.10.

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13339.001.patch, HDFS-13339.002.patch, 
> HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I execute the unit test
>  TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, 
> the process blocks on waitReplication. Detailed information follows:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 307.492 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] 
> testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting)
>  Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 
> 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Updated] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-14 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-13339:
---
Attachment: HDFS-13339-branch-2.10.001.patch
Status: Patch Available  (was: Reopened)

We have been seeing intermittent test failures on branch-2.10 in 
TestBlockStatsMXBean.

I applied the patch from this Jira and it does seem to resolve the intermittent 
failures.

Can we please pull this back to branch-2.10? I am submitting a patch for it. 
The only change from the original is the replacement of the lambda in the unit test.
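
For readers unfamiliar with this kind of backport change, here is a minimal 
sketch of replacing a lambda with an anonymous class. It assumes the 
replacement was needed because branch-2.10 still builds at a Java source level 
without lambda support; the comment does not state the reason. The class, 
method, and variable names are hypothetical and do not come from the patch.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration; the names below do not come from the patch.
public class LambdaBackportSketch {

    // Java 8+ form: the check is written as a lambda.
    public static Callable<Boolean> asLambda(final int[] failedVolumes) {
        return () -> failedVolumes[0] > 0;
    }

    // Equivalent form that would also compile at Java 7 source level.
    public static Callable<Boolean> asAnonymousClass(final int[] failedVolumes) {
        return new Callable<Boolean>() {
            @Override
            public Boolean call() {
                return failedVolumes[0] > 0;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        int[] failedVolumes = {1};
        // Both forms evaluate the same condition.
        System.out.println(asLambda(failedVolumes).call());
        System.out.println(asAnonymousClass(failedVolumes).call());
    }
}
```

The two methods behave identically; only the syntax differs, which is why such 
a change is usually safe to make while backporting.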

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.0.4, 3.1.1, 3.2.0
>
> Attachments: HDFS-13339-branch-2.10.001.patch, HDFS-13339.001.patch, 
> HDFS-13339.002.patch, HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I execute the unit test
>  TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, 
> the process blocks on waitReplication. Detailed information follows:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 307.492 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] 
> testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting)
>  Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 
> 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015254#comment-17015254
 ] 

Jim Brennan commented on HDFS-13339:


I submitted another patch to fix the checkstyle issue.  I don't believe the 
unit test failures are due to this Jira.

TestJournalNodeRespectsBindHostKeys is failing in qbt builds for 2.10.

TestFileCorruption is reported in HDFS-14816

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13339-branch-2.10.001.patch, 
> HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, 
> HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I execute the unit test
>  TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart, 
> the process blocks on waitReplication. Detailed information follows:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 307.492 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] 
> testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting)
>  Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 
> 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Updated] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-14 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-13339:
---
Attachment: HDFS-13339-branch-2.10.002.patch

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13339-branch-2.10.001.patch, 
> HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, 
> HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I run the unit test
> TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart,
> the process blocks in waitReplication. Details follow:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.492 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Commented] (HDFS-13339) Volume reference can't be released and may lead to deadlock when DataXceiver does a check volume

2020-01-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015419#comment-17015419
 ] 

Jim Brennan commented on HDFS-13339:


I don't think the unit test failures are related to this change.  They are not 
failing for me with or without the patch.

[~xiaochen], can you please review and if acceptable, commit this to 
branch-2.10?

 

> Volume reference can't be released and may lead to deadlock when DataXceiver 
> does a check volume
> 
>
> Key: HDFS-13339
> URL: https://issues.apache.org/jira/browse/HDFS-13339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: os: Linux 2.6.32-358.el6.x86_64
> hadoop version: hadoop-3.2.0-SNAPSHOT
> unit: mvn test -Pnative 
> -Dtest=TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart
>Reporter: liaoyuxiangqin
>Assignee: Zsolt Venczel
>Priority: Critical
>  Labels: DataNode, volumes
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HDFS-13339-branch-2.10.001.patch, 
> HDFS-13339-branch-2.10.002.patch, HDFS-13339.001.patch, HDFS-13339.002.patch, 
> HDFS-13339.003.patch, HDFS-13339.004.patch
>
>
> When I run the unit test
> TestDataNodeVolumeFailureReporting#testVolFailureStatsPreservedOnNNRestart,
> the process blocks in waitReplication. Details follow:
> [INFO] ---
>  [INFO] T E S T S
>  [INFO] ---
>  [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 307.492 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
>  [ERROR] testVolFailureStatsPreservedOnNNRestart(org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting) Time elapsed: 307.206 s <<< ERROR!
>  java.util.concurrent.TimeoutException: Timed out waiting for /test1 to reach 2 replicas
>  at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:800)
>  at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting.testVolFailureStatsPreservedOnNNRestart(TestDataNodeVolumeFailureReporting.java:283)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)






[jira] [Updated] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-16 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-15125:
---
Attachment: HDFS-15125-branch-2.10.002.patch

> Pull back fixes for DataNodeVolume* unit tests
> --
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-16 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017464#comment-17017464
 ] 

Jim Brennan commented on HDFS-15125:


There was a problem with my back-port of HDFS-13945, so 
TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure was still failing 
for me intermittently.

This is fixed in patch 002.

> Pull back fixes for DataNodeVolume* unit tests
> --
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Updated] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10

2020-01-17 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-15125:
---
Summary: Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to 
branch-2.10  (was: Pull back fixes for DataNodeVolume* unit tests)

> Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
> ---
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Commented] (HDFS-15125) Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10

2020-01-17 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018120#comment-17018120
 ] 

Jim Brennan commented on HDFS-15125:


I don't believe that any of the failed unit tests are related to these changes, 
which are limited to different unit tests.  I ran them all locally and they all 
pass for me.

I believe this is ready for review.

cc: [~kihwal],  [~weichiu]

> Pull back HDFS-11353, HDFS-13993, HDFS-13945, and HDFS-14324 to branch-2.10
> ---
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky






[jira] [Commented] (HDFS-15095) Fix accidental comment in flaky test TestDecommissioningStatus

2020-01-10 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013048#comment-17013048
 ] 

Jim Brennan commented on HDFS-15095:


{quote}
Can you please commit the patch?
{quote}
I cannot.  You'll need a committer for that. 
cc: [~kihwal], [~jeagles], [~ebadger]

> Fix accidental comment in flaky test TestDecommissioningStatus
> --
>
> Key: HDFS-15095
> URL: https://issues.apache.org/jira/browse/HDFS-15095
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15095.001.patch, HDFS-15095.002.patch
>
>
> There are some old Jiras suggesting that {{testDecommissionStatus}} is
> flaky.
>  * HDFS-12188
>  * HDFS-9599
>  * HDFS-9950
>  * HDFS-10755
> However, the HDFS-14854 fix accidentally commented out one of the checks in
> {{TestDecommissioningStatus.testDecommissionStatus()}}. This Jira will
> restore the commented-out code and add a blocking queue to make the test
> case deterministic.
> My intuition is that the monitor task launched by AdminManager may not have
> enough time to act before we start verifying the status. I suggest forcing
> the main thread to block until the node is added to the blocked node.
>   
>   
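
The blocking idea in the description above can be sketched roughly as follows. This is only an illustration of the general technique (a test thread waiting on a blocking queue that a background task fills), not the code from the HDFS-15095 patch. The class BlockingHandoffSketch, the methods startMonitor and awaitTrackedNode, and the node name "dn-1" are all hypothetical names made up for this example; the real AdminManager monitor is not modelled here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingHandoffSketch {

  /**
   * Hypothetical stand-in for a background monitor task: after a short
   * delay it publishes the node it has started tracking.
   */
  static Thread startMonitor(final BlockingQueue<String> tracked,
      final String node) {
    Thread t = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          Thread.sleep(50);   // simulate the monitor needing time to act
          tracked.put(node);  // hand the node over to the waiting thread
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    t.setDaemon(true);
    t.start();
    return t;
  }

  /**
   * Blocks the caller until the monitor publishes a node or the timeout
   * elapses. Returns null on timeout.
   */
  static String awaitTrackedNode(BlockingQueue<String> tracked,
      long timeoutMs) throws InterruptedException {
    return tracked.poll(timeoutMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<String> tracked = new ArrayBlockingQueue<String>(1);
    startMonitor(tracked, "dn-1");
    // Verification only starts once the monitor has acted, so the check
    // no longer depends on how the two threads happen to be scheduled.
    String node = awaitTrackedNode(tracked, 5000);
    System.out.println("monitor reported: " + node);
  }
}
```

The bounded poll is a deliberate choice in this sketch: if the monitor never acts, the test gets a null and fails on its own assertion instead of hanging until the framework timeout. The anonymous Runnable is used instead of a lambda so the sketch also compiles at a pre-Java 8 source level.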






[jira] [Assigned] (HDFS-14205) Backport HDFS-6440 to branch-2

2020-01-03 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reassigned HDFS-14205:
--

Assignee: Chao Sun  (was: Jim Brennan)

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: 2.10.0
>
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 
> (consistent read from standby) backport to branch-2.






[jira] [Assigned] (HDFS-14205) Backport HDFS-6440 to branch-2

2020-01-03 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reassigned HDFS-14205:
--

Assignee: Jim Brennan  (was: Chao Sun)

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 2.10.0
>
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 
> (consistent read from standby) backport to branch-2.






[jira] [Commented] (HDFS-15095) Fix accidental comment in flaky test TestDecommissioningStatus

2020-01-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010110#comment-17010110
 ] 

Jim Brennan commented on HDFS-15095:


Thanks for the patch [~ahussein]!  I am +1 (non-binding) on patch 002.   Looks 
good to me!

> Fix accidental comment in flaky test TestDecommissioningStatus
> --
>
> Key: HDFS-15095
> URL: https://issues.apache.org/jira/browse/HDFS-15095
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15095.001.patch, HDFS-15095.002.patch
>
>
> There are some old Jiras suggesting that {{testDecommissionStatus}} is
> flaky.
>  * HDFS-12188
>  * HDFS-9599
>  * HDFS-9950
>  * HDFS-10755
> However, the HDFS-14854 fix accidentally commented out one of the checks in
> {{TestDecommissioningStatus.testDecommissionStatus()}}. This Jira will
> restore the commented-out code and add a blocking queue to make the test
> case deterministic.
> My intuition is that the monitor task launched by AdminManager may not have
> enough time to act before we start verifying the status. I suggest forcing
> the main thread to block until the node is added to the blocked node.
>   
>   






[jira] [Comment Edited] (HDFS-15095) Fix accidental comment in flaky test TestDecommissioningStatus

2020-01-07 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010110#comment-17010110
 ] 

Jim Brennan edited comment on HDFS-15095 at 1/7/20 9:41 PM:


Thanks for the patch [~ahussein]!

I am +1 (non-binding) on patch 002.   Looks good to me!


was (Author: jim_brennan):
Thanks for the patch [~ahussein]!  I am +1 (non-binding) on patch 002.   Looks 
good to me!

> Fix accidental comment in flaky test TestDecommissioningStatus
> --
>
> Key: HDFS-15095
> URL: https://issues.apache.org/jira/browse/HDFS-15095
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15095.001.patch, HDFS-15095.002.patch
>
>
> There are some old Jiras suggesting that {{testDecommissionStatus}} is
> flaky.
>  * HDFS-12188
>  * HDFS-9599
>  * HDFS-9950
>  * HDFS-10755
> However, the HDFS-14854 fix accidentally commented out one of the checks in
> {{TestDecommissioningStatus.testDecommissionStatus()}}. This Jira will
> restore the commented-out code and add a blocking queue to make the test
> case deterministic.
> My intuition is that the monitor task launched by AdminManager may not have
> enough time to act before we start verifying the status. I suggest forcing
> the main thread to block until the node is added to the blocked node.
>   
>   






[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout

2020-03-12 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057956#comment-17057956
 ] 

Jim Brennan commented on HDFS-15077:


Thanks [~iwasakims]!  I apologize for not catching the lambda issue.  We use
Java 8 internally, so it didn't come up when I tried it.  I should have tested
against Apache branch-2.10 instead.


> Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
> 
>
> Key: HDFS-15077
> URL: https://issues.apache.org/jira/browse/HDFS-15077
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-15077-branch-2.10.patch
>
>
> {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due 
> to race between test thread and LeaseRenewer thread.






[jira] [Commented] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out

2020-03-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059445#comment-17059445
 ] 

Jim Brennan commented on HDFS-11396:


Thanks [~inigoiri]!

> TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
> -
>
> Key: HDFS-11396
> URL: https://issues.apache.org/jira/browse/HDFS-11396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: John Zhuge
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.2.0, 3.3.0, 2.10.1
>
> Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, 
> patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt






[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout

2020-03-11 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057423#comment-17057423
 ] 

Jim Brennan commented on HDFS-15077:


[~iwasakims], [~aajisaka] we have seen this failure (rarely) in our automated 
tests for our internal branch-2.10 build.  I believe the patch applies cleanly. 
 Could we get it pulled back to branch-2.10?


> Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
> 
>
> Key: HDFS-15077
> URL: https://issues.apache.org/jira/browse/HDFS-15077
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due 
> to race between test thread and LeaseRenewer thread.






[jira] [Updated] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out

2020-03-13 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-11396:
---
Attachment: HDFS-11396-branch-2.10.001.patch

> TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
> -
>
> Key: HDFS-11396
> URL: https://issues.apache.org/jira/browse/HDFS-11396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: John Zhuge
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.2.0, 3.3.0
>
> Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, 
> patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt






[jira] [Reopened] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently

2020-03-13 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reopened HDFS-10499:


Re-opening for branch-2.10 patch supplied by [~ahussein]

> TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails 
> Intermittently 
> -
>
> Key: HDFS-10499
> URL: https://issues.apache.org/jira/browse/HDFS-10499
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, test
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Yiqun Lin
>Priority: Major
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, 
> HDFS-10499.002.patch
>
>
> Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we 
> had the following failure. Local rerun is successful.
> Stack Trace:
> {panel}
> java.lang.AssertionError: expected:<17> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113)
> {panel}






[jira] [Commented] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently

2020-03-13 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059097#comment-17059097
 ] 

Jim Brennan commented on HDFS-10499:


Thanks for the patch [~ahussein]!  I am +1 (non-binding) on this patch for 
branch-2.10.
We have been seeing this test fail (rarely) in our automated builds for 
branch-2.10.
[~linyiqun], [~brahmareddy], [~kihwal] we would appreciate it if someone could 
review/commit this patch.
I will re-open so the precommit build will run.


> TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails 
> Intermittently 
> -
>
> Key: HDFS-10499
> URL: https://issues.apache.org/jira/browse/HDFS-10499
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, test
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Yiqun Lin
>Priority: Major
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, 
> HDFS-10499.002.patch
>
>
> Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we 
> had the following failure. Local rerun is successful.
> Stack Trace:
> {panel}
> java.lang.AssertionError: expected:<17> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113)
> {panel}






[jira] [Updated] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently

2020-03-13 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-10499:
---
Status: Patch Available  (was: Reopened)

> TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails 
> Intermittently 
> -
>
> Key: HDFS-10499
> URL: https://issues.apache.org/jira/browse/HDFS-10499
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, test
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Yiqun Lin
>Priority: Major
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, 
> HDFS-10499.002.patch
>
>
> Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we 
> had the following failure. Local rerun is successful.
> Stack Trace:
> {panel}
> java.lang.AssertionError: expected:<17> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113)
> {panel}






[jira] [Commented] (HDFS-10499) TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails Intermittently

2020-03-13 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059102#comment-17059102
 ] 

Jim Brennan commented on HDFS-10499:


I also put up a patch for related Jira HDFS-11396.

> TestNameNodeMetadataConsistency#testGenerationStampInFuture Fails 
> Intermittently 
> -
>
> Key: HDFS-10499
> URL: https://issues.apache.org/jira/browse/HDFS-10499
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, test
>Affects Versions: 3.0.0-alpha1
>Reporter: Hanisha Koneru
>Assignee: Yiqun Lin
>Priority: Major
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-10499-branch-2.10.001.patch, HDFS-10499.001.patch, 
> HDFS-10499.002.patch
>
>
> Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we 
> had the following failure. Local rerun is successful.
> Stack Trace:
> {panel}
> java.lang.AssertionError: expected:<17> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:113)
> {panel}






[jira] [Reopened] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out

2020-03-13 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reopened HDFS-11396:


Reopening to submit patch for branch-2.10.

> TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
> -
>
> Key: HDFS-11396
> URL: https://issues.apache.org/jira/browse/HDFS-11396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: John Zhuge
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.2.0, 3.3.0
>
> Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, 
> patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt






[jira] [Commented] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out

2020-03-13 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059100#comment-17059100
 ] 

Jim Brennan commented on HDFS-11396:


[~jzhuge], [~goiri], [~kihwal] I have taken the liberty of uploading a patch 
for branch-2.10.  Please let me know if it looks good.

> TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
> -
>
> Key: HDFS-11396
> URL: https://issues.apache.org/jira/browse/HDFS-11396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: John Zhuge
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.2.0, 3.3.0
>
> Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, 
> patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt






[jira] [Updated] (HDFS-11396) TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out

2020-03-13 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-11396:
---
Status: Patch Available  (was: Reopened)

> TestNameNodeMetadataConsistency#testGenerationStampInFuture timed out
> -
>
> Key: HDFS-11396
> URL: https://issues.apache.org/jira/browse/HDFS-11396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, test
>Reporter: John Zhuge
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.3.0, 3.2.0
>
> Attachments: HDFS-11396-01.patch, HDFS-11396-branch-2.10.001.patch, 
> patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/18334/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt






[jira] [Commented] (HDFS-11439) testGenerationStampInFuture UT fails

2020-03-13 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059105#comment-17059105
 ] 

Jim Brennan commented on HDFS-11439:


I believe this is fixed by HDFS-11396?

> testGenerationStampInFuture UT fails
> 
>
> Key: HDFS-11439
> URL: https://issues.apache.org/jira/browse/HDFS-11439
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yesha Vora
>Priority: Major
> Attachments: testGenerationStampInFuture.log
>
>
> testGenerationStampInFuture UT fails as below.
> {code}
> Error Message
> expected:<18> but was:<0>
> Stacktrace
> java.lang.AssertionError: expected:<18> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:125){code}






[jira] [Commented] (HDFS-15038) TestFsck testFsckListCorruptSnapshotFiles is failing in trunk

2020-05-13 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106444#comment-17106444
 ] 

Jim Brennan commented on HDFS-15038:


[~ayushtkn], [~hemanthboyina], [~inigoiri], we are seeing this failure in the 
branch-2.10 qbt tests, and I have also been able to repro it in branch-2.10 by 
running the test in a loop.

Here's a recent QBT report of this failure: 
[https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch-2.10-java7-linux-x86/684/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFsck/testFsckListCorruptSnapshotFiles/]

Can we get this fix pulled back to branch-2.10?  The existing patch doesn't 
work in 2.10 because of the lambda, so I have attached a new one that fixes 
that issue.
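
For context on the lambda issue: the branch-2.10 QBT job linked above is a java7 build, and Java 7 has no lambda syntax, so a backport has to spell the same callback out as an anonymous inner class. The following is a minimal sketch of that kind of rewrite, not the contents of the attached patch; {{LambdaBackportSketch}}, {{Check}} and {{waitFor}} are hypothetical names invented for illustration.

```java
// Sketch only: Check and waitFor are hypothetical stand-ins for a polling
// helper that takes a condition callback. They are not Hadoop APIs.
final class LambdaBackportSketch {

  /** Single-method callback type (an assumption made for this sketch). */
  interface Check {
    boolean get();
  }

  /** Polls the check up to maxTries times and reports whether it ever passed. */
  static boolean waitFor(Check check, int maxTries) {
    for (int i = 0; i < maxTries; i++) {
      if (check.get()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    final int[] calls = {0};

    // Java 8+ form, which only compiles on branches that allow lambdas:
    //   waitFor(() -> ++calls[0] >= 3, 10);

    // Java 7 compatible form: the same logic as an anonymous inner class.
    boolean ok = waitFor(new Check() {
      @Override
      public boolean get() {
        return ++calls[0] >= 3;
      }
    }, 10);

    System.out.println("condition met: " + ok + " after " + calls[0] + " calls");
  }
}
```

The anonymous class is more verbose but behaves the same, which is why this kind of backport usually needs no logic change.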

 

> TestFsck testFsckListCorruptSnapshotFiles is failing in trunk
> -
>
> Key: HDFS-15038
> URL: https://issues.apache.org/jira/browse/HDFS-15038
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15038-branch-2.10.001.patch, HDFS-15038.001.patch, 
> HDFS-15038.002.patch, HDFS-15038.003.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/28481/testReport/]
>  
> [https://builds.apache.org/job/PreCommit-HDFS-Build/28482/testReport/]






[jira] [Updated] (HDFS-15038) TestFsck testFsckListCorruptSnapshotFiles is failing in trunk

2020-05-13 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-15038:
---
Attachment: HDFS-15038-branch-2.10.001.patch

> TestFsck testFsckListCorruptSnapshotFiles is failing in trunk
> -
>
> Key: HDFS-15038
> URL: https://issues.apache.org/jira/browse/HDFS-15038
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15038-branch-2.10.001.patch, HDFS-15038.001.patch, 
> HDFS-15038.002.patch, HDFS-15038.003.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/28481/testReport/]
>  
> [https://builds.apache.org/job/PreCommit-HDFS-Build/28482/testReport/]






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108320#comment-17108320
 ] 

Jim Brennan commented on HDFS-14960:


Thanks [~ayushtkn]!  I will remove the precondition in 
BlockPlacementPolicyWithNodeGroup and add some additional verification.  Note 
that with the changes I've made, one of the test cases 
(testBalancerEndInNoMoveProgress) now achieves what we want.  It fails if the 
balancer does not use NetworkTopologyWithNodeGroup.

 

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108419#comment-17108419
 ] 

Jim Brennan commented on HDFS-14960:


Patch 002 removes the change in BlockPlacementPolicyWithNodeGroup and adds code 
to the test to verify block placement after balancing.  I also added checks to 
verify the topology.

 

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-15 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14960:
---
Attachment: HDFS-14960.002.patch

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-15 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14960:
---
Attachment: HDFS-14960.003.patch

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-15 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108590#comment-17108590
 ] 

Jim Brennan commented on HDFS-14960:


I put up patch 003 to address the checkstyle issues.  The unit test failure is 
unrelated.

 

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-18 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110315#comment-17110315
 ] 

Jim Brennan commented on HDFS-14960:


I'm not sure what happened with the pre-commit build.  Can it be restarted?

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-18 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14960:
---
Attachment: (was: HDFS-14960.003.patch)

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-18 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110383#comment-17110383
 ] 

Jim Brennan commented on HDFS-14960:


Is there something wrong with trunk qbt builds?  I went to check the latest, 
and the most recent build I see is from May 1: 
[https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/]

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-18 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14960:
---
Attachment: HDFS-14960.003.patch

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-18 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110359#comment-17110359
 ] 

Jim Brennan commented on HDFS-14960:


I re-uploaded patch 003 to hopefully kick off the pre-commit build again.

 

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107684#comment-17107684
 ] 

Jim Brennan commented on HDFS-14960:


On further investigation of this, I realized that the balancer does not pay any 
attention to {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}}.

Here are the config settings for TestBalancerWithNodeGroup:
{code:java}
  static Configuration createConf() {
    Configuration conf = new HdfsConfiguration();
    TestBalancer.initConf(conf);
    conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, DEFAULT_BLOCK_SIZE);
    conf.setBoolean(DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY, false);
    conf.set(CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY,
        NetworkTopologyWithNodeGroup.class.getName());
    conf.set(DFSConfigKeys.DFS_BLOCK_REPLICATOR_CLASSNAME_KEY,
        BlockPlacementPolicyWithNodeGroup.class.getName());
    return conf;
  }
{code}
Prior to HDFS-14958, we were not setting 
{{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY = false}}, so 
BlockPlacementPolicyWithNodeGroup was being initialized with a clusterMap of 
type DFSNetworkTopology. This did not affect this test though, because the 
balancer ignores that flag.  The Balancer only pays attention to 
{{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} which was already set 
to NetworkTopologyWithNodeGroup.

This is why the test never failed - it is specifically testing the results of 
the balancer. The only reason I found the issue in HDFS-14958 was because we 
had some internal changes that caused it to fail. But the apache version never 
actually failed because of HDFS-14958.

Given this, I thought I should double-check that the test does fail if the 
Balancer doesn't use NetworkTopologyWithNodeGroup. So I set it to use 
NetworkTopology and the test passed!

Looking at it more closely, I was surprised in particular that 
testBalancerEndInNoMoveProgress() was succeeding in this case. I would expect 
that with NetworkTopology there would be some block moves. But the code to 
verify that it finishes with no moves seems to allow moves:
{code:java}
final int r = Balancer.run(namenodes, BalancerParameters.DEFAULT, conf);
Assert.assertTrue(r == ExitStatus.SUCCESS.getExitCode() ||
    (r == ExitStatus.NO_MOVE_PROGRESS.getExitCode()));
{code}
I don't understand why SUCCESS is a valid return for this case. Isn't the point 
of this test case to verify that no block moves were done?

Sure enough, if I change that assert to be more restrictive:
{code:java}
Assert.assertTrue(r == ExitStatus.NO_MOVE_PROGRESS.getExitCode());
{code}
then testBalancerEndInNoMoveProgress() fails when the topology is not 
{{NetworkTopologyWithNodeGroup}}.

With this change in place, however, when I went back to using 
{{NetworkTopologyWithNodeGroup}} I ran into a new failure. 
testBalancerWithRackLocality() was failing on the modified assert. I don't see 
why this test case was using runBalancerCanFinish() in the first place, 
though. I changed it to just use runBalancer(), and it passes.  This seems more 
correct to me, although I am definitely not an expert in this area of the code.

As suggested by [~hemanthboyina] and others, I also added a precondition check 
to BlockPlacementPolicyWithNodeGroup.initialize() to verify that clusterMap is 
an instance of NetworkTopologyWithNodeGroup.   With this change, all of the 
test cases in this test fail immediately if you misconfigure it to use 
DFSNetworkTopology with BlockPlacementPolicyWithNodeGroup.
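
The precondition idea can be sketched roughly as follows. This is a simplified, standalone illustration rather than the code in the patch: the nested classes are empty stand-ins that only borrow the names of the Hadoop types, and this {{initialize()}} is hypothetical.

```java
// Sketch only: the nested classes are empty stand-ins for the Hadoop types of
// the same names, and this initialize() is hypothetical, not the patch itself.
final class TopologyPreconditionSketch {

  static class NetworkTopology {}
  static class DFSNetworkTopology extends NetworkTopology {}
  static class NetworkTopologyWithNodeGroup extends NetworkTopology {}

  /** Rejects any cluster map that is not node-group aware. */
  static void initialize(NetworkTopology clusterMap) {
    if (!(clusterMap instanceof NetworkTopologyWithNodeGroup)) {
      String actual = (clusterMap == null)
          ? "null" : clusterMap.getClass().getSimpleName();
      throw new IllegalArgumentException(
          "clusterMap must be a NetworkTopologyWithNodeGroup, got " + actual);
    }
  }

  public static void main(String[] args) {
    // A node-group aware topology is accepted silently.
    initialize(new NetworkTopologyWithNodeGroup());

    // The misconfiguration described above is rejected up front.
    try {
      initialize(new DFSNetworkTopology());
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing at initialization time makes the misconfiguration visible in every test case instead of depending on the balancer result to expose it.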

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-14 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14960:
---
Attachment: HDFS-14960.001.patch
Status: Patch Available  (was: Open)

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-15038) TestFsck testFsckListCorruptSnapshotFiles is failing in trunk

2020-05-14 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107322#comment-17107322
 ] 

Jim Brennan commented on HDFS-15038:


Thanks everyone. The unit test failures and findbugs are unrelated to this 
patch, so it should be good to go for branch-2.10.

 

> TestFsck testFsckListCorruptSnapshotFiles is failing in trunk
> -
>
> Key: HDFS-15038
> URL: https://issues.apache.org/jira/browse/HDFS-15038
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15038-branch-2.10.001.patch, HDFS-15038.001.patch, 
> HDFS-15038.002.patch, HDFS-15038.003.patch
>
>
> [https://builds.apache.org/job/PreCommit-HDFS-Build/28481/testReport/]
>  
> [https://builds.apache.org/job/PreCommit-HDFS-Build/28482/testReport/]






[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-20 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112277#comment-17112277
 ] 

Jim Brennan commented on HDFS-13183:


Thanks [~hexiaoqiao].  It looks like there are still some failures.
One other note: it's possible TestBalancer did not fail because it uses its own 
copy of doBalance(), called runBalancer().  I don't know whether it would have 
failed had it used Balancer.run() instead.  TestBalancerWithNodeGroup uses 
Balancer.run(), which is why it was affected.

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls, 
> so the Active NameNode enters a state of false death for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.






[jira] [Commented] (HDFS-9376) TestSeveralNameNodes fails occasionally

2020-05-20 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112223#comment-17112223
 ] 

Jim Brennan commented on HDFS-9376:
---

Thanks [~iwasakims]!  I figured that was the case.

> TestSeveralNameNodes fails occasionally
> ---
>
> Key: HDFS-9376
> URL: https://issues.apache.org/jira/browse/HDFS-9376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.0.0-alpha1, 2.10.1
>
> Attachments: HDFS-9376.001.patch, HDFS-9376.002.patch
>
>
> TestSeveralNameNodes has been failing in precommit builds.  It usually times 
> out on waiting for the last thread to finish writing.






[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111215#comment-17111215
 ] 

Jim Brennan commented on HDFS-14960:


I will investigate and resolve the unit test failure and put up a new patch.

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Commented] (HDFS-9376) TestSeveralNameNodes fails occasionally

2020-05-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111269#comment-17111269
 ] 

Jim Brennan commented on HDFS-9376:
---

[~cnauroth], [~iwasakims], [~kihwal] I know this is a pretty old Jira, but we 
have seen this failure come up in our internal branch-2.10 builds.  I 
downloaded the patch and verified that it applies cleanly to branch-2.10, 
builds and runs.

Any chance we could get this pulled back to branch-2.10?

> TestSeveralNameNodes fails occasionally
> ---
>
> Key: HDFS-9376
> URL: https://issues.apache.org/jira/browse/HDFS-9376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-9376.001.patch, HDFS-9376.002.patch
>
>
> TestSeveralNameNodes has been failing in precommit builds.  It usually times 
> out on waiting for the last thread to finish writing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111578#comment-17111578
 ] 

Jim Brennan commented on HDFS-13183:


[~weichiu], [~hexiaoqiao], I believe this change is causing 
TestBalancerWithNodeGroup to fail: 
[https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/146/testReport/junit/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerEndInNoMoveProgress/]

The problem is that Balancer.doBalance() was changed to construct the 
NameNodeConnectors inside the iteration loop.   The counter to track how many 
iterations we have gone without a move ({{notChangedIterations}}) is in the 
NameNodeConnector, but it is intended to work across iterations.  Since we are 
now creating new connectors on each iteration, this will always be zero, so we 
will never exit a balancer with ExitStatus.NO_MOVE_PROGRESS.
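
To illustrate the effect described above, here is a minimal, self-contained 
sketch. It is not the actual Hadoop code: the class BalancerLoopSketch, the 
Connector stand-in, the run() helper and the threshold value of 5 are all 
assumptions made for this example. It only models a per-connector 
"iterations without a move" counter and shows what happens when the 
connector is recreated on every iteration.

```java
/** Hypothetical, simplified model of the balancer loop; not the real Hadoop classes. */
public class BalancerLoopSketch {
    enum ExitStatus { IN_PROGRESS, NO_MOVE_PROGRESS }

    // Assumed threshold, chosen for this sketch only.
    static final int MAX_NOT_CHANGED_ITERATIONS = 5;

    /** Stand-in for NameNodeConnector: the no-progress counter lives here. */
    static class Connector {
        private int notChangedIterations = 0;

        /** Returns false once too many consecutive iterations moved nothing. */
        boolean shouldContinue(long bytesMoved) {
            if (bytesMoved > 0) {
                notChangedIterations = 0;
                return true;
            }
            notChangedIterations++;
            return notChangedIterations < MAX_NOT_CHANGED_ITERATIONS;
        }
    }

    /** Runs up to maxIterations iterations in which nothing is ever moved. */
    static ExitStatus run(int maxIterations, boolean recreateEachIteration) {
        Connector shared = new Connector();
        for (int i = 0; i < maxIterations; i++) {
            // Recreating the connector discards the counter from earlier iterations.
            Connector c = recreateEachIteration ? new Connector() : shared;
            if (!c.shouldContinue(0L)) {
                return ExitStatus.NO_MOVE_PROGRESS;
            }
        }
        // The iteration cap was reached without ever detecting "no progress".
        return ExitStatus.IN_PROGRESS;
    }

    public static void main(String[] args) {
        System.out.println(run(100, true));   // prints IN_PROGRESS
        System.out.println(run(100, false));  // prints NO_MOVE_PROGRESS
    }
}
```

In this model, a connector that lives across iterations reaches the 
threshold on the fifth idle iteration, while a fresh connector per iteration 
never counts past 1, so the no-progress exit is never taken.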

 

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls, 
> so the Active NameNode enters a state of false death for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111582#comment-17111582
 ] 

Jim Brennan commented on HDFS-14960:


I believe this failure is actually due to HDFS-13183.  I have added a comment 
to that Jira: 
 
https://issues.apache.org/jira/browse/HDFS-13183?focusedCommentId=17111578&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17111578

Would like to make sure that is resolved before fixing this one.

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-19 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111586#comment-17111586
 ] 

Jim Brennan commented on HDFS-13183:


More importantly, because it will never return NO_MOVE_PROGRESS, it will loop 
forever returning IN_PROGRESS.


> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls, 
> so the Active NameNode enters a state of false death for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-21 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113467#comment-17113467
 ] 

Jim Brennan commented on HDFS-13183:


I am +1 (non-binding) on the second addendum patch.

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch, 
> HDFS-13183.006.patch, HDFS-13183.007.patch, HDFS-13183.addendum.patch, 
> HDFS-13183.addendum.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls, 
> so the Active NameNode enters a state of false death for seconds or even minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12548) HDFS Jenkins build is unstable on branch-2

2020-09-03 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190411#comment-17190411
 ] 

Jim Brennan commented on HDFS-12548:


I propose we close this issue or at least reduce the priority.  It's three 
years old and I don't see any evidence that we've seen it again.  Haven't 
we switched over to CloudBees as well?


> HDFS Jenkins build is unstable on branch-2
> --
>
> Key: HDFS-12548
> URL: https://issues.apache.org/jira/browse/HDFS-12548
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.9.0
>Reporter: Rushabh Shah
>Priority: Critical
>
> Feel free to move the ticket to another project (e.g. infra).
> Recently I attached a branch-2 patch while working on one Jira, 
> [HDFS-12386|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676].
> There were at least 100 failed and timed-out tests. I am sure they are not 
> related to my patch.
> I also came across another Jira which was just a javadoc-related change, and 
> there were around 100 failed tests.
> Below are the details for the pre-commits that failed in branch-2:
> 1. [HDFS-12386 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180069&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180069]
> {noformat}
> Ran on slave: asf912.gq1.ygridcore.net/H12
> Failed with following error message:
> Build timed out (after 300 minutes). Marking the build as aborted.
> Build was aborted
> Performing Post build task...
> {noformat}
> 2. [HDFS-12386 attempt 
> 2|https://issues.apache.org/jira/browse/HDFS-12386?focusedCommentId=16180676&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16180676]
> {noformat}
> Ran on slave: asf900.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1129)
>   at hudson.remoting.Channel$1.handle(Channel.java:527)
>   at 
> hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:83)
> Caused: java.io.IOException: Backing channel 'H0' is disconnected.
>   at 
> hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:192)
>   at 
> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
>   at com.sun.proxy.$Proxy125.isAlive(Unknown Source)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1043)
>   at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1035)
>   at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
>   at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
>   at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:735)
>   at hudson.model.Build$BuildExecution.build(Build.java:206)
>   at hudson.model.Build$BuildExecution.doRun(Build.java:163)
>   at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:490)
>   at hudson.model.Run.execute(Run.java:1735)
>   at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
>   at hudson.model.ResourceController.execute(ResourceController.java:97)
>   at hudson.model.Executor.run(Executor.java:405)
> {noformat}
> 3. [HDFS-12531 attempt 
> 1|https://issues.apache.org/jira/browse/HDFS-12531?focusedCommentId=16176493&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16176493]
> {noformat}
> Ran on slave:  asf911.gq1.ygridcore.net
> Failed with following error message:
> FATAL: command execution failed
> Command close created at
>   at hudson.remoting.Command.<init>(Command.java:60)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1123)
>   at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:1121)
>   at hudson.remoting.Channel.close(Channel.java:1281)
>   at hudson.remoting.Channel.close(Channel.java:1263)
>   at hudson.remoting.Channel$CloseCommand.execute(Channel.java:1128)
> Caused: hudson.remoting.Channel$OrderlyShutdown
>   

[jira] [Commented] (HDFS-14277) [SBN read] Observer benchmark results

2020-09-08 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192445#comment-17192445
 ] 

Jim Brennan commented on HDFS-14277:


[~jhung], you removed the release-blocker label for 2.10.0, but the priority of 
this Jira is still set to Blocker.  I believe the blocking issue was addressed 
in [HDFS-14822].  Can we change the priority for this Jira to something more 
appropriate?

 

> [SBN read] Observer benchmark results
> -
>
> Key: HDFS-14277
> URL: https://issues.apache.org/jira/browse/HDFS-14277
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: ha, namenode
>Affects Versions: 2.10.0, 3.3.0
> Environment: Hardware: 4-node cluster, each node has 4 cores, Xeon 
> 2.5GHz, 25GB memory.
> Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, 
> RPC encryption + Data Transfer Encryption, Cloudera Navigator.
>Reporter: Wei-Chiu Chuang
>Priority: Blocker
> Attachments: Observer profiler.png, Screen Shot 2019-02-14 at 
> 11.50.37 AM.png, observer RPC queue processing time.png
>
>
> Ran a few benchmarks and profiler (VisualVM) today on an Observer-enabled 
> cluster. Would like to share the results with the community. The cluster has 
> 1 Observer node.
> h2. NNThroughputBenchmark
> Generate 1 million files and send fileStatus RPCs.
> {code:java}
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
>   -op fileStatus -threads 100 -files 100 -useExisting 
> -keepResults
> {code}
> h3. Kerberos, SSL, RPC encryption, Data Transfer Encryption enabled:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|4865|
> |Observer|3996|
> h3. Kerberos, SSL:
> ||Node||fileStatus (Ops per sec)||
> |Active NameNode|7078|
> |Observer|6459|
> Observations:
>  * Due to the edit tailing overhead, the Observer node consumes 30% CPU 
> even when the cluster is idle.
>  * While the Active NN has less than 1ms RPC processing time, the Observer 
> node has > 5ms RPC processing time. I am still looking for the source of the 
> longer processing time. The longer RPC processing time may be the cause of the 
> performance degradation compared to the Active NN. Note that the cluster has 
> Cloudera Navigator installed, which adds overhead to RPC processing 
> time.
>  * {{GlobalStateIdContext#isCoordinatedCall()}} pops up as one of the top 
> hotspots in the profiler. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-28 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14960:
---
Attachment: HDFS-14960.004.patch

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch, HDFS-14960.004.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-28 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118837#comment-17118837
 ] 

Jim Brennan commented on HDFS-14960:


Now that HDFS-13183 has been fixed, I uploaded patch 004 which is the same as 
patch 003, just rebased to the current trunk.


> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch, HDFS-14960.004.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-28 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119061#comment-17119061
 ] 

Jim Brennan commented on HDFS-14960:


Thanks for the review [~inigoiri]!  I've addressed all of your comments in 
patch 005.


> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch, HDFS-14960.004.patch, HDFS-14960.005.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-28 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-14960:
---
Attachment: HDFS-14960.005.patch

> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch, HDFS-14960.004.patch, HDFS-14960.005.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14960) TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2020-05-28 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118995#comment-17118995
 ] 

Jim Brennan commented on HDFS-14960:


The failed unit tests are unrelated to this change.


> TestBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> 
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-14960.001.patch, HDFS-14960.002.patch, 
> HDFS-14960.003.patch, HDFS-14960.004.patch
>
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


