[jira] [Commented] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.

2020-01-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027263#comment-17027263
 ] 

Ayush Saxena commented on HDFS-15127:
-

[~elgoiri] plans updating? just removing the directory part of test, should be 
well enough

> RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount 
> points.
> 
>
> Key: HDFS-15127
> URL: https://issues.apache.org/jira/browse/HDFS-15127
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-15127.000.patch, HDFS-15127.001.patch, 
> HDFS-15127.002.patch
>
>
> A HASH_ALL mount point should not allow creating new files if one subcluster 
> is down.
> If the file already existed in the past, this could lead to inconsistencies.
> We should return an unavailable exception.
> {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be 
> changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027262#comment-17027262
 ] 

Ayush Saxena commented on HDFS-15115:
-

Thanx Everyone for the work here?

Well there are two approaches possible here first is the old one, having null 
check and one in the v2 patch that is initializing builder irrespective of log 
level. Well personally I prefer having the previous approach of null check, but 
I am ok doing this way too, if everyone prefers this.

[~weichiu] [~hexiaoqiao] any preferences???

Anyway [~belugabehr] [~wzx513] possible extending a UT? Not sure must be tricky 
to change the log level in middle, just give a check once.

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Assignee: David Mollitor
>Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug when namenode is running. However, the 
> Namenode crashs. From the log, we find some NPE in 
> BlockPlacementPolicyDefault.chooseRandom. Because *StringBuilder builder* 
> will be used 4 times in BlockPlacementPolicyDefault.chooseRandom method. 
> While the *builder* only initializes in the first time of this method. If we 
> change the logger of BlockPlacementPolicyDefault to debug after the part, the 
> *builder* in remaining part is *NULL* and cause *NPE*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15124:
-
Attachment: (was: HDFS-15124.004.patch)

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
>       } 

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027151#comment-17027151
 ] 

Hadoop QA commented on HDFS-15124:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-15124 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992292/HDFS-15124.004.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28727/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try 

[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15124:
-
Attachment: HDFS-15124.004.patch

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
> 

[jira] [Commented] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently

2020-01-30 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027147#comment-17027147
 ] 

Ahmed Hussein commented on HDFS-13179:
--

Yes please [~inigoiri]. I appreciate your time committing to branch-2.10.

> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails 
> intermittently
> --
>
> Key: HDFS-13179
> URL: https://issues.apache.org/jira/browse/HDFS-13179
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Ahmed Hussein
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-13179-branch-2.10.003.patch, HDFS-13179.001.patch, 
> HDFS-13179.002.patch, HDFS-13179.003.patch, test runs.zip
>
>
> The error caused by TimeoutException because the test is waiting to ensure 
> that the file is replicated to DISK storage but the replication can't be 
> finished to DISK during the 30s timeout in ensureFileReplicasOnStorageType(), 
> but the file is still on RAM_DISK - so there is no data loss.
> Adding the following to TestLazyPersistReplicaRecovery.java:56 essentially 
> fixes the flakiness. 
> {code:java}
> try {
>   ensureFileReplicasOnStorageType(path1, DEFAULT);
> }catch (TimeoutException t){
>   LOG.warn("We got \"" + t.getMessage() + "\" so trying to find data on 
> RAM_DISK");
>   ensureFileReplicasOnStorageType(path1, RAM_DISK);
> }
>   }
> {code}
> Some thoughts:
> * Successful and failed tests run similar to the point when datanode 
> restarts. Restart line is the following in the log: LazyPersistTestCase - 
> Restarting the DataNode
> * There is a line which only occurs in the failed test: *addStoredBlock: 
> Redundant addStoredBlock request received for blk_1073741825_1001 on node 
> 127.0.0.1:49455 size 5242880*
> * This redundant request at BlockManager#addStoredBlock could be the main 
> reason for the test fail. Something wrong with the gen stamp? Corrupt 
> replicas? 
> =
> Current fail ratio based on my test of TestLazyPersistReplicaRecovery: 
> 1000 runs, 34 failures (3.4% fail)
> Failure rate analysis:
> TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas: 3.4%
> 33 failures caused by: {noformat}
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics: Timestamp: 2018-01-05 11:50:34,964 "IPC Server handler 6 
> on 39589" 
> {noformat}
> 1 failure caused by: {noformat}
> java.net.BindException: Problem binding to [localhost:56729] 
> java.net.BindException: Address already in use; For more details see: 
> http://wiki.apache.org/hadoop/BindException at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
>  Caused by: java.net.BindException: Address already in use at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
> {noformat}
> =
> Example stacktrace:
> {noformat}
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2017-11-01 10:36:49,499
> "Thread-1" prio=5 tid=13 runnable
> java.lang.Thread.State: RUNNABLE
> at java.lang.Thread.dumpThreads(Native Method)
> at java.lang.Thread.getAllStackTraces(Thread.java:1610)
> at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
> at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
> at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:369)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:140)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:54)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027144#comment-17027144
 ] 

Íñigo Goiri commented on HDFS-15124:


Let's split the line to fix the checkstyle.

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch 

[jira] [Commented] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently

2020-01-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027142#comment-17027142
 ] 

Íñigo Goiri commented on HDFS-13179:


[~ahussein] you attached a branch-2.10 patch, do you want to commit there and 
some other branch?

> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails 
> intermittently
> --
>
> Key: HDFS-13179
> URL: https://issues.apache.org/jira/browse/HDFS-13179
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Ahmed Hussein
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-13179-branch-2.10.003.patch, HDFS-13179.001.patch, 
> HDFS-13179.002.patch, HDFS-13179.003.patch, test runs.zip
>
>
> The error caused by TimeoutException because the test is waiting to ensure 
> that the file is replicated to DISK storage but the replication can't be 
> finished to DISK during the 30s timeout in ensureFileReplicasOnStorageType(), 
> but the file is still on RAM_DISK - so there is no data loss.
> Adding the following to TestLazyPersistReplicaRecovery.java:56 essentially 
> fixes the flakiness. 
> {code:java}
> try {
>   ensureFileReplicasOnStorageType(path1, DEFAULT);
> }catch (TimeoutException t){
>   LOG.warn("We got \"" + t.getMessage() + "\" so trying to find data on 
> RAM_DISK");
>   ensureFileReplicasOnStorageType(path1, RAM_DISK);
> }
>   }
> {code}
> Some thoughts:
> * Successful and failed tests run similar to the point when datanode 
> restarts. Restart line is the following in the log: LazyPersistTestCase - 
> Restarting the DataNode
> * There is a line which only occurs in the failed test: *addStoredBlock: 
> Redundant addStoredBlock request received for blk_1073741825_1001 on node 
> 127.0.0.1:49455 size 5242880*
> * This redundant request at BlockManager#addStoredBlock could be the main 
> reason for the test fail. Something wrong with the gen stamp? Corrupt 
> replicas? 
> =
> Current fail ratio based on my test of TestLazyPersistReplicaRecovery: 
> 1000 runs, 34 failures (3.4% fail)
> Failure rate analysis:
> TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas: 3.4%
> 33 failures caused by: {noformat}
> java.util.concurrent.TimeoutException: Timed out waiting for condition. 
> Thread diagnostics: Timestamp: 2018-01-05 11:50:34,964 "IPC Server handler 6 
> on 39589" 
> {noformat}
> 1 failure caused by: {noformat}
> java.net.BindException: Problem binding to [localhost:56729] 
> java.net.BindException: Address already in use; For more details see: 
> http://wiki.apache.org/hadoop/BindException at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
>  Caused by: java.net.BindException: Address already in use at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
> {noformat}
> =
> Example stacktrace:
> {noformat}
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2017-11-01 10:36:49,499
> "Thread-1" prio=5 tid=13 runnable
> java.lang.Thread.State: RUNNABLE
> at java.lang.Thread.dumpThreads(Native Method)
> at java.lang.Thread.getAllStackTraces(Thread.java:1610)
> at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
> at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
> at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:369)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:140)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:54)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027140#comment-17027140
 ] 

Hadoop QA commented on HDFS-15124:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 40s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 167 unchanged - 0 fixed = 168 total (was 167) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 12s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}175m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992282/HDFS-15124.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 61bf8783bffa 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a7d72c5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28726/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28726/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Assigned] (HDFS-15111) start / stopStandbyServices() should log which service it is transitioning to/from.

2020-01-30 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li reassigned HDFS-15111:
-

Assignee: Xieming Li

> start / stopStandbyServices() should log which service it is transitioning 
> to/from.
> ---
>
> Key: HDFS-15111
> URL: https://issues.apache.org/jira/browse/HDFS-15111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, logging
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Xieming Li
>Priority: Major
>  Labels: newbie++
>
> Trying to transition Observer to Standby state. Both 
> {{stopStandbyServices()}} and {{startStandbyServices()}} log that they are 
> stopping/starting Standby services.
> # {{startStandbyServices()}} should log which state it is transitioning TO.
> # {{stopStandbyServices()}} should log which state it is transitioning FROM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027070#comment-17027070
 ] 

Ctest commented on HDFS-15124:
--

[~elgoiri] Thank you for pointing this out. I have already uploaded a new patch 
for this.

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         

[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15124:
-
Attachment: HDFS-15124.003.patch

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch, HDFS-15124.003.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
>       } catch 

[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027057#comment-17027057
 ] 

Íñigo Goiri commented on HDFS-15147:


I think getting rid of guava dependencies is in general a good approach.
Let's wrap it into a function though.

> LazyPersistTestCase wait logic is error pruned
> --
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch
>
>
> {{LazyPersistTestCase}} has some issues hat lead to inconsistent result of 
> the test cases:
> * the wait periods to change of status is too long. It reaches 10 secs in 
> some cases.
> * triggerBlockReport() only triggers FBR of DN with index 0. This is counter 
> intuitive because the JUnit tests restart the DN assuming that the restarted 
> DN will send a FBR. However, this never happens because the DN will get a new 
> index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>   throws IOException, InterruptedException {
> // Trigger block report to NN
> DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
> Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from 
> HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will 
> eventually reduce the runtime and make the test cases more stable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027055#comment-17027055
 ] 

Íñigo Goiri commented on HDFS-15124:


The tests look reasonable.
We may want to fix the checkstyle.
Actually, is there a chance we can do:
{{code}}
if (TopAuditLogger.class.getName().equals(logger.getClass().getName())) {
{{code}}

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) 

[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-30 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027051#comment-17027051
 ] 

Ahmed Hussein commented on HDFS-15147:
--

Thanks [~inigoiri] for the comments! I will work on the changes in a new patch.

{quote}Extract the key and value in LazyPersistTestCase#474. Given a name with 
before and after would help reading it. BTW, should we check for the value to 
be larger instead of equal?{quote}
Sure, naming will make it more readable.
I check for the value to be equal to return false. If one storageInfo does not 
update the block-count, then we know that the reports are not received yet and 
we return false.

{quote}Should we wrap the joinUninterruptibly() somewhat? What's the difference 
with the old Uninterruptibles#joinUninterruptibly() BTW?{quote}
I think I removed it because I thought there was no need to add a guava 
dependency for such straightforward logic. Perhaps it was not a good idea. I 
can put {{guava. joinUninterruptibly()}} back or wrap the new code block. 
Wrapping it makes sense if there is a tendency from the community to get rid of 
unnecessary dependency. WDYT?

> LazyPersistTestCase wait logic is error pruned
> --
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch
>
>
> {{LazyPersistTestCase}} has some issues hat lead to inconsistent result of 
> the test cases:
> * the wait periods to change of status is too long. It reaches 10 secs in 
> some cases.
> * triggerBlockReport() only triggers FBR of DN with index 0. This is counter 
> intuitive because the JUnit tests restart the DN assuming that the restarted 
> DN will send a FBR. However, this never happens because the DN will get a new 
> index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>   throws IOException, InterruptedException {
> // Trigger block report to NN
> DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
> Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from 
> HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will 
> eventually reduce the runtime and make the test cases more stable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027046#comment-17027046
 ] 

Ctest edited comment on HDFS-15124 at 1/30/20 10:22 PM:


Hi, [~elgoiri]  I have already run the 4 failed test classes with my patch in 
the official docker image and all of them passed successfully.

I feel like the failures are not about the content in the patch. Actually the 
content in the patch won't be executed if not setting 
`dfs.namenode.audit.loggers` to 
`org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.

Could you please help to check whether these failures are due to some flakiness 
in tests?

Thank you a lot!


was (Author: ctest.team):
[~elgoiri] 

I have already run the 4 failed test classes with my patch in the official 
docker image and all of them passed successfully.

I feel like the failures are not about the content in the patch. Actually the 
content in the patch won't be executed if not setting 
`dfs.namenode.audit.loggers` to 
`org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.

Could you please help to check whether these failures are due to some flakiness 
in tests?

Thank you a lot!

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long 

[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027046#comment-17027046
 ] 

Ctest edited comment on HDFS-15124 at 1/30/20 10:22 PM:


Hi, [~elgoiri]  I have already run the 4 failed test classes with my patch in 
the official hadoop docker image and all of them passed successfully.

I feel like the failures are not about the content in the patch. Actually the 
content in the patch won't be executed if not setting 
`dfs.namenode.audit.loggers` to 
`org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.

Could you please help to check whether these failures are due to some flakiness 
in tests?

Thank you a lot!


was (Author: ctest.team):
Hi, [~elgoiri]  I have already run the 4 failed test classes with my patch in 
the official docker image and all of them passed successfully.

I feel like the failures are not about the content in the patch. Actually the 
content in the patch won't be executed if not setting 
`dfs.namenode.audit.loggers` to 
`org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.

Could you please help to check whether these failures are due to some flakiness 
in tests?

Thank you a lot!

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-30 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027046#comment-17027046
 ] 

Ctest commented on HDFS-15124:
--

[~elgoiri] 

I have already run the 4 failed test classes with my patch in the official 
docker image and all of them passed successfully.

I feel like the failures are not about the content in the patch. Actually the 
content in the patch won't be executed if not setting 
`dfs.namenode.audit.loggers` to 
`org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.

Could you please help to check whether these failures are due to some flakiness 
in tests?

Thank you a lot!

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     

[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-15151:
--
Status: Patch Available  (was: In Progress)

> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>  Labels: NIO, Windows, datanode
>
> Proposing to give an option to use TransmitFile Windows function for file to 
> socket data transfer. 
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-15151:
--
Component/s: datanode

> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>  Labels: NIO, Windows, datanode
>
> Proposing to give an option to use TransmitFile Windows function for file to 
> socket data transfer. 
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-15151:
--
Labels: NIO Windows datanode  (was: )

> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>  Labels: NIO, Windows, datanode
>
> Proposing to give an option to use TransmitFile Windows function for file to 
> socket data transfer. 
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-15151 started by Lukas Majercak.
-
> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to 
> socket data transfer. 
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-15151:
--
Affects Version/s: 3.4.0
   3.3.1
   3.2.2
   3.1.4
   3.3.0

> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to 
> socket data transfer. 
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-15151:
--
Description: 
Proposing to give an option to use TransmitFile Windows function for file to 
socket data transfer. 
https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile

  was:
Proposing to give an option to use TransmitFile Windows function for file to 
socket data transfer. 



> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to 
> socket data transfer. 
> https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-15151:
--
Description: Proposing to give an option to use TransmitFile 

> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>
> Proposing to give an option to use TransmitFile 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Majercak updated HDFS-15151:
--
Description: 
Proposing to give an option to use TransmitFile Windows function for file to 
socket data transfer. 


  was:Proposing to give an option to use TransmitFile 


> Use TransmitFile for file to socket data transfer
> -
>
> Key: HDFS-15151
> URL: https://issues.apache.org/jira/browse/HDFS-15151
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Major
>
> Proposing to give an option to use TransmitFile Windows function for file to 
> socket data transfer. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15151) Use TransmitFile for file to socket data transfer

2020-01-30 Thread Lukas Majercak (Jira)
Lukas Majercak created HDFS-15151:
-

 Summary: Use TransmitFile for file to socket data transfer
 Key: HDFS-15151
 URL: https://issues.apache.org/jira/browse/HDFS-15151
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Lukas Majercak
Assignee: Lukas Majercak






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026958#comment-17026958
 ] 

Íñigo Goiri commented on HDFS-15147:


This looks very good.
A few minor comments:
* Avoid the blank line removeal after BlockManager#4819 and FSNamesystem#4427.
* In addition to adding VisibleForTesting, it would be nice to have a comment 
on what getLastRedundancyMonitorTS() and getLazyPersistFileScrubberTS() 
represent and why we are exposing them.
* Make the constants in LazyPersistTestCase consistent with SEC, MS, MSEC or 
whichever we think is good.
* I would argue that we should use TimeUnits.SECONDS.toMillis(XXX) for the 
constants instead of multiplying. It's true that this then adds a cast from 
long to int but I would say that we should provide a waitFor that takes long as 
an input.
* Extract the key and value in LazyPersistTestCase#474. Given a name with 
before and after would help reading it. BTW, should we check for the value to 
be larger instead of equal?
* Complete the javadoc comment for shutdownDataNodes(). I don't think it 
currently parses.
* It might be good to give javadocs to all the wait methods (e.g., 
waitForScrubberCycle(), waitForRedundancyCount()).
* Let's use logger format {} in the new logging.
* Should we wrap the joinUninterruptibly() somewhat? What's the difference with 
the old Uninterruptibles#joinUninterruptibly()  BTW?

> LazyPersistTestCase wait logic is error pruned
> --
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch
>
>
> {{LazyPersistTestCase}} has some issues hat lead to inconsistent result of 
> the test cases:
> * the wait periods to change of status is too long. It reaches 10 secs in 
> some cases.
> * triggerBlockReport() only triggers FBR of DN with index 0. This is counter 
> intuitive because the JUnit tests restart the DN assuming that the restarted 
> DN will send a FBR. However, this never happens because the DN will get a new 
> index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>   throws IOException, InterruptedException {
> // Trigger block report to NN
> DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
> Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from 
> HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will 
> eventually reduce the runtime and make the test cases more stable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode

2020-01-30 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15150:
-
Description: 
HDFS-9668 pointed out the issues around the DN lock being a point of contention 
some time ago, but that Jira went in a direction of creating a new FSDataset 
implementation which is very risky, and activity on the Jira has stalled for a 
few years now. Edit: Looks like HDFS-9668 eventually went in a similar 
direction to what I was thinking, so I will review that Jira in more detail to 
see if this one is necessary.

I feel there could be significant gains by moving to a ReentrantReadWrite lock 
within the DN. The current implementation is simply a ReentrantLock so any 
locker blocks all others.

Once place I think a read lock would benefit us significantly, is when the DN 
is serving a lot of small blocks and there are jobs which perform a lot of 
reads. The start of reading any blocks right now takes the lock, but if we 
moved this to a read lock, many reads could do this at the same time.

The first conservative step, would be to change the current lock and then make 
all accesses to it obtain the write lock. That way, we should keep the current 
behaviour and then we can selectively move some lock accesses to the readlock 
in separate Jiras.

I would appreciate any thoughts on this, and also if anyone has attempted it 
before and found any blockers.

  was:
HDFS-9668 pointed out the issues around the DN lock being a point of contention 
some time ago, but that Jira went in a direction of creating a new FSDataset 
implementation which is very risky, and activity on the Jira has stalled for a 
few years now.

I feel there could be significant gains by moving to a ReentrantReadWrite lock 
within the DN. The current implementation is simply a ReentrantLock so any 
locker blocks all others.

Once place I think a read lock would benefit us significantly, is when the DN 
is serving a lot of small blocks and there are jobs which perform a lot of 
reads. The start of reading any blocks right now takes the lock, but if we 
moved this to a read lock, many reads could do this at the same time.

The first conservative step, would be to change the current lock and then make 
all accesses to it obtain the write lock. That way, we should keep the current 
behaviour and then we can selectively move some lock accesses to the readlock 
in separate Jiras.

I would appreciate any thoughts on this, and also if anyone has attempted it 
before and found any blockers.


> Introduce read write lock to Datanode
> -
>
> Key: HDFS-15150
> URL: https://issues.apache.org/jira/browse/HDFS-15150
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> HDFS-9668 pointed out the issues around the DN lock being a point of 
> contention some time ago, but that Jira went in a direction of creating a new 
> FSDataset implementation which is very risky, and activity on the Jira has 
> stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a 
> similar direction to what I was thinking, so I will review that Jira in more 
> detail to see if this one is necessary.
> I feel there could be significant gains by moving to a ReentrantReadWrite 
> lock within the DN. The current implementation is simply a ReentrantLock so 
> any locker blocks all others.
> Once place I think a read lock would benefit us significantly, is when the DN 
> is serving a lot of small blocks and there are jobs which perform a lot of 
> reads. The start of reading any blocks right now takes the lock, but if we 
> moved this to a read lock, many reads could do this at the same time.
> The first conservative step, would be to change the current lock and then 
> make all accesses to it obtain the write lock. That way, we should keep the 
> current behaviour and then we can selectively move some lock accesses to the 
> readlock in separate Jiras.
> I would appreciate any thoughts on this, and also if anyone has attempted it 
> before and found any blockers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15140) Replace FoldedTreeSet in Datanode with SortedSet or TreeMap

2020-01-30 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026925#comment-17026925
 ] 

Stephen O'Donnell commented on HDFS-15140:
--

All failing tests pass locally so I think they are unrelated.

I am also open to simply switching FoldedTreeSet with TreeMap, and then we can 
avoid the need to sort the block reports.

The cost of TreeMap would be about 33MB of extra heap per 1M blocks, but the 
access time is similar to FoldedTreeSet, so we would not gain performance with 
this change. The hope is that we would avoid the degradation which occurs in 
foldedTreeSet for an unknown reason after some time.

Discussing this offline, someone asked me whether this would impact IBRs. I 
don't believe this will impact IBRs in any way. The reason, is that the IBR 
related blocks are added to the IncrementalBlockReportManager where they are 
stored in an unrelated structure and then transmitted to the namenode via the 
heartbeat thread (but not as part of the heartbeat).

> Replace FoldedTreeSet in Datanode with SortedSet or TreeMap
> ---
>
> Key: HDFS-15140
> URL: https://issues.apache.org/jira/browse/HDFS-15140
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15140.001.patch, HDFS-15140.002.patch
>
>
> Based on the problems discussed in HDFS-15131, I would like to explore 
> replacing the FoldedTreeSet structure in the datanode with a builtin Java 
> equivalent - either SortedSet or TreeMap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2020-01-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026854#comment-17026854
 ] 

Hadoop QA commented on HDFS-9668:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} HDFS-9668 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-9668 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12841233/HDFS-9668-26.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28725/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
>Priority: Major
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, 
> HDFS-9668-25.patch, HDFS-9668-26.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> 

[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026775#comment-17026775
 ] 

Hadoop QA commented on HDFS-15147:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 1s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_242 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_242 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed with JDK v1.8.0_242 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 491 unchanged - 5 fixed = 491 total (was 496) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed with JDK v1.8.0_242 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m  7s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 98m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:a969cad0a12 |
| JIRA Issue | HDFS-15147 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12992231/HDFS-15147-branch-2.10.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0f64b1090070 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 

[jira] [Commented] (HDFS-15119) Allow expiration of cached locations in DFSInputStream

2020-01-30 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026774#comment-17026774
 ] 

Ahmed Hussein commented on HDFS-15119:
--

Thanks [~ayushtkn]. I agree with you.
I will try to benchmark the tradeoff between the two options and follow up with 
another patch if I see significant performance improvement.

[~kihwal], Thanks for committing the patch to trunk. I have uploaded patch for 
branch-2.10.

> Allow expiration of cached locations in DFSInputStream
> --
>
> Key: HDFS-15119
> URL: https://issues.apache.org/jira/browse/HDFS-15119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-15119-branch-2.10.003.patch, HDFS-15119.001.patch, 
> HDFS-15119.002.patch, HDFS-15119.003.patch
>
>
> Staleness and other transient conditions can affect reads for a long time 
> since the block locations may not be re-fetched. It makes sense to make 
> cached locations to expire.
> For example, we may not take advantage of local-reads since the nodes are 
> blacklisted and have not been updated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15150) Introduce read write lock to Datanode

2020-01-30 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-15150:


 Summary: Introduce read write lock to Datanode
 Key: HDFS-15150
 URL: https://issues.apache.org/jira/browse/HDFS-15150
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.3.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


HDFS-9668 pointed out the issues around the DN lock being a point of contention 
some time ago, but that Jira went in a direction of creating a new FSDataset 
implementation which is very risky, and activity on the Jira has stalled for a 
few years now.

I feel there could be significant gains by moving to a ReentrantReadWrite lock 
within the DN. The current implementation is simply a ReentrantLock so any 
locker blocks all others.

Once place I think a read lock would benefit us significantly, is when the DN 
is serving a lot of small blocks and there are jobs which perform a lot of 
reads. The start of reading any blocks right now takes the lock, but if we 
moved this to a read lock, many reads could do this at the same time.

The first conservative step, would be to change the current lock and then make 
all accesses to it obtain the write lock. That way, we should keep the current 
behaviour and then we can selectively move some lock accesses to the readlock 
in separate Jiras.

I would appreciate any thoughts on this, and also if anyone has attempted it 
before and found any blockers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15149) TestDeadNodeDetection test cases time-out

2020-01-30 Thread Ahmed Hussein (Jira)
Ahmed Hussein created HDFS-15149:


 Summary: TestDeadNodeDetection test cases time-out
 Key: HDFS-15149
 URL: https://issues.apache.org/jira/browse/HDFS-15149
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein


TestDeadNodeDetection JUnit time out times out with the following stack traces:

* 1- testDeadNodeDetectionInBackground*

{code:bash}
[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 264.757 
s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection
[ERROR] 
testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection) 
 Time elapsed: 125.806 s  <<< ERROR!
java.util.concurrent.TimeoutException: 
Timed out waiting for condition. Thread diagnostics:
Timestamp: 2020-01-24 08:31:07,023

"client DomainSocketWatcher" daemon prio=5 tid=117 runnable
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
at java.lang.Thread.run(Thread.java:748)
"Session-HouseKeeper-48c3205a"  prio=5 tid=350 timed_waiting
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty 
queue]" daemon prio=5 tid=752 in Object.wait()
java.lang.Thread.State: WAITING (on object monitor)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"CacheReplicationMonitor(1960356187)"  prio=5 tid=386 timed_waiting
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
at 
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181)
"Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting
java.lang.Thread.State: TIMED_WAITING
at java.lang.Object.wait(Native Method)
at java.util.TimerThread.mainLoop(Timer.java:552)
at java.util.TimerThread.run(Timer.java:505)
"org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460"
 daemon prio=5 tid=385 timed_waiting
java.lang.Thread.State: TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420)
at java.lang.Thread.run(Thread.java:748)
"qtp164757726-349" daemon prio=5 tid=349 runnable
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at 
org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:466)
at 
org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:403)
at 

[jira] [Updated] (HDFS-15147) LazyPersistTestCase wait logic is error pruned

2020-01-30 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15147:
-
Attachment: HDFS-15147-branch-2.10.001.patch

> LazyPersistTestCase wait logic is error pruned
> --
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch
>
>
> {{LazyPersistTestCase}} has some issues hat lead to inconsistent result of 
> the test cases:
> * the wait periods to change of status is too long. It reaches 10 secs in 
> some cases.
> * triggerBlockReport() only triggers FBR of DN with index 0. This is counter 
> intuitive because the JUnit tests restart the DN assuming that the restarted 
> DN will send a FBR. However, this never happens because the DN will get a new 
> index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>   throws IOException, InterruptedException {
> // Trigger block report to NN
> DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
> Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from 
> HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will 
> eventually reduce the runtime and make the test cases more stable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org