[jira] [Commented] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.
[ https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027263#comment-17027263 ]

Ayush Saxena commented on HDFS-15127:
-------------------------------------

[~elgoiri] do you plan to update the patch? Just removing the directory part of the test should be good enough.

> RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.
> -------------------------------------------------------------------------------------
>
> Key: HDFS-15127
> URL: https://issues.apache.org/jira/browse/HDFS-15127
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Íñigo Goiri
> Assignee: Íñigo Goiri
> Priority: Major
> Attachments: HDFS-15127.000.patch, HDFS-15127.001.patch, HDFS-15127.002.patch
>
> A HASH_ALL mount point should not allow creating new files if one subcluster is down.
> If the file already existed in the past, this could lead to inconsistencies.
> We should return an unavailable exception.
> {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be changed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
[ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027262#comment-17027262 ]

Ayush Saxena commented on HDFS-15115:
-------------------------------------

Thanks everyone for the work here.
There are two possible approaches: the old one, which adds a null check, and the one in the v2 patch, which initializes the builder irrespective of the log level.
Personally I prefer the previous null-check approach, but I am OK with doing it this way too if everyone prefers it. [~weichiu] [~hexiaoqiao] any preferences?
Also, [~belugabehr] [~wzx513], would it be possible to extend a UT? I am not sure; it may be tricky to change the log level in the middle of the call, but please check once.

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
> ----------------------------------------------------------------------------------------------------
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: wangzhixiang
> Assignee: David Mollitor
> Priority: Major
> Attachments: HDFS-15115.001.patch, HDFS-15115.2.patch
>
> To get debug info, we dynamically change the logger of BlockPlacementPolicyDefault to debug while the NameNode is running. However, the NameNode then crashes.
> From the log, we find NPEs in BlockPlacementPolicyDefault.chooseRandom.
> The cause is that *StringBuilder builder* is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, but the *builder* is only initialized at the first of them.
> If we change the logger of BlockPlacementPolicyDefault to debug after that first part has run, the *builder* in the remaining parts is *NULL* and causes an *NPE*.
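
The failure mode and the two approaches discussed in the comment can be illustrated with a small standalone sketch. This is not the code of BlockPlacementPolicyDefault or of either patch: the class name, the `debugEnabled` flag (a stand-in for a logger whose level changes at runtime), and the `midway` callback are all hypothetical and exist only to reproduce the pattern.

```java
class DebugBuilderSketch {
  /** Hypothetical stand-in for a logger level that can change at runtime. */
  static volatile boolean debugEnabled = false;

  /** Failing pattern: the builder exists only if debug was on at entry. */
  static String unsafe(Runnable midway) {
    StringBuilder builder = null;
    if (debugEnabled) {
      builder = new StringBuilder();
      builder.append("[start]");
    }
    midway.run(); // the log level may be changed here by an operator
    if (debugEnabled) {
      builder.append("[end]"); // NPE if debug was enabled after the first check
    }
    return builder == null ? "" : builder.toString();
  }

  /** Approach 1 (null check): skip debug output if the builder was never created. */
  static String withNullCheck(Runnable midway) {
    StringBuilder builder = null;
    if (debugEnabled) {
      builder = new StringBuilder();
      builder.append("[start]");
    }
    midway.run();
    if (debugEnabled && builder != null) {
      builder.append("[end]");
    }
    return builder == null ? "" : builder.toString();
  }

  /** Approach 2 (always initialize): create the builder regardless of log level. */
  static String alwaysInitialized(Runnable midway) {
    StringBuilder builder = new StringBuilder();
    if (debugEnabled) {
      builder.append("[start]");
    }
    midway.run();
    if (debugEnabled) {
      builder.append("[end]");
    }
    return builder.toString();
  }
}
```

In this sketch the null check avoids an allocation when debug logging is off but needs a guard at every later use, while unconditional initialization pays a small allocation on every call and removes the need for guards. Whether that trade-off matters in the real method is for the reviewers to judge.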
[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ctest updated HDFS-15124:
-------------------------
    Attachment: (was: HDFS-15124.004.patch)

> Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.10.0
> Reporter: Ctest
> Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, HDFS-15124.002.patch, HDFS-15124.003.patch
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` (which is the default value) and `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, the namenode fails to start because of an `InstantiationException` thrown from `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`.
> The root cause is that while initializing the namenode, `initAuditLoggers` is called and tries to call the default constructor of `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, which doesn't have one. Thus the `InstantiationException` is thrown.
>
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.lang.RuntimeException: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:858)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:961)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:940)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
>   at java.lang.Class.newInstance(Class.java:427)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)
>   ... 8 more
> Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.<init>()
>   at java.lang.Class.getConstructor0(Class.java:3082)
>   at java.lang.Class.newInstance(Class.java:412)
>   ... 9 more
> {code}
>
> *Detailed Root Cause*
> There is no default constructor in `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`:
> {code:java}
> /**
>  * An {@link AuditLogger} that sends logged data directly to the metrics
>  * systems. It is used when the top service is used directly by the name node
>  */
> @InterfaceAudience.Private
> public class TopAuditLogger implements AuditLogger {
>   public static final Logger LOG =
>       LoggerFactory.getLogger(TopAuditLogger.class);
>   private final TopMetrics topMetrics;
>   public TopAuditLogger(TopMetrics topMetrics) {
>     Preconditions.checkNotNull(topMetrics, "Cannot init with a null " +
>         "TopMetrics");
>     this.topMetrics = topMetrics;
>   }
>   @Override
>   public void initialize(Configuration conf) {
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, `initAuditLoggers` will try to call its default constructor to make a new instance:
> {code:java}
> private List<AuditLogger> initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection<String> alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List<AuditLogger> auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
>       } cat
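
The root cause reported for HDFS-15124 (reflective instantiation of a class that has no no-argument constructor) can be reproduced outside Hadoop with the JDK alone. The sketch below is only an illustration under stated assumptions: the `AuditLogger` interface here is a minimal hypothetical stand-in, `DefaultLogger` and `MetricsLogger` are invented names, and the `load` helper shows one possible way to turn the failure into a clear configuration error. It is not the content of any of the attached patches.

```java
import java.lang.reflect.Constructor;

class ReflectiveLoaderSketch {
  /** Hypothetical, minimal stand-in for an audit logger interface. */
  interface AuditLogger {
    String name();
  }

  /** Has a no-argument constructor, so reflective creation works. */
  static class DefaultLogger implements AuditLogger {
    public DefaultLogger() {
    }

    public String name() {
      return "default";
    }
  }

  /** Only has a one-argument constructor, like the class in the report. */
  static class MetricsLogger implements AuditLogger {
    private final String metrics;

    public MetricsLogger(String metrics) {
      this.metrics = metrics;
    }

    public String name() {
      return "metrics:" + metrics;
    }
  }

  /** Creates an instance through the no-argument constructor, if one exists. */
  static AuditLogger load(Class<? extends AuditLogger> cls) {
    try {
      Constructor<? extends AuditLogger> ctor = cls.getDeclaredConstructor();
      return ctor.newInstance();
    } catch (NoSuchMethodException e) {
      // Report the misconfiguration explicitly instead of a bare reflection error.
      throw new IllegalArgumentException(
          cls.getName() + " cannot be used here: no no-argument constructor", e);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not instantiate " + cls.getName(), e);
    }
  }
}
```

Calling `load(DefaultLogger.class)` returns an instance, while `load(MetricsLogger.class)` fails with the explicit message. The sketch uses `getDeclaredConstructor().newInstance()` rather than `Class.newInstance()`, which has been deprecated since Java 9.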
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027151#comment-17027151 ]

Hadoop QA commented on HDFS-15124:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-15124 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12992292/HDFS-15124.004.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28727/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.10.0
> Reporter: Ctest
> Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch
[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ctest updated HDFS-15124:
-------------------------
    Attachment: HDFS-15124.004.patch

> Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.10.0
> Reporter: Ctest
> Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, HDFS-15124.002.patch, HDFS-15124.003.patch, HDFS-15124.004.patch
[jira] [Commented] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027147#comment-17027147 ]

Ahmed Hussein commented on HDFS-13179:
--------------------------------------

Yes please [~inigoiri]. I appreciate your time committing to branch-2.10.

> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently
> -----------------------------------------------------------------------------------
>
> Key: HDFS-13179
> URL: https://issues.apache.org/jira/browse/HDFS-13179
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.0.0
> Reporter: Gabor Bota
> Assignee: Ahmed Hussein
> Priority: Critical
> Fix For: 3.3.0
> Attachments: HDFS-13179-branch-2.10.003.patch, HDFS-13179.001.patch, HDFS-13179.002.patch, HDFS-13179.003.patch, test runs.zip
>
> The error is caused by a TimeoutException: the test waits to ensure that the file is replicated to DISK storage, but the replication to DISK can't finish within the 30s timeout in ensureFileReplicasOnStorageType(). The file is still on RAM_DISK, so there is no data loss.
> Adding the following to TestLazyPersistReplicaRecovery.java:56 essentially fixes the flakiness.
> {code:java}
> try {
>   ensureFileReplicasOnStorageType(path1, DEFAULT);
> } catch (TimeoutException t) {
>   LOG.warn("We got \"" + t.getMessage() + "\" so trying to find data on RAM_DISK");
>   ensureFileReplicasOnStorageType(path1, RAM_DISK);
> }
> }
> {code}
> Some thoughts:
> * Successful and failed test runs are similar up to the point where the datanode restarts. The restart line in the log is: LazyPersistTestCase - Restarting the DataNode
> * There is a line which only occurs in the failed test: *addStoredBlock: Redundant addStoredBlock request received for blk_1073741825_1001 on node 127.0.0.1:49455 size 5242880*
> * This redundant request at BlockManager#addStoredBlock could be the main reason for the test failure. Something wrong with the gen stamp? Corrupt replicas?
> =====
> Current fail ratio based on my test of TestLazyPersistReplicaRecovery:
> 1000 runs, 34 failures (3.4% fail)
> Failure rate analysis:
> TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas: 3.4%
> 33 failures caused by:
> {noformat}
> java.util.concurrent.TimeoutException: Timed out waiting for condition.
> Thread diagnostics: Timestamp: 2018-01-05 11:50:34,964 "IPC Server handler 6 on 39589"
> {noformat}
> 1 failure caused by:
> {noformat}
> java.net.BindException: Problem binding to [localhost:56729] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
> Caused by: java.net.BindException: Address already in use
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49)
> {noformat}
> =====
> Example stacktrace:
> {noformat}
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2017-11-01 10:36:49,499
> "Thread-1" prio=5 tid=13 runnable
> java.lang.Thread.State: RUNNABLE
>   at java.lang.Thread.dumpThreads(Native Method)
>   at java.lang.Thread.getAllStackTraces(Thread.java:1610)
>   at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
>   at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
>   at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:369)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:140)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:54)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> ...
> {noformat}
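
The wait-then-fall-back pattern proposed in the HDFS-13179 description can be sketched without any Hadoop dependency. Everything below is an assumption made for illustration: `waitFor` is a simplified stand-in for a polling helper (it is not GenericTestUtils.waitFor), the two `BooleanSupplier` arguments replace the real replica checks, and the returned strings only mirror the DEFAULT and RAM_DISK names used in the snippet.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

class ReplicaWaitSketch {
  /** Polls the condition until it holds or the timeout elapses. */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("Timed out waiting for condition.");
      }
      Thread.sleep(intervalMs);
    }
  }

  /** Returns the storage type on which the replica was eventually found. */
  static String findReplica(BooleanSupplier onDefault, BooleanSupplier onRamDisk,
      long timeoutMs) {
    try {
      try {
        waitFor(onDefault, 10, timeoutMs);
        return "DEFAULT";
      } catch (TimeoutException t) {
        // Fall back: the replica may still be on RAM_DISK, which is not data loss.
        waitFor(onRamDisk, 10, timeoutMs);
        return "RAM_DISK";
      }
    } catch (TimeoutException e) {
      throw new IllegalStateException("Replica found on neither storage type", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IllegalStateException("Interrupted while waiting", e);
    }
  }
}
```

One consequence of this design, visible in the sketch, is that a run where the replica never reaches the default storage is reported as a success. That removes the timeout failures but also weakens what the test asserts, which may be worth weighing against the 3.4% failure rate quoted above.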
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027144#comment-17027144 ]

Íñigo Goiri commented on HDFS-15124:
------------------------------------

Let's split the line to fix the checkstyle.

> Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.10.0
> Reporter: Ctest
> Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, HDFS-15124.002.patch, HDFS-15124.003.patch
[jira] [Commented] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027142#comment-17027142 ]

Íñigo Goiri commented on HDFS-13179:
------------------------------------

[~ahussein] you attached a branch-2.10 patch, do you want to commit there and some other branch?

> TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently
> -----------------------------------------------------------------------------------
>
> Key: HDFS-13179
> URL: https://issues.apache.org/jira/browse/HDFS-13179
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.0.0
> Reporter: Gabor Bota
> Assignee: Ahmed Hussein
> Priority: Critical
> Fix For: 3.3.0
> Attachments: HDFS-13179-branch-2.10.003.patch, HDFS-13179.001.patch, HDFS-13179.002.patch, HDFS-13179.003.patch, test runs.zip
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027140#comment-17027140 ]

Hadoop QA commented on HDFS-15124:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 40s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 167 unchanged - 0 fixed = 168 total (was 167) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 36s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}111m 12s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}175m 44s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12992282/HDFS-15124.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 61bf8783bffa 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a7d72c5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28726/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28726/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|
[jira] [Assigned] (HDFS-15111) start / stopStandbyServices() should log which service it is transitioning to/from.
[ https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xieming Li reassigned HDFS-15111:
---------------------------------

    Assignee: Xieming Li

> start / stopStandbyServices() should log which service it is transitioning to/from.
> -----------------------------------------------------------------------------------
>
> Key: HDFS-15111
> URL: https://issues.apache.org/jira/browse/HDFS-15111
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, logging
> Affects Versions: 2.10.0
> Reporter: Konstantin Shvachko
> Assignee: Xieming Li
> Priority: Major
> Labels: newbie++
>
> Trying to transition Observer to Standby state. Both {{stopStandbyServices()}} and {{startStandbyServices()}} log that they are stopping/starting Standby services.
> # {{startStandbyServices()}} should log which state it is transitioning TO.
> # {{stopStandbyServices()}} should log which state it is transitioning FROM.
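The two requested log lines could look roughly like the sketch below. It is only an illustration of the request in the description: the class name, the {{HAState}} enum and the message wording are hypothetical and are not taken from the NameNode code or from any patch on this issue.

```java
/**
 * Illustrative sketch only: builds the log messages that the issue asks for.
 * HAState and the message wording are assumptions made for this example.
 */
final class StandbyTransitionLogSketch {

  /** Hypothetical stand-in for the HA states mentioned in the description. */
  enum HAState { ACTIVE, STANDBY, OBSERVER }

  private StandbyTransitionLogSketch() {
  }

  /** Message for startStandbyServices(): names the state being entered (TO). */
  static String startMessage(HAState target) {
    return "Starting services required for " + target + " state";
  }

  /** Message for stopStandbyServices(): names the state being left (FROM). */
  static String stopMessage(HAState current) {
    return "Stopping services started for " + current + " state";
  }

  public static void main(String[] args) {
    // Observer -> Standby transition, as in the description.
    System.out.println(stopMessage(HAState.OBSERVER));
    System.out.println(startMessage(HAState.STANDBY));
  }
}
```

With messages of this shape, an Observer to Standby transition would show up in the log as a stop line naming OBSERVER followed by a start line naming STANDBY.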
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027070#comment-17027070 ] Ctest commented on HDFS-15124: -- [~elgoiri] Thank you for pointing this out. I have already uploaded a new patch for this. > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch, HDFS-15124.003.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
> > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed.java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... > 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. 
It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static finalLogger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) { > } > {code} > As long as the configuration parameter `dfs.namenode.audit.loggers` is set to > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > `initAuditLoggers` will try to call its default constructor to make a new > instance: > {code:java} > private List initAuditLoggers(Configuration conf) { > // Initialize the custom access loggers if configured. > Collection alClasses = > conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY); > List auditLoggers = Lists.newArrayList(); > if (alClasses != null && !alClasses.isEmpty()) { > for (String className : alClasses) { > try { > AuditLogger logger; > if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) { > logger = new DefaultAuditLogger(); > } else { > logger = (AuditLogger) Class.forName(className).newInstance(); > } > logger.initialize(conf); >
[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ctest updated HDFS-15124: - Attachment: HDFS-15124.003.patch > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch, HDFS-15124.003.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
> > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed.java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... > 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. 
It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static finalLogger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) { > } > {code} > As long as the configuration parameter `dfs.namenode.audit.loggers` is set to > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > `initAuditLoggers` will try to call its default constructor to make a new > instance: > {code:java} > private List initAuditLoggers(Configuration conf) { > // Initialize the custom access loggers if configured. > Collection alClasses = > conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY); > List auditLoggers = Lists.newArrayList(); > if (alClasses != null && !alClasses.isEmpty()) { > for (String className : alClasses) { > try { > AuditLogger logger; > if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) { > logger = new DefaultAuditLogger(); > } else { > logger = (AuditLogger) Class.forName(className).newInstance(); > } > logger.initialize(conf); > auditLoggers.add(logger); > } catch (RuntimeException re) { > throw re; > } catch (Excepti
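The failure mode described above, and one way to avoid it, can be reproduced outside Hadoop. The sketch below is a self-contained illustration, not the attached patch: {{AuditLogger}}, {{Metrics}}, {{DefaultLogger}} and {{MetricsBackedLogger}} are hypothetical stand-ins for the Hadoop types, and the factory special-cases the class that has no default constructor by comparing class names, in the spirit of the existing check for the default logger. It uses {{getDeclaredConstructor().newInstance()}} because {{Class.newInstance()}} is deprecated since Java 9; with this call the missing constructor surfaces directly as a {{NoSuchMethodException}} rather than wrapped in an {{InstantiationException}}.

```java
/**
 * Illustrative sketch only. All nested types are hypothetical stand-ins
 * for the Hadoop classes named in the issue description.
 */
final class AuditLoggerFactorySketch {

  interface AuditLogger {
  }

  /** Stand-in for the dependency the second logger needs at construction. */
  static final class Metrics {
  }

  /** A logger that can be created reflectively: it has a no-arg constructor. */
  static final class DefaultLogger implements AuditLogger {
    public DefaultLogger() {
    }
  }

  /** A logger with no default constructor, like the one in the report. */
  static final class MetricsBackedLogger implements AuditLogger {
    final Metrics metrics;

    public MetricsBackedLogger(Metrics metrics) {
      if (metrics == null) {
        throw new NullPointerException("Cannot init with null metrics");
      }
      this.metrics = metrics;
    }
  }

  private AuditLoggerFactorySketch() {
  }

  /**
   * Creates a logger from a configured class name. The class that needs a
   * constructor argument is handled explicitly; everything else goes through
   * the reflective no-arg path.
   */
  static AuditLogger create(String className, Metrics metrics)
      throws ReflectiveOperationException {
    if (MetricsBackedLogger.class.getName().equals(className)) {
      return new MetricsBackedLogger(metrics);
    }
    return (AuditLogger) Class.forName(className)
        .getDeclaredConstructor()
        .newInstance();
  }
}
```

Whether a name comparison or a constructor lookup is the better fit for {{initAuditLoggers}} is a design choice for the patch review; the sketch only shows that the reflective no-arg path cannot be used for a class whose only constructor takes an argument.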
[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned
[ https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027057#comment-17027057 ]

Íñigo Goiri commented on HDFS-15147:
------------------------------------

I think getting rid of guava dependencies is in general a good approach.
Let's wrap it into a function though.

> LazyPersistTestCase wait logic is error pruned
> ----------------------------------------------
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch
>
>
> {{LazyPersistTestCase}} has some issues that lead to inconsistent results of the test cases:
> * The wait periods for a change of status are too long. They reach 10 secs in some cases.
> * triggerBlockReport() only triggers a full block report (FBR) for the DN with index 0. This is counterintuitive because the JUnit tests restart the DN assuming that the restarted DN will send an FBR. However, this never happens because the DN will get a new index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>       throws IOException, InterruptedException {
>     // Trigger block report to NN
>     DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
>     Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will eventually reduce the runtime and make the test cases more stable.
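A wrapped, Guava-free join of the kind discussed in this comment could look like the sketch below. It is an assumption about what such a helper might contain, not the code from either attached patch; the class name {{ThreadJoinSketch}} is hypothetical. The logic follows the usual pattern: keep retrying {{join()}} when interrupted, and restore the interrupt flag before returning so the caller can still observe it.

```java
/**
 * Illustrative sketch only: a stand-alone helper that joins a thread
 * without propagating InterruptedException. Not taken from the patch.
 */
final class ThreadJoinSketch {

  private ThreadJoinSketch() {
  }

  /**
   * Waits for the given thread to finish. If the calling thread is
   * interrupted while waiting, the wait is resumed and the interrupt
   * status is restored once the join completes.
   */
  static void joinUninterruptibly(Thread toJoin) {
    boolean interrupted = false;
    try {
      while (true) {
        try {
          toJoin.join();
          return;
        } catch (InterruptedException e) {
          // Remember the interrupt and keep waiting.
          interrupted = true;
        }
      }
    } finally {
      if (interrupted) {
        // Re-assert the interrupt so callers can react to it.
        Thread.currentThread().interrupt();
      }
    }
  }
}
```

Keeping this in one small function means the test base class has a single place to change if the project later decides to switch back to a library implementation.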
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027055#comment-17027055 ] Íñigo Goiri commented on HDFS-15124: The tests look reasonable. We may want to fix the checkstyle. Actually, is there a chance we can do: {{code}} if (TopAuditLogger.class.getName().equals(logger.getClass().getName())) { {{code}} > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
> > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed.java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... > 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. 
It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static finalLogger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) { > } > {code} > As long as the configuration parameter `dfs.namenode.audit.loggers` is set to > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > `initAuditLoggers` will try to call its default constructor to make a new > instance: > {code:java} > private List initAuditLoggers(Configuration conf) { > // Initialize the custom access loggers if configured. > Collection alClasses = > conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY); > List auditLoggers = Lists.newArrayList(); > if (alClasses != null && !alClasses.isEmpty()) { > for (String className : alClasses) { > try { > AuditLogger logger; > if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) { > logger = new DefaultAuditLogger(); > } else { > logger = (AuditLogger) Cla
[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned
[ https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027051#comment-17027051 ]

Ahmed Hussein commented on HDFS-15147:
--------------------------------------

Thanks [~inigoiri] for the comments! I will work on the changes in a new patch.
{quote}Extract the key and value in LazyPersistTestCase#474. Giving them names with before and after would help reading it. BTW, should we check for the value to be larger instead of equal?{quote}
Sure, naming will make it more readable.
I check for the value to be equal in order to return false. If one storageInfo does not update the block-count, then we know that the reports are not received yet and we return false.
{quote}Should we wrap the joinUninterruptibly() somewhat? What's the difference with the old Uninterruptibles#joinUninterruptibly() BTW?{quote}
I think I removed it because I thought there was no need to add a guava dependency for such straightforward logic. Perhaps it was not a good idea. I can put Guava's {{joinUninterruptibly()}} back or wrap the new code block. Wrapping it makes sense if there is a tendency from the community to get rid of unnecessary dependencies. WDYT?

> LazyPersistTestCase wait logic is error pruned
> ----------------------------------------------
>
> Key: HDFS-15147
> URL: https://issues.apache.org/jira/browse/HDFS-15147
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch
>
>
> {{LazyPersistTestCase}} has some issues that lead to inconsistent results of the test cases:
> * The wait periods for a change of status are too long. They reach 10 secs in some cases.
> * triggerBlockReport() only triggers a full block report (FBR) for the DN with index 0. This is counterintuitive because the JUnit tests restart the DN assuming that the restarted DN will send an FBR. However, this never happens because the DN will get a new index post restart.
> {code:java}
>   protected final void triggerBlockReport()
>       throws IOException, InterruptedException {
>     // Trigger block report to NN
>     DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0));
>     Thread.sleep(10 * 1000);
>   }
> {code}
> [~inigoiri] suggested that we propagate the findings and fixes from HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will eventually reduce the runtime and make the test cases more stable.
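The fixed {{Thread.sleep(10 * 1000)}} in the quoted method always costs ten seconds, even when the reports arrive much sooner. A condition-based wait returns as soon as the check passes and fails with a timeout otherwise. The sketch below is a stand-alone illustration of that idea; {{PollingWaitSketch}} and its parameters are hypothetical, and it is not the Hadoop test utility or the code in the attached patches. In the test case, the condition would be something like "every storage has updated its block count", which is the per-storage check described in the comment above.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/**
 * Illustrative sketch only: polls a condition at a short interval instead
 * of sleeping for a fixed period. Names are assumptions for this example.
 */
final class PollingWaitSketch {

  private PollingWaitSketch() {
  }

  /**
   * Blocks until the condition returns true, checking every intervalMs
   * milliseconds, and gives up after roughly timeoutMs milliseconds.
   */
  static void waitFor(BooleanSupplier condition, long intervalMs,
      long timeoutMs) throws InterruptedException, TimeoutException {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

The condition is evaluated before the first sleep, so a check that already holds costs no waiting time at all.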
[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027046#comment-17027046 ] Ctest edited comment on HDFS-15124 at 1/30/20 10:22 PM: Hi, [~elgoiri] I have already run the 4 failed test classes with my patch in the official docker image and all of them passed successfully. I feel like the failures are not about the content in the patch. Actually the content in the patch won't be executed if not setting `dfs.namenode.audit.loggers` to `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. Could you please help to check whether these failures are due to some flakiness in tests? Thank you a lot! was (Author: ctest.team): [~elgoiri] I have already run the 4 failed test classes with my patch in the official docker image and all of them passed successfully. I feel like the failures are not about the content in the patch. Actually the content in the patch won't be executed if not setting `dfs.namenode.audit.loggers` to `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. Could you please help to check whether these failures are due to some flakiness in tests? Thank you a lot! > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. 
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. > > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed.java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... 
> 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static finalLogger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) { > } >
[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027046#comment-17027046 ] Ctest edited comment on HDFS-15124 at 1/30/20 10:22 PM: Hi, [~elgoiri] I have already run the 4 failed test classes with my patch in the official hadoop docker image and all of them passed successfully. I feel like the failures are not about the content in the patch. Actually the content in the patch won't be executed if not setting `dfs.namenode.audit.loggers` to `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. Could you please help to check whether these failures are due to some flakiness in tests? Thank you a lot! was (Author: ctest.team): Hi, [~elgoiri] I have already run the 4 failed test classes with my patch in the official docker image and all of them passed successfully. I feel like the failures are not about the content in the patch. Actually the content in the patch won't be executed if not setting `dfs.namenode.audit.loggers` to `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. Could you please help to check whether these failures are due to some flakiness in tests? Thank you a lot! > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. 
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. > > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed.java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... 
> 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static finalLogger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) {
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027046#comment-17027046 ] Ctest commented on HDFS-15124: -- [~elgoiri] I have already run the 4 failed test classes with my patch in the official docker image and all of them passed successfully. I feel like the failures are not about the content in the patch. Actually the content in the patch won't be executed if not setting `dfs.namenode.audit.loggers` to `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. Could you please help to check whether these failures are due to some flakiness in tests? Thank you a lot! > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
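The root cause summarized above can be reproduced outside Hadoop. The following is a minimal sketch, not the NameNode code: `NoDefaultCtorLogger` and `DefaultCtorLogger` are hypothetical stand-ins for an audit logger class with and without a no-argument constructor, and the sketch uses `getDeclaredConstructor().newInstance()` (rather than `Class.newInstance()`, which appears in the stack trace) so that the missing constructor shows up directly as a `NoSuchMethodException`.

```java
import java.lang.reflect.InvocationTargetException;

/** Hypothetical stand-in for a logger whose only constructor takes an argument. */
class NoDefaultCtorLogger {
  private final String metricsName;

  NoDefaultCtorLogger(String metricsName) {
    this.metricsName = metricsName;
  }

  String metricsName() {
    return metricsName;
  }
}

/** Hypothetical stand-in for a logger that can be created reflectively. */
class DefaultCtorLogger {
  DefaultCtorLogger() {
  }
}

final class ReflectiveInitDemo {
  /**
   * Tries to build an instance through the no-argument constructor, the way a
   * "load a class named in the configuration" code path typically does.
   * Returns null when the class has no such constructor.
   */
  static Object tryNewInstance(Class<?> clazz) {
    try {
      return clazz.getDeclaredConstructor().newInstance();
    } catch (NoSuchMethodException e) {
      // Same underlying cause as in the reported trace: no <init>() exists.
      return null;
    } catch (InstantiationException | IllegalAccessException
        | InvocationTargetException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(tryNewInstance(DefaultCtorLogger.class) != null);
    System.out.println(tryNewInstance(NoDefaultCtorLogger.class) != null);
  }
}
```

A class that is meant to be named in a configuration key and created reflectively generally needs either a no-argument constructor or a dedicated code path that supplies its constructor arguments.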
> > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed. java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... > 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.<init>() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. 
It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static final Logger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) { > } > {code} > As long as the configuration parameter `dfs.namenode.audit.loggers` is set to > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > `initAuditLoggers` will try to call its default constructor to make a new > instance: > {code:java} > private List<AuditLogger> initAuditLoggers(Configuration conf) { > // Initialize the custom access loggers if configured. > Collection<String> alClasses = > conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY); > List<AuditLogger> auditLoggers = Lists.newArrayList(); > if (alClasses != null && !alClasses.isEmp
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-15151: -- Status: Patch Available (was: In Progress) > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: NIO, Windows, datanode > > Proposing to give an option to use TransmitFile Windows function for file to > socket data transfer. > https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
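For readers unfamiliar with file-to-socket transfer on the JVM, the sketch below shows the portable JDK call for it, `FileChannel#transferTo`, which lets the platform move bytes from a file to a target channel without an explicit user-space copy loop where the OS supports that. This is only an illustration under stated assumptions: it does not call the Windows `TransmitFile` function, it does not reflect how the proposed patch is implemented, and `TransferToSketch`, `transferFully`, and `demo` are hypothetical names. The demo writes to an in-memory channel so it runs without a network; in a datanode the target would be a socket channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToSketch {
  /**
   * Sends the whole file to the target channel. transferTo may move fewer
   * bytes than requested, so the call is repeated until the end of the file.
   */
  static long transferFully(Path file, WritableByteChannel target) throws IOException {
    try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
      long position = 0;
      long size = source.size();
      while (position < size) {
        long sent = source.transferTo(position, size - position, target);
        if (sent <= 0) {
          break; // no progress (for example, the file shrank); stop instead of spinning
        }
        position += sent;
      }
      return position;
    }
  }

  /** Writes the payload to a temp file, transfers it, and returns the byte count. */
  static long demo(byte[] payload) {
    try {
      Path file = Files.createTempFile("transfer-sketch", ".bin");
      try {
        Files.write(file, payload);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (WritableByteChannel target = Channels.newChannel(sink)) {
          long sent = transferFully(file, target);
          return sent == sink.size() ? sent : -1L;
        }
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo(new byte[] {1, 2, 3, 4, 5}));
  }
}
```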
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-15151: -- Component/s: datanode > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: NIO, Windows, datanode > > Proposing to give an option to use TransmitFile Windows function for file to > socket data transfer. > https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-15151: -- Labels: NIO Windows datanode (was: ) > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > Labels: NIO, Windows, datanode > > Proposing to give an option to use TransmitFile Windows function for file to > socket data transfer. > https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-15151 started by Lukas Majercak. - > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > > Proposing to give an option to use TransmitFile Windows function for file to > socket data transfer. > https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-15151: -- Affects Version/s: 3.4.0 3.3.1 3.2.2 3.1.4 3.3.0 > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.3.1, 3.4.0 >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > > Proposing to give an option to use TransmitFile Windows function for file to > socket data transfer. > https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-15151: -- Description: Proposing to give an option to use TransmitFile Windows function for file to socket data transfer. https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile was: Proposing to give an option to use TransmitFile Windows function for file to socket data transfer. > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > > Proposing to give an option to use TransmitFile Windows function for file to > socket data transfer. > https://docs.microsoft.com/en-us/windows/win32/api/mswsock/nf-mswsock-transmitfile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-15151: -- Description: Proposing to give an option to use TransmitFile > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > > Proposing to give an option to use TransmitFile -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15151) Use TransmitFile for file to socket data transfer
[ https://issues.apache.org/jira/browse/HDFS-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Majercak updated HDFS-15151: -- Description: Proposing to give an option to use TransmitFile Windows function for file to socket data transfer. was:Proposing to give an option to use TransmitFile > Use TransmitFile for file to socket data transfer > - > > Key: HDFS-15151 > URL: https://issues.apache.org/jira/browse/HDFS-15151 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Lukas Majercak >Assignee: Lukas Majercak >Priority: Major > > Proposing to give an option to use TransmitFile Windows function for file to > socket data transfer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15151) Use TransmitFile for file to socket data transfer
Lukas Majercak created HDFS-15151: - Summary: Use TransmitFile for file to socket data transfer Key: HDFS-15151 URL: https://issues.apache.org/jira/browse/HDFS-15151 Project: Hadoop HDFS Issue Type: New Feature Reporter: Lukas Majercak Assignee: Lukas Majercak -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned
[ https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026958#comment-17026958 ] Íñigo Goiri commented on HDFS-15147: This looks very good. A few minor comments: * Avoid the blank line removal after BlockManager#4819 and FSNamesystem#4427. * In addition to adding VisibleForTesting, it would be nice to have a comment on what getLastRedundancyMonitorTS() and getLazyPersistFileScrubberTS() represent and why we are exposing them. * Make the constants in LazyPersistTestCase consistent with SEC, MS, MSEC or whichever we think is good. * I would argue that we should use TimeUnit.SECONDS.toMillis(XXX) for the constants instead of multiplying. It's true that this then adds a cast from long to int but I would say that we should provide a waitFor that takes long as an input. * Extract the key and value in LazyPersistTestCase#474. Giving them names with before and after would make it easier to read. BTW, should we check for the value to be larger instead of equal? * Complete the javadoc comment for shutdownDataNodes(). I don't think it currently parses. * It might be good to give javadocs to all the wait methods (e.g., waitForScrubberCycle(), waitForRedundancyCount()). * Let's use logger format {} in the new logging. * Should we wrap the joinUninterruptibly() somewhat? What's the difference with the old Uninterruptibles#joinUninterruptibly() BTW? > LazyPersistTestCase wait logic is error pruned > -- > > Key: HDFS-15147 > URL: https://issues.apache.org/jira/browse/HDFS-15147 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch > > > {{LazyPersistTestCase}} has some issues that lead to inconsistent results of > the test cases: > * the wait periods for status changes are too long. They reach 10 secs in > some cases. > * triggerBlockReport() only triggers FBR of DN with index 0. 
This is counter > intuitive because the JUnit tests restart the DN assuming that the restarted > DN will send a FBR. However, this never happens because the DN will get a new > index post restart. > {code:java} > protected final void triggerBlockReport() > throws IOException, InterruptedException { > // Trigger block report to NN > DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0)); > Thread.sleep(10 * 1000); > } > {code} > [~inigoiri] suggested that we propagate the findings and fixes from > HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will > eventually reduce the runtime and make the test cases more stable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
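The review comments and the description above both point toward polling for a condition with a bounded timeout instead of a fixed `Thread.sleep(10 * 1000)`. A minimal sketch of that idea follows, assuming a hypothetical helper `WaitForSketch.waitFor` that takes `long` arguments; it is not the Hadoop test utility and not the code in the patch, and the constant names are illustrative only.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class WaitForSketch {
  // Constants expressed through TimeUnit instead of manual multiplication.
  static final long POLL_INTERVAL_MS = TimeUnit.MILLISECONDS.toMillis(20);
  static final long TIMEOUT_MS = TimeUnit.SECONDS.toMillis(5);

  /**
   * Polls the check until it passes or the timeout elapses.
   * Returns true as soon as the check passes, false on timeout or interrupt.
   */
  static boolean waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitForMillis);
    while (true) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(checkEveryMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger polls = new AtomicInteger();
    // The condition becomes true on the third poll, so the wait ends early.
    boolean ok = waitFor(() -> polls.incrementAndGet() >= 3, POLL_INTERVAL_MS, TIMEOUT_MS);
    System.out.println(ok + " after " + polls.get() + " polls");
  }
}
```

With this shape a test returns as soon as the state change is observed, and the timeout only bounds the worst case.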
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-15150: - Description: HDFS-9668 pointed out the issues around the DN lock being a point of contention some time ago, but that Jira went in a direction of creating a new FSDataset implementation which is very risky, and activity on the Jira has stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a similar direction to what I was thinking, so I will review that Jira in more detail to see if this one is necessary. I feel there could be significant gains by moving to a ReentrantReadWriteLock within the DN. The current implementation is simply a ReentrantLock so any locker blocks all others. One place where I think a read lock would benefit us significantly is when the DN is serving a lot of small blocks and there are jobs which perform a lot of reads. The start of reading any blocks right now takes the lock, but if we moved this to a read lock, many reads could do this at the same time. The first conservative step would be to change the current lock and then make all accesses to it obtain the write lock. That way, we should keep the current behaviour and then we can selectively move some lock accesses to the read lock in separate Jiras. I would appreciate any thoughts on this, and also if anyone has attempted it before and found any blockers. was: HDFS-9668 pointed out the issues around the DN lock being a point of contention some time ago, but that Jira went in a direction of creating a new FSDataset implementation which is very risky, and activity on the Jira has stalled for a few years now. I feel there could be significant gains by moving to a ReentrantReadWriteLock within the DN. The current implementation is simply a ReentrantLock so any locker blocks all others. 
One place where I think a read lock would benefit us significantly is when the DN is serving a lot of small blocks and there are jobs which perform a lot of reads. The start of reading any blocks right now takes the lock, but if we moved this to a read lock, many reads could do this at the same time. The first conservative step would be to change the current lock and then make all accesses to it obtain the write lock. That way, we should keep the current behaviour and then we can selectively move some lock accesses to the read lock in separate Jiras. I would appreciate any thoughts on this, and also if anyone has attempted it before and found any blockers. > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWriteLock > within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > One place where I think a read lock would benefit us significantly is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step would be to change the current lock and then > make all accesses to it obtain the write lock. 
That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > read lock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
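The two-step migration described above can be sketched as follows. This is an illustration only: `DatasetLockSketch` is a hypothetical class, not FsDatasetImpl, and the block-length map stands in for whatever state the datanode lock actually guards. In the conservative first step every method would take `writeLock()`, which preserves the exclusive behaviour of a plain `ReentrantLock`; the sketch shows the later state in which one read-only path has been moved to `readLock()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical dataset-like class guarded by a read-write lock. */
final class DatasetLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, Long> blockLengths = new HashMap<>();

  /** Mutating path: always takes the write lock. */
  void addBlock(long blockId, long length) {
    lock.writeLock().lock();
    try {
      blockLengths.put(blockId, length);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /**
   * Read-only path. In the conservative first step this would also take
   * writeLock(); switching it to readLock() is the later, selective change.
   */
  long getLength(long blockId) {
    lock.readLock().lock();
    try {
      Long length = blockLengths.get(blockId);
      return length == null ? -1L : length;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Holds one lock on this thread and probes another lock from a second thread. */
  static boolean otherThreadCanAcquire(Lock held, Lock wanted) {
    AtomicBoolean acquired = new AtomicBoolean(false);
    held.lock();
    try {
      Thread prober = new Thread(() -> {
        if (wanted.tryLock()) {
          acquired.set(true);
          wanted.unlock();
        }
      });
      prober.start();
      prober.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      held.unlock();
    }
    return acquired.get();
  }

  public static void main(String[] args) {
    DatasetLockSketch dataset = new DatasetLockSketch();
    dataset.addBlock(1L, 128L);
    System.out.println(dataset.getLength(1L));

    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    // Readers can overlap; a held write lock excludes readers.
    System.out.println(otherThreadCanAcquire(rw.readLock(), rw.readLock()));
    System.out.println(otherThreadCanAcquire(rw.writeLock(), rw.readLock()));
  }
}
```

The probe at the end demonstrates the property the proposal relies on: two threads can hold the read lock at the same time, while a held write lock blocks every other locker, as the current exclusive lock does.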
[jira] [Commented] (HDFS-15140) Replace FoldedTreeSet in Datanode with SortedSet or TreeMap
[ https://issues.apache.org/jira/browse/HDFS-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026925#comment-17026925 ] Stephen O'Donnell commented on HDFS-15140: -- All failing tests pass locally so I think they are unrelated. I am also open to simply replacing FoldedTreeSet with TreeMap, and then we can avoid the need to sort the block reports. The cost of TreeMap would be about 33MB of extra heap per 1M blocks, but the access time is similar to FoldedTreeSet, so we would not gain performance with this change. The hope is that we would avoid the degradation which occurs in FoldedTreeSet for an unknown reason after some time. Discussing this offline, someone asked me whether this would impact IBRs. I don't believe this will impact IBRs in any way. The reason is that the IBR-related blocks are added to the IncrementalBlockReportManager where they are stored in an unrelated structure and then transmitted to the namenode via the heartbeat thread (but not as part of the heartbeat). > Replace FoldedTreeSet in Datanode with SortedSet or TreeMap > --- > > Key: HDFS-15140 > URL: https://issues.apache.org/jira/browse/HDFS-15140 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: HDFS-15140.001.patch, HDFS-15140.002.patch > > > Based on the problems discussed in HDFS-15131, I would like to explore > replacing the FoldedTreeSet structure in the datanode with a builtin Java > equivalent - either SortedSet or TreeMap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
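The point about avoiding a sort can be illustrated with a small sketch. It assumes a hypothetical `SortedReplicaMapSketch` keyed by block ID with a string payload; the real datanode structures and the patch itself are not shown in this message. Because `TreeMap` keeps its keys ordered, iterating it already yields blocks in ID order, so a report built from it needs no separate sorting pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical replica index keyed by block ID; the values are placeholders. */
final class SortedReplicaMapSketch {
  private final TreeMap<Long, String> replicas = new TreeMap<>();

  void add(long blockId, String state) {
    replicas.put(blockId, state);
  }

  String get(long blockId) {
    return replicas.get(blockId); // O(log n) lookup
  }

  /** Block IDs in ascending order, with no explicit sort step. */
  List<Long> blockIdsInOrder() {
    return new ArrayList<>(replicas.keySet());
  }

  public static void main(String[] args) {
    SortedReplicaMapSketch map = new SortedReplicaMapSketch();
    map.add(30L, "FINALIZED");
    map.add(10L, "RBW");
    map.add(20L, "FINALIZED");
    System.out.println(map.blockIdsInOrder());
  }
}
```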
[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl
[ https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026854#comment-17026854 ] Hadoop QA commented on HDFS-9668: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 10s{color} | {color:red} HDFS-9668 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-9668 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12841233/HDFS-9668-26.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28725/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. 
> Optimize the locking in FsDatasetImpl > - > > Key: HDFS-9668 > URL: https://issues.apache.org/jira/browse/HDFS-9668 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Jingcheng Du >Assignee: Jingcheng Du >Priority: Major > Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, > HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, > HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, > HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, > HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, > HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, > HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, > HDFS-9668-25.patch, HDFS-9668-26.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, > HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, > HDFS-9668-9.patch, execution_time.png > > > During the HBase test on a tiered storage of HDFS (WAL is stored in > SSD/RAMDISK, and all other files are stored in HDD), we observe many > long-time BLOCKED threads on FsDatasetImpl in DataNode. 
The following is part > of the jstack result: > {noformat} > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48521 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread > t@93336 >java.lang.Thread.State: BLOCKED > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:) > - waiting to lock <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335 > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: > - None > > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread > t@93335 >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:1012) > at > org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286) > at > 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140) > - locked <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receive
[jira] [Commented] (HDFS-15147) LazyPersistTestCase wait logic is error pruned
[ https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026775#comment-17026775 ] Hadoop QA commented on HDFS-15147: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 1s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_242 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_242 
{color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed with JDK v1.8.0_242 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 491 unchanged - 5 fixed = 491 total (was 496) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed with JDK v1.8.0_242 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 7s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 98m 26s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys | | | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication | | | hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:a969cad0a12 | | JIRA Issue | HDFS-15147 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12992231/HDFS-15147-branch-2.10.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0f64b1090070 4.15.0-74-generic #84-Ubuntu SMP Thu D
[jira] [Commented] (HDFS-15119) Allow expiration of cached locations in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026774#comment-17026774 ] Ahmed Hussein commented on HDFS-15119: -- Thanks [~ayushtkn]. I agree with you. I will try to benchmark the tradeoff between the two options and follow up with another patch if I see a significant performance improvement. [~kihwal], thanks for committing the patch to trunk. I have uploaded a patch for branch-2.10. > Allow expiration of cached locations in DFSInputStream > -- > > Key: HDFS-15119 > URL: https://issues.apache.org/jira/browse/HDFS-15119 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15119-branch-2.10.003.patch, HDFS-15119.001.patch, > HDFS-15119.002.patch, HDFS-15119.003.patch > > > Staleness and other transient conditions can affect reads for a long time > since the block locations may not be re-fetched. It makes sense to make > cached locations expire. > For example, we may not take advantage of local reads since the nodes are > blacklisted and have not been updated.
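The expiration idea described in HDFS-15119 can be illustrated with a minimal, self-contained sketch. This is not the committed patch and does not use DFSInputStream internals: the class name `ExpiringLocations`, the `maxAgeMs` parameter, and the convention that a non-positive value disables expiry are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of HDFS): caches a fetched value and
 * re-fetches it once the cached copy is older than maxAgeMs.
 */
final class ExpiringLocations<T> {
  private final Supplier<T> fetcher;   // stands in for "ask the NameNode"
  private final LongSupplier clockMs;  // injectable clock, eases testing
  private final long maxAgeMs;         // assumption: <= 0 means never expire
  private T cached;
  private long fetchedAtMs;
  private boolean loaded;

  ExpiringLocations(Supplier<T> fetcher, long maxAgeMs, LongSupplier clockMs) {
    this.fetcher = fetcher;
    this.maxAgeMs = maxAgeMs;
    this.clockMs = clockMs;
  }

  /** Returns the cached value, refreshing it first if it has expired. */
  synchronized T get() {
    long now = clockMs.getAsLong();
    boolean expired = maxAgeMs > 0 && now - fetchedAtMs >= maxAgeMs;
    if (!loaded || expired) {
      cached = fetcher.get();
      fetchedAtMs = now;
      loaded = true;
    }
    return cached;
  }
}
```

With this shape, a reader that previously kept stale locations indefinitely would pick up fresh ones on the first access after the configured age has passed, at the cost of one extra fetch per expiry interval.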
[jira] [Created] (HDFS-15150) Introduce read write lock to Datanode
Stephen O'Donnell created HDFS-15150: Summary: Introduce read write lock to Datanode Key: HDFS-15150 URL: https://issues.apache.org/jira/browse/HDFS-15150 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.3.0 Reporter: Stephen O'Donnell Assignee: Stephen O'Donnell HDFS-9668 pointed out some time ago that the DN lock is a point of contention, but that Jira went in the direction of creating a new FSDataset implementation, which is very risky, and activity on the Jira has stalled for a few years now. I feel there could be significant gains from moving to a ReentrantReadWriteLock within the DN. The current implementation is simply a ReentrantLock, so any locker blocks all others. One place where I think a read lock would benefit us significantly is when the DN is serving a lot of small blocks and there are jobs which perform a lot of reads. Starting to read any block currently takes the lock, but if we moved this to a read lock, many reads could start at the same time. The first conservative step would be to change the current lock type and then make all accesses to it obtain the write lock. That way, we keep the current behaviour, and we can then selectively move some lock accesses to the read lock in separate Jiras. I would appreciate any thoughts on this, and would like to hear from anyone who has attempted it before and found any blockers.
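The two-step migration proposed in HDFS-15150 can be sketched roughly as follows. This is only an illustration under stated assumptions: `BlockMapSketch` and its methods are hypothetical stand-ins, not the DataNode's actual dataset classes, and the map of block IDs to paths is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a structure guarded by the DN lock. */
final class BlockMapSketch {
  // Replaces a plain ReentrantLock; 'true' requests fair ordering.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private final Map<Long, String> blocks = new HashMap<>();

  /** Mutation: always needs the exclusive write lock. */
  void addBlock(long blockId, String path) {
    lock.writeLock().lock();
    try {
      blocks.put(blockId, path);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Step 1 (conservative): a read that still takes the write lock,
   *  so exclusion matches the old ReentrantLock behaviour. */
  String getBlockPathConservative(long blockId) {
    lock.writeLock().lock();
    try {
      return blocks.get(blockId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Step 2 (selective): the same read moved to the shared read lock,
   *  so many readers can proceed at the same time. */
  String getBlockPath(long blockId) {
    lock.readLock().lock();
    try {
      return blocks.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

One thing to watch when moving call sites to the read lock: ReentrantReadWriteLock does not support upgrading from the read lock to the write lock, so a code path that reads first and then mutates under the same critical section has to stay on the write lock or be restructured.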
[jira] [Created] (HDFS-15149) TestDeadNodeDetection test cases time-out
Ahmed Hussein created HDFS-15149: Summary: TestDeadNodeDetection test cases time-out Key: HDFS-15149 URL: https://issues.apache.org/jira/browse/HDFS-15149 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Ahmed Hussein Assignee: Ahmed Hussein The TestDeadNodeDetection JUnit tests time out with the following stack traces: *1- testDeadNodeDetectionInBackground* {code:bash} [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 264.757 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDeadNodeDetection [ERROR] testDeadNodeDetectionInBackground(org.apache.hadoop.hdfs.TestDeadNodeDetection) Time elapsed: 125.806 s <<< ERROR! java.util.concurrent.TimeoutException: Timed out waiting for condition. Thread diagnostics: Timestamp: 2020-01-24 08:31:07,023 "client DomainSocketWatcher" daemon prio=5 tid=117 runnable java.lang.Thread.State: RUNNABLE at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503) at java.lang.Thread.run(Thread.java:748) "Session-HouseKeeper-48c3205a" prio=5 tid=350 timed_waiting java.lang.Thread.State: TIMED_WAITING at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at
java.lang.Thread.run(Thread.java:748) "java.util.concurrent.ThreadPoolExecutor$Worker@3ae54156[State = -1, empty queue]" daemon prio=5 tid=752 in Object.wait() java.lang.Thread.State: WAITING (on object monitor) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) "CacheReplicationMonitor(1960356187)" prio=5 tid=386 timed_waiting java.lang.Thread.State: TIMED_WAITING at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) at org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor.run(CacheReplicationMonitor.java:181) "Timer for 'NameNode' metrics system" daemon prio=5 tid=339 timed_waiting java.lang.Thread.State: TIMED_WAITING at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:552) at java.util.TimerThread.run(Timer.java:505) "org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber@6b760460" daemon prio=5 tid=385 timed_waiting java.lang.Thread.State: TIMED_WAITING at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:4420) at java.lang.Thread.run(Thread.java:748) "qtp164757726-349" daemon prio=5 tid=349 runnable java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at 
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101) at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:466) at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:403) at org.eclipse.jetty.util.thread.strategy.E
[jira] [Updated] (HDFS-15147) LazyPersistTestCase wait logic is error pruned
[ https://issues.apache.org/jira/browse/HDFS-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15147: - Attachment: HDFS-15147-branch-2.10.001.patch > LazyPersistTestCase wait logic is error pruned > -- > > Key: HDFS-15147 > URL: https://issues.apache.org/jira/browse/HDFS-15147 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Attachments: HDFS-15147-branch-2.10.001.patch, HDFS-15147.001.patch > > > {{LazyPersistTestCase}} has some issues that lead to inconsistent results in > the test cases: > * the wait periods for a change of status are too long. They reach 10 secs in > some cases. > * triggerBlockReport() only triggers an FBR of the DN with index 0. This is > counterintuitive because the JUnit tests restart the DN assuming that the restarted > DN will send an FBR. However, this never happens because the DN will get a new > index post restart. > {code:java} > protected final void triggerBlockReport() > throws IOException, InterruptedException { > // Trigger block report to NN > DataNodeTestUtils.triggerBlockReport(cluster.getDataNodes().get(0)); > Thread.sleep(10 * 1000); > } > {code} > [~inigoiri] suggested that we propagate the findings and fixes from > HDFS-13179 and HDFS-15144 into {{LazyPersistTestCase.java}}. This will > eventually reduce the runtime and make the test cases more stable.
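One common way to address the long fixed waits described in HDFS-15147 is to poll for the expected condition with a timeout instead of sleeping for a fixed 10 seconds. The sketch below is a self-written stand-in and not the attached patch: `WaitSketch.waitFor` and its parameters are hypothetical names. Hadoop's test utilities provide a comparable helper, `GenericTestUtils.waitFor`, which a real fix would more likely reuse.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for tests; not part of Hadoop. */
final class WaitSketch {
  private WaitSketch() {}

  /**
   * Re-checks the condition every checkEveryMs milliseconds and returns
   * as soon as it holds; throws TimeoutException after timeoutMs.
   */
  static void waitFor(BooleanSupplier condition, long checkEveryMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(checkEveryMs);
    }
  }
}
```

A test that currently calls `Thread.sleep(10 * 1000)` after triggering a block report could instead wait on the state it actually cares about, so it finishes as soon as the report has been processed and fails with a clear timeout otherwise. Triggering the report on every DataNode returned by the cluster, rather than only the one at index 0, would address the second point in the description.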