[jira] [Resolved] (HDFS-16686) GetJournalEditServlet fails to authorize valid Kerberos request
[ https://issues.apache.org/jira/browse/HDFS-16686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-16686. - Fix Version/s: 3.3.9 Hadoop Flags: Reviewed Resolution: Fixed > GetJournalEditServlet fails to authorize valid Kerberos request > --- > > Key: HDFS-16686 > URL: https://issues.apache.org/jira/browse/HDFS-16686 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node >Affects Versions: 3.4.0, 3.3.9 > Environment: Running in Kubernetes using Java 11 in an HA > configuration. JournalNodes run on separate pods and have their own Kerberos > principal "jn/@". >Reporter: Steve Vaughan >Assignee: Steve Vaughan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > GetJournalEditServlet uses request.getRemoteUser() to determine the > remoteShortName for Kerberos authorization, which fails to match when the > JournalNode uses its own Kerberos principal (e.g. jn/@). > This can be fixed by using the UserGroupInformation provided by the base > DfsServlet class via the getUGI(request, conf) call.
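A minimal sketch of the described fix, written as a method inside the servlet. It assumes the DfsServlet#getUGI(request, conf) helper named in the description; the method itself is a hypothetical illustration, not the literal patch.

{code:java}
import java.io.IOException;
import javax.servlet.http.HttpServletRequest;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Resolve the caller's short name from the authenticated UGI instead of
// request.getRemoteUser(), which can return the full Kerberos principal
// (e.g. the JournalNode's own jn/<host>@<REALM>) and fail the short-name
// comparison used for authorization.
private String remoteShortName(HttpServletRequest request, Configuration conf)
    throws IOException {
  UserGroupInformation ugi = getUGI(request, conf); // provided by DfsServlet
  return ugi.getShortUserName();                    // e.g. "jn"
}
{code}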
[jira] [Resolved] (HDFS-4043) Namenode Kerberos Login does not use proper hostname for host qualified hdfs principal name.
[ https://issues.apache.org/jira/browse/HDFS-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-4043. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Namenode Kerberos Login does not use proper hostname for host qualified hdfs > principal name. > > > Key: HDFS-4043 > URL: https://issues.apache.org/jira/browse/HDFS-4043 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.0.3-alpha, > 3.4.0, 3.3.9 > Environment: CDH4U1 on Ubuntu 12.04 >Reporter: Ahad Rana >Assignee: Steve Vaughan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Original Estimate: 24h > Time Spent: 50m > Remaining Estimate: 23h 10m > > The Namenode uses the loginAsNameNodeUser method in NameNode.java to log in > using the hdfs principal. This method in turn invokes SecurityUtil.login with > a hostname (last parameter) obtained via a call to InetAddress.getHostName. > This call does not always return the fully qualified host name, and thus > causes the namenode login to fail due to Kerberos's inability to find a > matching hdfs principal in the hdfs.keytab file. Instead it should use > InetAddress.getCanonicalHostName. This is consistent with what is used > internally by SecurityUtil.java to log in other services, such as the > DataNode.
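A short sketch of the distinction the report draws, using the standard NameNode keytab/principal configuration keys; a hedged illustration of the described change, not the committed patch.

{code:java}
import java.io.IOException;
import java.net.InetAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY;
import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_KEYTAB_FILE_KEY;

// getHostName() may return a short name ("nn1"), while getCanonicalHostName()
// performs a reverse lookup and returns the fully qualified name
// ("nn1.example.com") that matches the hdfs/_HOST@REALM entry in the keytab.
void loginAsNameNodeUser(Configuration conf, InetAddress addr) throws IOException {
  SecurityUtil.login(conf, DFS_NAMENODE_KEYTAB_FILE_KEY,
      DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY,
      addr.getCanonicalHostName()); // instead of addr.getHostName()
}
{code}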
[jira] [Resolved] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error
[ https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-16702. - Hadoop Flags: Reviewed Resolution: Fixed > MiniDFSCluster should report cause of exception in assertion error > -- > > Key: HDFS-16702 > URL: https://issues.apache.org/jira/browse/HDFS-16702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs > Environment: Tests running in the Hadoop dev environment image. >Reporter: Steve Vaughan >Assignee: Steve Vaughan >Priority: Minor > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > When the MiniDFSCluster detects that an exception caused an exit, it should > include that exception as the cause for the AssertionError that it throws. > The current AssertionError simply reports the message "Test resulted in an > unexpected exit" and provides a stack trace to the location of the check for > an exit exception.
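A hedged before/after sketch of the improvement; fetching the terminating exception via ExitUtil is an assumption from context, and the message text mirrors the description.

{code:java}
import org.apache.hadoop.util.ExitUtil;

// Before (sketch): the original exception is dropped, leaving only a stack
// trace pointing at the assertion site.
//   throw new AssertionError("Test resulted in an unexpected exit");
// After (sketch): chain the exit exception so test reports show the cause.
ExitUtil.ExitException cause = ExitUtil.getFirstExitException();
if (cause != null) {
  throw new AssertionError("Test resulted in an unexpected exit", cause);
}
{code}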
[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16507: Fix Version/s: 3.2.4 > [SBN read] Avoid purging edit log which is in progress > -- > > Key: HDFS-16507 > URL: https://issues.apache.org/jira/browse/HDFS-16507 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: tomscut >Assignee: tomscut >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL > exception. It looks like it is purging an edit log which is still in progress. > According to the analysis, I suspect that the in-progress edit log to be > purged (after the SNN checkpoint) was not finalized (see HDFS-14317) before > the ANN rolled its edit log itself. > The stack: > {code:java} > java.lang.Thread.getStackTrace(Thread.java:1552) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > > org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185) > > org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620) > > org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512) > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177) > > org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > > 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > org.eclipse.jetty.server.Server.handle(Server.java:539) > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > >
[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16507: Fix Version/s: 3.3.3 > [SBN read] Avoid purging edit log which is in progress > -- > > Key: HDFS-16507 > URL: https://issues.apache.org/jira/browse/HDFS-16507 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: tomscut >Assignee: tomscut >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL > exception. It looks like it is purging an edit log which is still in progress. > According to the analysis, I suspect that the in-progress edit log to be > purged (after the SNN checkpoint) was not finalized (see HDFS-14317) before > the ANN rolled its edit log itself. > The stack: > {code:java} > java.lang.Thread.getStackTrace(Thread.java:1552) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > > org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185) > > org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620) > > org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512) > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177) > > org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > > 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > org.eclipse.jetty.server.Server.handle(Server.java:539) > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > >
[jira] [Resolved] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-16507. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > [SBN read] Avoid purging edit log which is in progress > -- > > Key: HDFS-16507 > URL: https://issues.apache.org/jira/browse/HDFS-16507 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: tomscut >Assignee: tomscut >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL > exception. It looks like it is purging an edit log which is still in progress. > According to the analysis, I suspect that the in-progress edit log to be > purged (after the SNN checkpoint) was not finalized (see HDFS-14317) before > the ANN rolled its edit log itself. > The stack: > {code:java} > java.lang.Thread.getStackTrace(Thread.java:1552) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > > org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185) > > org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620) > > org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512) > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177) > > org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > > 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > org.eclipse.jetty.server.Server.handle(Server.java:539) > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > >
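Across the three HDFS-16507 notifications above, the summary implies a guard in the purge path. A sketch using FileJournalManager's EditLogFile accessors; the loop shape and deletion helper are simplifications, not the committed patch.

{code:java}
// Never purge an edit-log segment that is still in progress, even when its
// first txid falls below the retention threshold.
for (FileJournalManager.EditLogFile log : getLogFiles(0)) {
  if (log.getFirstTxId() < minTxIdToKeep && !log.isInProgress()) {
    purge(log); // hypothetical deletion helper
  }
}
{code}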
[jira] [Updated] (HDFS-16271) RBF: NullPointerException when setQuota through routers with quota disabled
[ https://issues.apache.org/jira/browse/HDFS-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16271: Fix Version/s: 3.3.2 (was: 3.3.1) > RBF: NullPointerException when setQuota through routers with quota disabled > --- > > Key: HDFS-16271 > URL: https://issues.apache.org/jira/browse/HDFS-16271 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.1 >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16271.001.patch, HDFS-16271.002.patch > > > When we started routers with *dfs.federation.router.quota.enable=false* and > tried to setQuota through them, a NullPointerException was caught. > The cause of the NPE is that Router#quotaManager is not initialized when > dfs.federation.router.quota.enable=false, but when executing a setQuota RPC > request inside the router, we use it in the method Quota#isMountEntry without > a null check. > I think it's better to check whether Router#isQuotaEnabled is true before > using Router#quotaManager, and throw an IOException with a readable message > if needed.
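A minimal sketch of the proposed guard. Router#isQuotaEnabled is named in the description; the helper's shape and message are illustrative.

{code:java}
import java.io.IOException;

// Fail fast with a readable message instead of dereferencing the
// uninitialized Router#quotaManager.
void checkQuotaEnabled(Router router) throws IOException {
  if (!router.isQuotaEnabled()) {
    throw new IOException("Quota is disabled in this Router "
        + "(dfs.federation.router.quota.enable=false).");
  }
}
{code}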
[jira] [Updated] (HDFS-16344) Improve DirectoryScanner.Stats#toString
[ https://issues.apache.org/jira/browse/HDFS-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16344: Fix Version/s: 3.3.2 (was: 3.3.3) > Improve DirectoryScanner.Stats#toString > --- > > Key: HDFS-16344 > URL: https://issues.apache.org/jira/browse/HDFS-16344 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Attachments: image-2021-11-21-19-35-16-838.png > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Improve DirectoryScanner.Stats#toString. > !image-2021-11-21-19-35-16-838.png|width=1019,height=71!
[jira] [Updated] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake
[ https://issues.apache.org/jira/browse/HDFS-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16332: Fix Version/s: 3.3.2 (was: 3.3.3) > Expired block token causes slow read due to missing handling in sasl handshake > -- > > Key: HDFS-16332 > URL: https://issues.apache.org/jira/browse/HDFS-16332 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, dfs, dfsclient >Affects Versions: 2.8.5, 3.3.1 >Reporter: Shinya Yoshida >Assignee: Shinya Yoshida >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: Screenshot from 2021-11-18 12-11-34.png, Screenshot from > 2021-11-18 12-14-29.png, Screenshot from 2021-11-18 13-31-35.png > > Time Spent: 5h 40m > Remaining Estimate: 0h > > We're operating an HBase 1.4.x cluster on Hadoop 2.8.5. > We've recently been evaluating a Kerberos-secured HBase and Hadoop cluster > under production load, and we observed HBase responses slowing by >= several > seconds, and by several minutes in the worst case (about once to three times > a month). > The following image is a scatter plot of HBase's slow responses; each circle > is one slow-response log entry. > The X-axis is the date and time of the log entry; the Y-axis is the response > slow time. > !Screenshot from 2021-11-18 12-14-29.png! > We could reproduce this issue by reducing "dfs.block.access.token.lifetime", > and we could figure out the cause. > (We used dfs.block.access.token.lifetime=60, i.e. 1 hour) > When hedged read is enabled: > !Screenshot from 2021-11-18 12-11-34.png! > When hedged read is disabled: > !Screenshot from 2021-11-18 13-31-35.png! > As you can see, it's worst if the hedged read is enabled. However, it happens > whether the hedged read is enabled or not. > This impacts our 99th-percentile response time. > This happens when the block token has expired, and the root cause is the > wrong handling of the InvalidToken exception in the SASL handshake in > SaslDataTransferServer. > I propose to add a new response code to DataTransferEncryptorStatus to > request the client to update the block token, as DataTransferProtos does. > The test code and patch are available in > https://github.com/apache/hadoop/pull/3677 > We could reproduce this issue with the following test code on the 2.8.5 > branch and trunk, as I tested > {code:java} > // HDFS is configured as a secure cluster > byte[] bytes = new byte[4096]; // declaration added so the snippet compiles > try (FileSystem fs = newFileSystem(); > FSDataInputStream in = fs.open(PATH)) { > waitBlockTokenExpired(in); > in.read(0, bytes, 0, bytes.length); > } > private void waitBlockTokenExpired(FSDataInputStream in1) throws Exception { > DFSInputStream innerStream = (DFSInputStream) in1.getWrappedStream(); > for (LocatedBlock block : innerStream.getAllBlocks()) { > while (!SecurityTestUtil.isBlockTokenExpired(block.getBlockToken())) { > Thread.sleep(100); > } > } > } > {code} > Here is the log we got; we added a custom log before and after the block > token refresh: > https://github.com/bitterfox/hadoop/commit/173a9f876f2264b76af01d658f624197936fd79c > {code} > 2021-11-16 09:40:20,330 WARN [hedgedRead-247] impl.BlockReaderFactory: I/O > error constructing remote block reader. 
> java.io.IOException: DIGEST-MD5: IO error acquiring password > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:420) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:475) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:389) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:568) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2880) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:815) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:740) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385) > at >
[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully
[ https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16350: Fix Version/s: 3.3.2 (was: 3.3.3) > Datanode start time should be set after RPC server starts successfully > -- > > Key: HDFS-16350 > URL: https://issues.apache.org/jira/browse/HDFS-16350 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > We set the start time of the Datanode when the class is instantiated, but it > should ideally be set only after the RPC server starts and RPC handlers are > initialized to serve client requests.
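A sketch of the proposed ordering; the field and startup call names are illustrative, not the actual DataNode members.

{code:java}
import org.apache.hadoop.util.Time;

// Record the start time only once the RPC server is accepting requests,
// rather than at instantiation.
rpcServer.start();      // assumed RPC startup call
startTime = Time.now(); // previously assigned when the class was constructed
{code}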
[jira] [Updated] (HDFS-16336) De-flake TestRollingUpgrade#testRollback
[ https://issues.apache.org/jira/browse/HDFS-16336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16336: Fix Version/s: 3.3.2 (was: 3.3.3) > De-flake TestRollingUpgrade#testRollback > > > Key: HDFS-16336 > URL: https://issues.apache.org/jira/browse/HDFS-16336 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.4.0 >Reporter: Kevin Wikant >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This pull request: [https://github.com/apache/hadoop/pull/3675] > Failed Jenkins pre-commit job due to an unrelated unit test failure: > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3675/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > {code:java} > [ERROR] Failures: > [ERROR] > org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade) > [ERROR] Run 1: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 > expected null, but > was: createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> > [ERROR] Run 2: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 > expected null, but > was: createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> > [ERROR] Run 3: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 > expected null, but > was: createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> {code} > Seems that perhaps "TestRollingUpgrade.testRollback" is a flaky unit test
[jira] [Updated] (HDFS-16171) De-flake testDecommissionStatus
[ https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16171: Fix Version/s: 3.3.2 (was: 3.3.3) > De-flake testDecommissionStatus > --- > > Key: HDFS-16171 > URL: https://issues.apache.org/jira/browse/HDFS-16171 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > testDecommissionStatus keeps failing intermittently. > {code:java} > [ERROR] > testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor) > Time elapsed: 3.299 s <<< FAILURE! > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> > but was:<3> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136) > {code}
[jira] [Updated] (HDFS-16339) Show the threshold when mover threads quota is exceeded
[ https://issues.apache.org/jira/browse/HDFS-16339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16339: Fix Version/s: 3.3.2 (was: 3.3.3) > Show the threshold when mover threads quota is exceeded > --- > > Key: HDFS-16339 > URL: https://issues.apache.org/jira/browse/HDFS-16339 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: image-2021-11-20-17-23-04-924.png > > Time Spent: 1.5h > Remaining Estimate: 0h > > Show the threshold when mover threads quota is exceeded in > DataXceiver#replaceBlock and DataXceiver#copyBlock. > !image-2021-11-20-17-23-04-924.png|width=1233,height=124!
[jira] [Commented] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks
[ https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476281#comment-17476281 ] Chao Sun commented on HDFS-16420: - [~tasanuma] done > Avoid deleting unique data blocks when deleting redundancy striped blocks > - > > Key: HDFS-16420 > URL: https://issues.apache.org/jira/browse/HDFS-16420 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liubingxing >Assignee: Jackson Wang >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: image-2022-01-10-17-31-35-910.png, > image-2022-01-10-17-32-56-981.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > We have a problem similar to the one HDFS-16297 describes. > In our cluster, we used {color:#de350b}ec(6+3) + balancer with version > 3.1.0{color}, and a {color:#de350b}missing block{color} happened. > We got the block (blk_-9223372036824119008) info from fsck: only 5 live > replicas and multiple redundant replicas. > {code:java} > blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5 > blk_-9223372036824119007:DatanodeInfoWithStorage, > blk_-9223372036824119002:DatanodeInfoWithStorage, > blk_-9223372036824119001:DatanodeInfoWithStorage, > blk_-9223372036824119000:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage {code} > > We searched the logs from all datanodes, and found that the internal blocks of > blk_-9223372036824119008 were deleted almost at the same time. > > {code:java} > 08:15:58,550 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI > file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008 > 08:16:21,214 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI > file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006 > 08:16:55,737 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI > file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005 > {code} > > The total number of internal blocks deleted during 08:15-08:17 is as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
> > {color:#ff}During 08:15 to 08:17, we restarted 2 datanodes and triggered a > full block report immediately.{color} > > There are 2 questions: > 1. Why are there so many replicas of this block? > 2. Why delete the internal block with only one copy? > The reasons for the first problem may be as follows: > 1. We set the full block report period of some datanodes to 168 hours. > 2. We have done a namenode HA operation. > 3. After namenode HA, the state of storage became > {color:#ff}stale{color}, and the state did not change until the next full > block report. > 4. The balancer copied the replica without deleting the replica from the > source node, because the source node has the stale storage, and the request > was put into {color:#ff}postponedMisreplicatedBlocks{color}. > 5. The balancer continued to copy the replica, eventually resulting in > multiple copies of a replica. > !image-2022-01-10-17-31-35-910.png|width=642,height=269! > The set of {color:#ff}rescannedMisreplicatedBlocks{color} has many > blocks to remove. > !image-2022-01-10-17-32-56-981.png|width=745,height=124! > As for the second question, we checked the code of > {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any > problem.
[jira] [Updated] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks
[ https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16420: Fix Version/s: 3.3.2 (was: 3.3.3) > Avoid deleting unique data blocks when deleting redundancy striped blocks > - > > Key: HDFS-16420 > URL: https://issues.apache.org/jira/browse/HDFS-16420 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liubingxing >Assignee: Jackson Wang >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: image-2022-01-10-17-31-35-910.png, > image-2022-01-10-17-32-56-981.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > We have a problem similar to the one HDFS-16297 describes. > In our cluster, we used {color:#de350b}ec(6+3) + balancer with version > 3.1.0{color}, and a {color:#de350b}missing block{color} happened. > We got the block (blk_-9223372036824119008) info from fsck: only 5 live > replicas and multiple redundant replicas. > {code:java} > blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5 > blk_-9223372036824119007:DatanodeInfoWithStorage, > blk_-9223372036824119002:DatanodeInfoWithStorage, > blk_-9223372036824119001:DatanodeInfoWithStorage, > blk_-9223372036824119000:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage {code} > > We searched the logs from all datanodes, and found that the internal blocks of > blk_-9223372036824119008 were deleted almost at the same time. > > {code:java} > 08:15:58,550 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI > file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008 > 08:16:21,214 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI > file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006 > 08:16:55,737 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI > file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005 > {code} > > The total number of internal blocks deleted during 08:15-08:17 is as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
> > {color:#ff}During 08:15 to 08:17, we restarted 2 datanodes and triggered a > full block report immediately.{color} > > There are 2 questions: > 1. Why are there so many replicas of this block? > 2. Why delete the internal block with only one copy? > The reasons for the first problem may be as follows: > 1. We set the full block report period of some datanodes to 168 hours. > 2. We have done a namenode HA operation. > 3. After namenode HA, the state of storage became > {color:#ff}stale{color}, and the state did not change until the next full > block report. > 4. The balancer copied the replica without deleting the replica from the > source node, because the source node has the stale storage, and the request > was put into {color:#ff}postponedMisreplicatedBlocks{color}. > 5. The balancer continued to copy the replica, eventually resulting in > multiple copies of a replica. > !image-2022-01-10-17-31-35-910.png|width=642,height=269! > The set of {color:#ff}rescannedMisreplicatedBlocks{color} has many > blocks to remove. > !image-2022-01-10-17-32-56-981.png|width=745,height=124! > As for the second question, we checked the code of > {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any > problem.
[jira] [Resolved] (HDFS-16410) Insecure Xml parsing in OfflineEditsXmlLoader
[ https://issues.apache.org/jira/browse/HDFS-16410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-16410. - Fix Version/s: 3.4.0 3.3.2 Resolution: Fixed > Insecure Xml parsing in OfflineEditsXmlLoader > -- > > Key: HDFS-16410 > URL: https://issues.apache.org/jira/browse/HDFS-16410 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.1 >Reporter: Ashutosh Gupta >Assignee: Ashutosh Gupta >Priority: Minor > Labels: pull-request-available, security > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Insecure Xml parsing in OfflineEditsXmlLoader > [https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/OfflineEditsXmlLoader.java#L88]
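The linked loader parses edits XML with a SAX parser; a generic hardening sketch for this class of XXE issue (not necessarily the literal fix applied):

{code:java}
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Disable DTDs and external entity resolution before parsing untrusted XML;
// these are the standard SAX hardening features.
SAXParser newSecureParser() throws Exception {
  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
  factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
  factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
  factory.setXIncludeAware(false);
  return factory.newSAXParser();
}
{code}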
[jira] [Updated] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero
[ https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16408: Fix Version/s: 3.3.2 (was: 3.3.3) > Ensure LeaseRecheckIntervalMs is greater than zero > -- > > Key: HDFS-16408 > URL: https://issues.apache.org/jira/browse/HDFS-16408 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.3, 3.3.1 >Reporter: Jingxuan Fu >Assignee: Jingxuan Fu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Original Estimate: 1h > Time Spent: 3h 20m > Remaining Estimate: 0h > > There is a problem with the try-catch statement in the LeaseMonitor daemon > (in LeaseManager.java): when an unknown exception is caught, it simply prints > a warning message and continues with the next loop. > An extreme case is when the configuration item > 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative > number by the user. As the configuration item is read without checking its > range, 'FSNamesystem.getLeaseRecheckIntervalMs()' returns this value and it is > used as an argument to Thread.sleep(). A negative argument causes > Thread.sleep() to throw an IllegalArgumentException, which is caught by > 'catch (Throwable e)', and a warning message is printed. > This behavior is repeated in each subsequent loop. This means that a huge > amount of repetitive messages will be printed to the log file in a short > period of time, quickly consuming disk space and affecting the operation of > the system. > As you can see, a 178M log file is generated in one minute. > > {code:java} > ll logs/ > total 174456 > drwxrwxr-x 2 hadoop hadoop 4096 Jan 3 15:13 ./ > drwxr-xr-x 11 hadoop hadoop 4096 Jan 3 15:13 ../ > -rw-rw-r-- 1 hadoop hadoop 36342 Jan 3 15:14 > hadoop-hadoop-datanode-ljq1.log > -rw-rw-r-- 1 hadoop hadoop 1243 Jan 3 15:13 > hadoop-hadoop-datanode-ljq1.out > -rw-rw-r-- 1 hadoop hadoop 178545466 Jan 3 15:14 > hadoop-hadoop-namenode-ljq1.log > -rw-rw-r-- 1 hadoop hadoop 692 Jan 3 15:13 > hadoop-hadoop-namenode-ljq1.out > -rw-rw-r-- 1 hadoop hadoop 33201 Jan 3 15:14 > hadoop-hadoop-secondarynamenode-ljq1.log > -rw-rw-r-- 1 hadoop hadoop 3764 Jan 3 15:14 > hadoop-hadoop-secondarynamenode-ljq1.out > -rw-rw-r-- 1 hadoop hadoop 0 Jan 3 15:13 SecurityAuth-hadoop.audit > > tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log > 2022-01-03 15:14:46,032 WARN > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: > java.lang.IllegalArgumentException: timeout value is negative > at java.base/java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534) > at java.base/java.lang.Thread.run(Thread.java:829) > 2022-01-03 15:14:46,033 WARN > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: > java.lang.IllegalArgumentException: timeout value is negative > at java.base/java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534) > at java.base/java.lang.Thread.run(Thread.java:829) > 2022-01-03 15:14:46,033 WARN > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: > java.lang.IllegalArgumentException: timeout value is negative > at java.base/java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534) > at java.base/java.lang.Thread.run(Thread.java:829) > {code} > > I think there are two potential solutions. 
> The first is to adjust the position of the try-catch statement in the > LeaseMonitor daemon by moving 'catch (Throwable e)' outside the loop > body. This can be done like the NameNodeResourceMonitor daemon, which ends > the thread when an unexpected exception is caught. > The second is to use Preconditions.checkArgument() to check the range of the > configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is read, > so that a wrong configuration value cannot affect the subsequent operation of > the program.
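A sketch of the second solution using Guava's Preconditions; the DFSConfigKeys constants are the standard ones for this key, and the message text is illustrative.

{code:java}
import com.google.common.base.Preconditions;
import org.apache.hadoop.hdfs.DFSConfigKeys;

// Reject non-positive intervals at read time, so a bad value can never reach
// Thread.sleep() in the LeaseMonitor loop.
this.leaseRecheckIntervalMs = conf.getLong(
    DFSConfigKeys.DFS_NAMENODE_LEASE_RECHECK_INTERVAL_MS_KEY,
    DFSConfigKeys.DFS_NAMENODE_LEASE_RECHECK_INTERVAL_MS_DEFAULT);
Preconditions.checkArgument(leaseRecheckIntervalMs > 0,
    "%s must be greater than zero",
    DFSConfigKeys.DFS_NAMENODE_LEASE_RECHECK_INTERVAL_MS_KEY);
{code}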
[jira] [Updated] (HDFS-16314) Support to make dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16314: Fix Version/s: 3.3.2 (was: 3.3.3) > Support to make > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > - > > Key: HDFS-16314 > URL: https://issues.apache.org/jira/browse/HDFS-16314 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Consider making > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > so that it can be rolled back rapidly in case this feature (HDFS-16076) > causes unexpected problems in a production environment.
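Reconfigurable properties in HDFS hook into ReconfigurableBase#reconfigurePropertyImpl; a hedged sketch of the wiring (the setter on the block manager is an assumption, not the actual API):

{code:java}
import org.apache.hadoop.conf.ReconfigurationException;

protected String reconfigurePropertyImpl(String property, String newVal)
    throws ReconfigurationException {
  if ("dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled"
      .equals(property)) {
    boolean enable = newVal != null && Boolean.parseBoolean(newVal);
    blockManager.setExcludeSlowNodesEnabled(enable); // assumed setter
    return Boolean.toString(enable);
  }
  throw new ReconfigurationException(property, newVal, getConf().get(property));
}
{code}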
[jira] [Updated] (HDFS-16287) Support to make dfs.namenode.avoid.read.slow.datanode reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16287: Fix Version/s: 3.3.2 (was: 3.3.3) > Support to make dfs.namenode.avoid.read.slow.datanode reconfigurable > - > > Key: HDFS-16287 > URL: https://issues.apache.org/jira/browse/HDFS-16287 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 11.5h > Remaining Estimate: 0h > > 1. Make dfs.namenode.avoid.read.slow.datanode reconfigurable so that it can > be rolled back rapidly in case this feature > ([HDFS-16076|https://issues.apache.org/jira/browse/HDFS-16076]) causes > unexpected problems in a production environment. > 2. Control DatanodeManager#startSlowPeerCollector via the parameter > 'dfs.datanode.peer.stats.enabled'.
[jira] [Updated] (HDFS-16268) Balancer stuck when moving striped blocks due to NPE
[ https://issues.apache.org/jira/browse/HDFS-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16268: Fix Version/s: 3.3.2 (was: 3.3.3) > Balancer stuck when moving striped blocks due to NPE > > > Key: HDFS-16268 > URL: https://issues.apache.org/jira/browse/HDFS-16268 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, erasure-coding >Affects Versions: 3.2.2 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 40m > Remaining Estimate: 0h > > {code:java} > 21/10/11 06:11:26 WARN balancer.Dispatcher: Dispatcher thread failed > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.markMovedIfGoodBlock(Dispatcher.java:289) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.chooseBlockAndProxy(Dispatcher.java:272) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:236) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.chooseNextMove(Dispatcher.java:899) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.dispatchBlocks(Dispatcher.java:958) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.access$3300(Dispatcher.java:757) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$2.run(Dispatcher.java:1226) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Due to the NPE in the middle, there will be pending moves left in the queue, > so the balancer will be stuck forever.
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16293: Fix Version/s: 3.3.2 (was: 3.3.3) > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch, HDFS-16293.06.patch, HDFS-16293.07.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I enabled ECN and used Terasort (500G data, 8 DataNodes, 76 vcores/DN) > for testing, the DataNodes became congested (HDFS-8008). The client enters > the sleep state after repeatedly receiving congestion ACKs, but does not > release the lock on 'dataQueue'. The ResponseProcessor thread needs > 'dataQueue' to execute 'ackQueue.getFirst()', so the ResponseProcessor will > wait for the client to release 'dataQueue', which effectively puts the > ResponseProcessor thread to sleep as well, resulting in ACK delay. MapReduce > tasks can be delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue.getFirst()', > release 'dataQueue', and then decide whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()'.
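A sketch of the reordering the last paragraph describes (simplified; whether back-off applies to heartbeat packets follows the actual patch):

{code:java}
// Take the packet while holding the dataQueue lock, then perform the
// congestion back-off outside it, so ResponseProcessor can still acquire
// dataQueue to process incoming ACKs.
DFSPacket one;
synchronized (dataQueue) {
  one = dataQueue.getFirst();
}
if (!one.isHeartbeatPacket()) {
  backOffIfNecessary(); // may sleep; no longer blocks ResponseProcessor
}
{code}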
[jira] [Updated] (HDFS-15862) Make TestViewfsWithNfs3.testNfsRenameSingleNN() idempotent
[ https://issues.apache.org/jira/browse/HDFS-15862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15862: Fix Version/s: 3.3.2 (was: 3.3.3) > Make TestViewfsWithNfs3.testNfsRenameSingleNN() idempotent > -- > > Key: HDFS-15862 > URL: https://issues.apache.org/jira/browse/HDFS-15862 > Project: Hadoop HDFS > Issue Type: Test > Components: nfs >Reporter: Zhengxi Li >Assignee: Zhengxi Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: HDFS-15862.001.patch, HDFS-15862.002.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The > 'org.apache.hadoop.hdfs.nfs.nfs3.TestViewfsWithNfs3.testNfsRenameSingleNN' > test is not idempotent and fails if run twice in the same JVM, because it > pollutes state shared among tests. It may be good to clean up this state > pollution so that some other tests do not fail in the future due to the > shared state polluted by this test. > Running {{TestViewfsWithNfs3.testNfsRenameSingleNN}} twice would result in > the second run failing with a NullPointerException: > {noformat} > [ERROR] Errors: > [ERROR] TestViewfsWithNfs3.testNfsRenameSingleNN:317 NullPointer > {noformat} > The reason for this is that the {{/user1/renameSingleNN}} file is created in > {{setup()}}, but gets renamed in {{testNfsRenameSingleNN}}. When the > second run of {{testNfsRenameSingleNN}} tries to get info about the file by > its original name, it hits a NullPointerException since the file no longer > exists. > > Link to PR: https://github.com/apache/hadoop/pull/2724
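One way to remove this kind of pollution, sketched under the assumption that the renamed file is moved back after each run; the rename target path and the `hdfs` fixture field are hypothetical.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.junit.After;

// Restore the shared fixture after the rename, so a second run in the same
// JVM sees the state that setup() created.
@After
public void restoreRenamedFile() throws IOException {
  Path original = new Path("/user1/renameSingleNN");
  Path renamed = new Path("/user1/renameSingleNN.renamed"); // hypothetical name
  if (hdfs.exists(renamed)) { // hdfs: the shared FileSystem fixture
    hdfs.rename(renamed, original);
  }
}
{code}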
[jira] [Updated] (HDFS-16345) Fix test cases fail in TestBlockStoragePolicy
[ https://issues.apache.org/jira/browse/HDFS-16345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16345: Fix Version/s: 3.3.2 (was: 3.3.3) > Fix test cases fail in TestBlockStoragePolicy > - > > Key: HDFS-16345 > URL: https://issues.apache.org/jira/browse/HDFS-16345 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build >Affects Versions: 3.3.1 >Reporter: guophilipse >Assignee: guophilipse >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > The test class `TestBlockStoragePolicy` fails frequently with a > `BindException`, which blocks all normal source code builds. We can improve > it. > [ERROR] Tests run: 26, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 49.295 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestBlockStoragePolicy > [ERROR] > testChooseTargetWithTopology(org.apache.hadoop.hdfs.TestBlockStoragePolicy) > Time elapsed: 0.551 s <<< ERROR! java.net.BindException: Problem binding to > [localhost:43947] java.net.BindException: Address already in use; For more > details see: http://wiki.apache.org/hadoop/BindException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931) at > org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:827) at > org.apache.hadoop.ipc.Server.bind(Server.java:657) at > org.apache.hadoop.ipc.Server$Listener.(Server.java:1352) at > org.apache.hadoop.ipc.Server.(Server.java:3252) at > org.apache.hadoop.ipc.RPC$Server.(RPC.java:1062) at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.(ProtobufRpcEngine2.java:468) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:371) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853) at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:466) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:860) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:766) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:1017) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:992) > at > org.apache.hadoop.hdfs.TestBlockStoragePolicy.testChooseTargetWithTopology(TestBlockStoragePolicy.java:1275) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at >
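For HDFS-16345 above, a common remedy for this class of flaky failure is to let the OS pick a free port and retry startup if a race still occurs. A minimal sketch under those assumptions (illustrative, not the committed patch):

{code:java}
import java.io.IOException;
import java.net.BindException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class ClusterStartUtil {
  /**
   * Start a MiniDFSCluster, retrying a few times if the chosen
   * NameNode RPC port is already bound on this host.
   */
  public static MiniDFSCluster startWithRetry(Configuration conf, int attempts)
      throws IOException {
    for (int i = 1; ; i++) {
      try {
        // Port 0 asks the OS for a free ephemeral port, which avoids
        // most BindException races in the first place.
        return new MiniDFSCluster.Builder(conf).nameNodePort(0).build();
      } catch (BindException e) {
        if (i >= attempts) {
          throw e;
        }
      }
    }
  }
}
{code}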
[jira] [Updated] (HDFS-16333) Fix balancer bug when transferring an EC block
[ https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16333: Fix Version/s: 3.3.2 (was: 3.3.3) > Fix balancer bug when transferring an EC block > -- > > Key: HDFS-16333 > URL: https://issues.apache.org/jira/browse/HDFS-16333 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: image-2021-11-18-17-25-13-089.png, > image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png > > Time Spent: 6h 40m > Remaining Estimate: 0h > > We set the EC policy to (6+3), and some nodes were decommissioning when we executed the balancer. > With the balancer running, we found many error logs like the following. > !image-2021-11-18-17-25-13-089.png|width=858,height=135! > Node A tried to transfer an EC block to node B, but the block was not on node A. The FSCK command shows the block status as follows: > !image-2021-11-18-17-25-50-556.png|width=607,height=189! > In the Dispatcher.getBlockList function: > !image-2021-11-18-17-28-03-155.png! > > Assume that the locations of an EC block in storageGroupMap look like this: > indices:[0, 1, 2, 3, 4, 5, 6, 7, 8] > node:[a, b, c, d, e, f, g, h, i] > After the decommission operation, the internal block at indices[1] was decommissioned to another node: > indices:[0, 1, 2, 3, 4, 5, 6, 7, 8] > node:[a, {color:#FF}j{color}, c, d, e, f, g, h, i] > The location of indices[1] changed from node {color:#FF}b{color} to node > {color:#FF}j{color}. > > When the balancer gets the block locations, it checks them against the locations in storageGroupMap. > If a node is not found in storageGroupMap, it will not be added to the block locations. > In this case, node {color:#FF}j{color} will not be added to the block locations, while the indices are not updated. > Finally, the block locations may look like this: > indices:[0, 1, 2, 3, 4, 5, 6, 7, 8] > {color:#FF}block.location:[a, c, d, e, f, g, h, i]{color} > The locations of the nodes no longer match their indices. > > Solution: > We should update the indices to match the nodes: > {color:#FF}indices:[0, 2, 3, 4, 5, 6, 7, 8]{color} > {color:#FF}block.location:[a, c, d, e, f, g, h, i]{color} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
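For HDFS-16333 above, a minimal sketch of the proposed realignment, keeping only the indices whose datanode location was actually resolved (names are illustrative, not the committed patch):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class IndicesAlignSketch {
  /**
   * Keep only the indices whose datanode could be resolved in
   * storageGroupMap, so that indices[k] always describes location k.
   */
  static byte[] alignIndices(byte[] indices, boolean[] resolved) {
    List<Byte> kept = new ArrayList<>();
    for (int i = 0; i < indices.length; i++) {
      if (resolved[i]) {       // location i was found in storageGroupMap
        kept.add(indices[i]);
      }
    }
    byte[] out = new byte[kept.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = kept.get(i);
    }
    return out;
  }
}
{code}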
[jira] [Updated] (HDFS-16375) The FBR lease ID should be exposed to the log
[ https://issues.apache.org/jira/browse/HDFS-16375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16375: Fix Version/s: 3.3.2 (was: 3.3.3) > The FBR lease ID should be exposed to the log > - > > Key: HDFS-16375 > URL: https://issues.apache.org/jira/browse/HDFS-16375 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Our Hadoop version is 3.1.0. We encountered HDFS-12914 and HDFS-14314 in the > production environment. > While investigating the problem, the *fullBrLeaseId* was not exposed in the log, which made diagnosis difficult. We should expose it in the log. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
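For HDFS-16375 above, the change amounts to including the lease ID in an existing block-report log line. A hedged sketch of what such a statement could look like (variable and message names are assumptions):

{code:java}
// Inside the DataNode's block-report path: surface the FBR lease ID so
// that lease mismatches (HDFS-12914, HDFS-14314) can be traced from logs.
LOG.info("Successfully sent block report 0x{} with lease ID 0x{}",
    Long.toHexString(reportId), Long.toHexString(fullBrLeaseId));
{code}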
[jira] [Updated] (HDFS-16373) Fix MiniDFSCluster restart in case of multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-16373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16373: Fix Version/s: 3.3.2 (was: 3.3.3) > Fix MiniDFSCluster restart in case of multiple namenodes > > > Key: HDFS-16373 > URL: https://issues.apache.org/jira/browse/HDFS-16373 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > With multiple namenodes, restarting more than one namenode fails: restartNameNode waits for all the namenodes to come up, but if two namenodes are down and we restart only one, the other namenode will still be down, so the restart fails. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
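For HDFS-16373 above, a test exercising the scenario would restart both namenodes without blocking on each one individually, then wait for the cluster as a whole. A sketch assuming a two-namenode MiniDFSCluster named cluster:

{code:java}
// Both namenodes are down; restart them without waiting on each one
// (the second argument is waitActive), then wait for the whole cluster.
cluster.restartNameNode(0, false);
cluster.restartNameNode(1, false);
cluster.waitActive();
{code}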
[jira] [Updated] (HDFS-16327) Make dfs.namenode.max.slowpeer.collect.nodes reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16327: Fix Version/s: 3.3.2 (was: 3.3.3) > Make dfs.namenode.max.slowpeer.collect.nodes reconfigurable > --- > > Key: HDFS-16327 > URL: https://issues.apache.org/jira/browse/HDFS-16327 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > As the HDFS cluster expands or shrinks, the number of slow nodes to be > filtered must be dynamically adjusted. So we should make > DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY reconfigurable. > See HDFS-15879. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
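Once HDFS-16327 makes the property reconfigurable, it can be changed on a running NameNode through the standard reconfiguration workflow, roughly like this (host and port are placeholders):

{noformat}
# Trigger an asynchronous reconfiguration from the updated hdfs-site.xml
hdfs dfsadmin -reconfig namenode nn1.example.com:8020 start

# Check whether the reconfiguration task has finished
hdfs dfsadmin -reconfig namenode nn1.example.com:8020 status
{noformat}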
[jira] [Updated] (HDFS-16377) Should call checkNotNull before accessing FsDatasetSpi
[ https://issues.apache.org/jira/browse/HDFS-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16377: Fix Version/s: 3.3.2 (was: 3.3.3) > Should call checkNotNull before accessing FsDatasetSpi > -- > > Key: HDFS-16377 > URL: https://issues.apache.org/jira/browse/HDFS-16377 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: image-2021-12-10-19-19-22-957.png, > image-2021-12-10-19-20-58-022.png > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When starting the DN, we found an NPE in the starting DN's log, as follows: > !image-2021-12-10-19-19-22-957.png|width=909,height=126! > The logs of the upstream DN are as follows: > !image-2021-12-10-19-20-58-022.png|width=905,height=239! > This is mainly because *FsDatasetSpi* has not been initialized at the time of > access. > I noticed that checkNotNull is already done in these two > methods ({*}DataNode#getBlockLocalPathInfo{*} and > {*}DataNode#getVolumeInfo{*}). So we should add it in other places (interfaces > that clients and other DNs can access directly) so that we can attach a message > when throwing the exception. > That way, the client and the upstream DN know that FsDatasetSpi has not been > initialized, rather than being left unaware of the specific cause of the NPE. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
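For HDFS-16377 above, a minimal sketch of the proposed guard, assuming a hypothetical helper inside DataNode (the real patch may differ):

{code:java}
// Hypothetical helper inside DataNode: fail fast with a descriptive
// message instead of letting callers trip over a bare NullPointerException.
private FsDatasetSpi<?> getDatasetOrThrow() throws IOException {
  final FsDatasetSpi<?> dataset = this.data;
  if (dataset == null) {
    throw new IOException("Storage (FsDatasetSpi) not yet initialized; "
        + "the DataNode may still be starting up.");
  }
  return dataset;
}
{code}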
[jira] [Updated] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16391: Fix Version/s: 3.3.2 (was: 3.3.3) > Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService > --- > > Key: HDFS-16391 > URL: https://issues.apache.org/jira/browse/HDFS-16391 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Assignee: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
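For HDFS-16391 above, the usual SLF4J pattern: parameterized logging defers rendering until debug is actually enabled, unlike eager string concatenation. A generic sketch:

{code:java}
// Evaluated even when debug logging is off: the report string is built eagerly.
LOG.debug("Received heartbeat report: " + report.toString());

// Rendered lazily, only if debug is enabled.
LOG.debug("Received heartbeat report: {}", report);

// For genuinely expensive arguments, guard explicitly.
// (buildExpensiveStatusString() is a hypothetical placeholder.)
if (LOG.isDebugEnabled()) {
  LOG.debug("Namenode status: {}", buildExpensiveStatusString());
}
{code}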
[jira] [Updated] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16386: Fix Version/s: 3.3.2 (was: 3.3.3) > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2, 3.2.4 > > Attachments: monitor.png > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Our DataNode machines have 36 disks. When FsDatasetAsyncDiskService is working, it > causes a high load on the DataNode. > Here is some monitoring related to memory: > !monitor.png! > Since each disk deletes blocks asynchronously, and each volume is allowed 4 worker threads, > this causes trouble for the DataNode, such as increased CPU and > memory usage. > We should appropriately reduce the total number of threads so that > the DataNode can work better. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
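For HDFS-16386 above, the relevant structure is one small thread pool per volume, so the worst-case thread count scales with the number of disks. A rough sketch of that shape (constants are illustrative):

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PerVolumeExecutors {
  // One small executor per volume: with 36 disks and 4 threads each,
  // up to 144 deletion threads can be live at once. Lowering the
  // per-volume maximum caps the aggregate CPU/memory cost.
  private static final int CORE_THREADS_PER_VOLUME = 1;
  private static final int MAX_THREADS_PER_VOLUME = 4;  // candidate to reduce

  static ThreadPoolExecutor newVolumeExecutor() {
    return new ThreadPoolExecutor(
        CORE_THREADS_PER_VOLUME, MAX_THREADS_PER_VOLUME,
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());
  }
}
{code}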
[jira] [Updated] (HDFS-16395) Remove useless NNThroughputBenchmark#dummyActionNoSynch()
[ https://issues.apache.org/jira/browse/HDFS-16395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16395: Fix Version/s: 3.3.2 (was: 3.3.3) > Remove useless NNThroughputBenchmark#dummyActionNoSynch() > - > > Key: HDFS-16395 > URL: https://issues.apache.org/jira/browse/HDFS-16395 > Project: Hadoop HDFS > Issue Type: Bug > Components: benchmarks, namenode >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > > NNThroughputBenchmark#dummyActionNoSynch() doesn't seem to be used anywhere; > it is recommended to delete it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-14099: Fix Version/s: 3.3.2 (was: 3.3.3) > Unknown frame descriptor when decompressing multiple frames in > ZStandardDecompressor > > > Key: HDFS-14099 > URL: https://issues.apache.org/jira/browse/HDFS-14099 > Project: Hadoop HDFS > Issue Type: Bug > Environment: Hadoop Version: hadoop-3.0.3 > Java Version: 1.8.0_144 >Reporter: xuzq >Assignee: xuzq >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: HDFS-14099-trunk-001.patch, HDFS-14099-trunk-002.patch, > HDFS-14099-trunk-003.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > We need to use the ZSTD compression algorithm in Hadoop, so I wrote a simple > demo like this for testing. > {code:java} > // code placeholder > while ((size = fsDataInputStream.read(bufferV2)) > 0 ) { > countSize += size; > if (countSize == 65536 * 8) { > if(!isFinished) { > // finish a frame in zstd > cmpOut.finish(); > isFinished = true; > } > fsDataOutputStream.flush(); > fsDataOutputStream.hflush(); > } > if(isFinished) { > LOG.info("Will resetState. N=" + n); > // reset the stream and write again > cmpOut.resetState(); > isFinished = false; > } > cmpOut.write(bufferV2, 0, size); > bufferV2 = new byte[5 * 1024 * 1024]; > n++; > } > {code} > > Then I used "*hadoop fs -text*" to read this file, and it failed. The error is as > below. > {code:java} > Exception in thread "main" java.lang.InternalError: Unknown frame descriptor > at > org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native > Method) > at > org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181) > at > org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111) > at > org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105) > at java.io.InputStream.read(InputStream.java:101) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127) > at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101) > at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) > at org.apache.hadoop.fs.shell.Command.run(Command.java:176) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:328) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:391) > {code} > > So I looked into the code, including the JNI parts, and found this bug. > The *ZSTD_initDStream(stream)* method may be called twice in the same *Frame*. 
> The first is in *ZStandardDecompressor.c.* > {code:java} > if (size == 0) { > (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, > JNI_TRUE); > size_t result = dlsym_ZSTD_initDStream(stream); > if (dlsym_ZSTD_isError(result)) { > THROW(env, "java/lang/InternalError", > dlsym_ZSTD_getErrorName(result)); > return (jint) 0; > } > } > {code} > This call is correct, but *Finished* is never set back to false, even if > there is still data (a new frame) in *CompressedBuffer* or *UserBuffer* that needs > to be decompressed. > The second is in *org.apache.hadoop.io.compress.DecompressorStream* via > *decompressor.reset()*, because *Finished* is always true after decompressing > a *Frame*. > {code:java} > if (decompressor.finished()) { > // First see if there was any leftover buffered input from previous > // stream; if not, attempt to refill buffer. If refill -> EOF, we're > // all done; else reset, fix up input buffer, and get ready for next > // concatenated substream/"member". > int nRemaining = decompressor.getRemaining(); > if (nRemaining == 0) { > int m = getCompressedData(); > if (m == -1) { > // apparently the previous end-of-stream was also end-of-file: > // return success, as if we had never called getCompressedData() > eof = true; > return -1; > } > decompressor.reset(); > decompressor.setInput(buffer, 0, m); >
[jira] [Updated] (HDFS-16409) Fix typo: testHasExeceptionsReturnsCorrectValue -> testHasExceptionsReturnsCorrectValue
[ https://issues.apache.org/jira/browse/HDFS-16409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16409: Fix Version/s: 3.3.2 (was: 3.3.3) > Fix typo: testHasExeceptionsReturnsCorrectValue -> > testHasExceptionsReturnsCorrectValue > --- > > Key: HDFS-16409 > URL: https://issues.apache.org/jira/browse/HDFS-16409 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ashutosh Gupta >Assignee: Ashutosh Gupta >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Fixing typo testHasExeceptionsReturnsCorrectValue to > testHasExceptionsReturnsCorrectValue in > {code:java} > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestAddBlockPoolException.java{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16343) Add some debug logs when the dfsUsed are not used during Datanode startup
[ https://issues.apache.org/jira/browse/HDFS-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16343: Fix Version/s: 3.3.2 (was: 3.3.3) > Add some debug logs when the dfsUsed are not used during Datanode startup > - > > Key: HDFS-16343 > URL: https://issues.apache.org/jira/browse/HDFS-16343 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16337) Show start time of Datanode on Web
[ https://issues.apache.org/jira/browse/HDFS-16337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16337: Fix Version/s: 3.3.2 (was: 3.3.3) > Show start time of Datanode on Web > -- > > Key: HDFS-16337 > URL: https://issues.apache.org/jira/browse/HDFS-16337 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: image-2021-11-19-08-55-58-343.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Show _start time_ of Datanode on Web. > !image-2021-11-19-08-55-58-343.png|width=540,height=155! > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16335) Fix HDFSCommands.md
[ https://issues.apache.org/jira/browse/HDFS-16335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16335: Fix Version/s: 3.3.2 (was: 3.3.3) > Fix HDFSCommands.md > --- > > Key: HDFS-16335 > URL: https://issues.apache.org/jira/browse/HDFS-16335 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Fix HDFSCommands.md. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16334) Correct NameNode ACL description
[ https://issues.apache.org/jira/browse/HDFS-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16334: Fix Version/s: 3.3.2 (was: 3.3.3) > Correct NameNode ACL description > > > Key: HDFS-16334 > URL: https://issues.apache.org/jira/browse/HDFS-16334 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1 >Reporter: guophilipse >Assignee: guophilipse >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > `dfs.namenode.acls.enabled` is set to `true` by default after HDFS-13505; we can improve the description. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16328) Correct disk balancer param desc
[ https://issues.apache.org/jira/browse/HDFS-16328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16328: Fix Version/s: 3.3.2 (was: 3.3.3) > Correct disk balancer param desc > > > Key: HDFS-16328 > URL: https://issues.apache.org/jira/browse/HDFS-16328 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, hdfs >Affects Versions: 3.3.1 >Reporter: guophilipse >Assignee: guophilipse >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > > `dfs.disk.balancer.enabled` is enabled by default after HDFS-13153; we can > improve the doc to avoid confusion. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer
[ https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16330: Fix Version/s: 3.3.2 (was: 3.3.3) > Fix incorrect placeholder for Exception logs in DiskBalancer > > > Key: HDFS-16330 > URL: https://issues.apache.org/jira/browse/HDFS-16330 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
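For HDFS-16330 above, the usual shape of such a fix with SLF4J: {} placeholders for arguments, and the exception passed as the final argument so its stack trace is preserved. A generic sketch (the message and planId variable are illustrative):

{code:java}
// Broken: printf-style "%s" is not an SLF4J placeholder, so the
// exception is neither substituted nor printed with its stack trace.
LOG.error("Disk Balancer: failed to move blocks, %s", ex);

// Fixed: "{}" placeholders for arguments, exception as the last
// parameter so SLF4J logs the full stack trace.
LOG.error("Disk Balancer: failed to move blocks for {}", planId, ex);
{code}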
[jira] [Updated] (HDFS-16326) Simplify the code for DiskBalancer
[ https://issues.apache.org/jira/browse/HDFS-16326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16326: Fix Version/s: 3.3.2 (was: 3.3.3) > Simplify the code for DiskBalancer > -- > > Key: HDFS-16326 > URL: https://issues.apache.org/jira/browse/HDFS-16326 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > > Simplify the code for DiskBalancer. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16329) Fix log format for BlockManager
[ https://issues.apache.org/jira/browse/HDFS-16329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16329: Fix Version/s: 3.3.2 (was: 3.3.3) > Fix log format for BlockManager > --- > > Key: HDFS-16329 > URL: https://issues.apache.org/jira/browse/HDFS-16329 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > > Fix log format for BlockManager. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers
[ https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16323: Fix Version/s: 3.3.2 (was: 3.3.3) > DatanodeHttpServer doesn't require handler state map while retrieving filter > handlers > - > > Key: HDFS-16323 > URL: https://issues.apache.org/jira/browse/HDFS-16323 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > DatanodeHttpServer#getFilterHandlers uses the handler state map only to query whether > the given datanode httpserver filter handler class exists in the map and, if not, > initializes the channel handler by invoking a specific parameterized > constructor of the class. However, this handler state map is never used to > upsert any data. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode
[ https://issues.apache.org/jira/browse/HDFS-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-16315: Fix Version/s: 3.3.2 (was: 3.3.3) > Add metrics related to Transfer and NativeCopy for DataNode > --- > > Key: HDFS-16315 > URL: https://issues.apache.org/jira/browse/HDFS-16315 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: image-2021-11-11-08-26-33-074.png > > Time Spent: 5h > Remaining Estimate: 0h > > Datanodes already have Read, Write, Sync and Flush metrics. We should add > NativeCopy and Transfer as well. > Here is a partial look after the change: > !image-2021-11-11-08-26-33-074.png|width=205,height=235! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
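For HDFS-16315 above, such metrics would follow the existing Hadoop metrics2 pattern used for the read/write/sync rates. A hedged sketch (the @Metric annotation and MutableRate are real metrics2 API; the field and method names are assumptions, not the committed patch):

{code:java}
// Fragment inside a DataNode metrics class; the annotation and type come
// from org.apache.hadoop.metrics2.annotation.Metric and
// org.apache.hadoop.metrics2.lib.MutableRate.
@Metric MutableRate transferIoRate;    // time spent in block transfers
@Metric MutableRate nativeCopyIoRate;  // time spent in native copies

void addTransferIoLatency(long latencyNanos) {
  transferIoRate.add(latencyNanos);
}
{code}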
[jira] [Updated] (HDFS-15964) Please update the okhttp version to 4.9.1
[ https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15964: Target Version/s: 3.4.0, 3.3.3 (was: 3.4.0, 3.3.2) > Please update the okhttp version to 4.9.1 > - > > Key: HDFS-15964 > URL: https://issues.apache.org/jira/browse/HDFS-15964 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, dfsclient, security >Affects Versions: 3.3.0 >Reporter: helen huang >Priority: Major > > Currently the okhttp used by the hdfs client is 2.7.5. Our fortify scan > flagged two issues with this version. Please update it to the latest (It is > okhttp3 4.9.1 at this point). Thanks! > > <dependency> > <groupId>com.squareup.okhttp3</groupId> > <artifactId>okhttp</artifactId> > <version>4.9.1</version> > </dependency> > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15982) Deleted data using HTTP API should be saved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15982: Target Version/s: 3.4.0, 3.3.3 (was: 3.4.0, 3.3.2) > Deleted data using HTTP API should be saved to the trash > > > Key: HDFS-15982 > URL: https://issues.apache.org/jira/browse/HDFS-15982 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs, hdfs-client, httpfs, webhdfs >Reporter: Bhavik Patel >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot > 2021-04-23 at 4.36.57 PM.png > > Time Spent: 13h 20m > Remaining Estimate: 0h > > If we delete data from the Web UI, it should first be moved to the > configured/default Trash directory and removed only after the trash interval. Currently, > data is directly removed from the system [this behavior should be the same as the CLI command]. > This can be helpful when the user accidentally deletes data from the Web UI. > Similarly, we should provide a "Skip Trash" option in the HTTP API as well, which > should be accessible through the Web UI. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
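For HDFS-15982 above, the server side could reuse the same helper the CLI relies on. A hedged sketch of the idea (not the actual WebHDFS/HttpFS patch):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class TrashingDelete {
  /**
   * Route a delete through the user's trash unless the caller explicitly
   * asked to skip it, mirroring "hdfs dfs -rm" semantics.
   */
  public static boolean delete(FileSystem fs, Path path, boolean skipTrash,
      Configuration conf) throws IOException {
    if (!skipTrash && Trash.moveToAppropriateTrash(fs, path, conf)) {
      return true;  // moved into the trash directory
    }
    return fs.delete(path, true);  // permanent, recursive delete
  }
}
{code}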
[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades when the storagePolicy of many files does not match their real datanode storage
[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15715: Target Version/s: 3.3.3 (was: 3.3.2) > ReplicatorMonitor performance degrades when the storagePolicy of many files > does not match their real datanode storage > -- > > Key: HDFS-15715 > URL: https://issues.apache.org/jira/browse/HDFS-15715 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.3, 3.2.1 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, > HDFS-15715.002.patch.addendum, image-2021-03-26-12-17-45-500.png > > One of our Namenodes has 300M files and blocks. Normally, this namenode should not be > under heavy load, but we found that RPC processing time stayed high and decommissioning was very slow. > > Searching the metrics, I found that under-replicated blocks stayed high. Then I ran jstack > on the namenode and found that 'InnerNode.getLoc' was a hot spot in the code. I suspected that > chooseTarget could not find a block, resulting in performance degradation. Considering > HDFS-10453, I guessed that some logic triggers a scenario where chooseTarget cannot find a proper block. > Then I enabled some debug logging. (Of course, I revised some code so that only isGoodTarget is debugged, because enabling BlockPlacementPolicy's debug log is dangerous.) > I found that "the rack has too many chosen nodes" was being hit. Then I found some logs > like this: > {code} > 2020-12-04 12:13:56,345 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], > storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], > creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For > more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2020-12-04 12:14:03,843 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], > storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], > creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more > information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > {code} > Then, through some debugging and simulation, I found the reason and reproduced > this exception. > The reason is that some developers use the COLD storage policy and the mover, but the > operations of setting the storage policy and running the mover are asynchronous. So some > files' real datanode storages do not match their storage policy. > Let me simulate this process. Suppose /tmp/a is created with 2 replicas on DISK, > and then the storage policy is set to COLD. When some logic (for example, decommissioning) triggers copying this block, > chooseTarget uses chooseStorageTypes to filter the storages actually needed. Here the size of the > variable requiredStorageTypes returned by chooseStorageTypes is 3, but the size of > result is 2: 3 means 3 ARCHIVE storages are needed, and 2 means the block has 2 DISK > storages. It then requests choosing 3 targets. Choosing the first target succeeds, > but when choosing the second target, the variable 'counter' is 4, which is larger > than maxTargetPerRack (3) in the isGoodTarget function. So all > datanode storages are skipped, which results in bad performance. 
> I think chooseStorageTypes needs to take the existing replicas into account: when an existing > replica does not meet the storage policy's demand, we need to remove it from the > result. > I changed it this way and tested it in my unit test, which solved the problem. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
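For HDFS-15715 above, a sketch of the proposed adjustment: count an existing replica as already satisfying the policy only if its storage type matches an outstanding requirement (illustrative, not the committed patch):

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.StorageType;

public class PolicyMatchSketch {
  /**
   * Keep only the existing replicas whose storage type matches an
   * outstanding requirement of the policy; anything else should not be
   * counted as an already-chosen replica.
   */
  static List<StorageType> filterSatisfying(List<StorageType> required,
      List<StorageType> existing) {
    List<StorageType> remaining = new ArrayList<>(required);
    List<StorageType> satisfying = new ArrayList<>();
    for (StorageType t : existing) {
      if (remaining.remove(t)) {  // matches an unmet requirement
        satisfying.add(t);
      }
    }
    return satisfying;
  }
}
{code}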
[jira] [Updated] (HDFS-15505) Fix NullPointerException when calling the getAdditionalDatanode method with a null extendedBlock parameter
[ https://issues.apache.org/jira/browse/HDFS-15505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15505: Target Version/s: 3.4.0, 3.3.3 (was: 3.4.0, 3.3.2) > Fix NullPointerException when calling the getAdditionalDatanode method with a null > extendedBlock parameter > - > > Key: HDFS-15505 > URL: https://issues.apache.org/jira/browse/HDFS-15505 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient >Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2, 3.2.0, 3.1.1, 3.0.3, 3.1.2, > 3.3.0, 3.2.1, 3.1.3 >Reporter: hang chen >Priority: Major > > When a client calls the getAdditionalDatanode method, it initializes a > GetAdditionalDatanodeRequestProto and sends an RPC request to the Router/namenode. > However, if we call getAdditionalDatanode with a null extendedBlock > parameter, it sets the GetAdditionalDatanodeRequestProto's blk field to null, which > causes a NullPointerException. The code is shown as follows. > {code:java} > // code placeholder > GetAdditionalDatanodeRequestProto req = GetAdditionalDatanodeRequestProto > .newBuilder() > .setSrc(src) > .setFileId(fileId) > .setBlk(PBHelperClient.convert(blk)) > .addAllExistings(PBHelperClient.convert(existings)) > .addAllExistingStorageUuids(Arrays.asList(existingStorageIDs)) > .addAllExcludes(PBHelperClient.convert(excludes)) > .setNumAdditionalNodes(numAdditionalNodes) > .setClientName(clientName) > .build();{code} > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
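For HDFS-15505 above, since protobuf builders reject null field values, the minimal defensive fix is to set the field only when the block is present. A hedged sketch:

{code:java}
// Sketch: avoid Builder#setBlk(null), which throws NullPointerException.
// (The existings/excludes setters from the original snippet are elided.)
GetAdditionalDatanodeRequestProto.Builder builder =
    GetAdditionalDatanodeRequestProto.newBuilder()
        .setSrc(src)
        .setFileId(fileId)
        .setNumAdditionalNodes(numAdditionalNodes)
        .setClientName(clientName);
if (blk != null) {
  builder.setBlk(PBHelperClient.convert(blk));
}
GetAdditionalDatanodeRequestProto req = builder.build();
{code}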
[jira] [Updated] (HDFS-15289) Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table
[ https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15289: Target Version/s: 3.4.0, 3.2.4, 3.3.3 (was: 3.4.0, 3.3.2, 3.2.4) > Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table > - > > Key: HDFS-15289 > URL: https://issues.apache.org/jira/browse/HDFS-15289 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Affects Versions: 3.2.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Attachments: ViewFSOverloadScheme - V1.0.pdf, ViewFSOverloadScheme.png > > ViewFS provides flexibility to mount different filesystem types with a mount > point configuration table. This approach solves the scalability > problems, but users need to reconfigure the filesystem to ViewFS and to its > scheme. This is problematic for paths persisted in meta > stores, e.g. Hive: systems like Hive store URIs in the meta store, so > changing the file system scheme creates a burden to upgrade/recreate meta > stores. In our experience many users are not ready to change that. > Router based federation is another implementation to provide coordinated > mount points for HDFS federation clusters. Even though it provides > flexibility to handle mount points easily, it does not allow > other (non-HDFS) file systems to be mounted. So it does not serve the purpose > when users want to mount external (non-HDFS) filesystems. > So, the problem here is: even though many users want to adopt the scalable > fs options available, the technical challenges of changing schemes (e.g. in meta > stores) in deployments are obstructing them. > So, we propose to allow the hdfs scheme in a ViewFS-like client-side mount > system and let users create mount links without changing URI paths. > I will upload a detailed design doc shortly. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
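To make the HDFS-15289 proposal concrete, client configuration could keep hdfs:// URIs while resolving them through a mount table. A hedged configuration sketch (cluster and path names are invented, and additional settings may be required in a real deployment):

{code:xml}
<!-- Keep hdfs:// URIs, but resolve them through a client-side mount table. -->
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme</value>
</property>
<!-- Mount links for the cluster named "mycluster" (names are examples). -->
<property>
  <name>fs.viewfs.mounttable.mycluster.link./user</name>
  <value>hdfs://ns1/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.mycluster.link./backup</name>
  <value>hdfs://ns2/backup</value>
</property>
{code}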
[jira] [Updated] (HDFS-15965) Please upgrade the log4j dependency to log4j2
[ https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15965: Target Version/s: 3.4.0, 3.3.3 (was: 3.4.0, 3.3.2) > Please upgrade the log4j dependency to log4j2 > - > > Key: HDFS-15965 > URL: https://issues.apache.org/jira/browse/HDFS-15965 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient >Affects Versions: 3.3.0, 3.2.1, 3.2.2 >Reporter: helen huang >Priority: Major > > The log4j dependency being used by hadoop-common is currently version 1.2.17. > Our fortify scan picked up a couple of issues with this dependency. Please > update it to the latest version of the log4j2 dependencies: > <dependency> > <groupId>org.apache.logging.log4j</groupId> > <artifactId>log4j-api</artifactId> > <version>2.14.1</version> > </dependency> > <dependency> > <groupId>org.apache.logging.log4j</groupId> > <artifactId>log4j-core</artifactId> > <version>2.14.1</version> > </dependency> > > The slf4j dependency will need to be updated as well after you upgrade log4j > to log4j2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned HDFS-13522: --- Assignee: (was: Chao Sun) > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15754) Create packet metrics for DataNode
[ https://issues.apache.org/jira/browse/HDFS-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-15754. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Create packet metrics for DataNode > -- > > Key: HDFS-15754 > URL: https://issues.apache.org/jira/browse/HDFS-15754 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > In BlockReceiver, right now when there is slowness in writeToMirror, > writeToDisk and writeToOsCache, it is dumped in the debug log. In practice we > have found these are quite useful signals for detecting issues in a DataNode, so it > would be great if these metrics could be exposed via JMX. > Also, we introduced a total-packets-received count so that a percentage can be used as a signal > to detect potentially underperforming datanodes, since datanodes across one > HDFS cluster may receive different total numbers of packets. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
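For HDFS-15754 above, the percentage signal is then a simple derived value over the exposed counters. A hedged sketch (the getter names are assumptions, not the committed metric names):

{code:java}
// Derived health signal over the new counters: the fraction of slow
// packets is comparable across datanodes that receive different volumes
// of traffic.
long total = metrics.getTotalPacketsReceived();
long slow = metrics.getSlowPacketsToMirror() + metrics.getSlowPacketsToDisk();
double slowPct = (total == 0) ? 0.0 : 100.0 * slow / total;
final double alertThreshold = 1.0;  // illustrative: alert above 1%
if (slowPct > alertThreshold) {
  LOG.warn("{}% of {} packets were slow on this DataNode", slowPct, total);
}
{code}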
[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257429#comment-17257429 ] Chao Sun commented on HDFS-15751: - +1 on patch v3. Thanks again [~hexiaoqiao] and [~shv]! > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch, > HDFS-15751-03.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257335#comment-17257335 ] Chao Sun commented on HDFS-15751: - Thanks [~hexiaoqiao]! IMO the preconditions section should be specific to what conditions should be met prior to calling the method, and we should move the sentence: {quote} It is currently only implemented for HDFS and others will just throw UnsupportedOperationException. {quote} to the previous section before preconditions (also following other methods such as concat). > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper
[ https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257097#comment-17257097 ] Chao Sun commented on HDFS-15756: - cc [~fengnanli] > RBF: Cannot get updated delegation token from zookeeper > --- > > Key: HDFS-15756 > URL: https://issues.apache.org/jira/browse/HDFS-15756 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.0.0 >Reporter: hbprotoss >Priority: Major > > Affected versions: all versions with RBF. > When RBF works with Spark 2.4 in client mode, there is a chance that the token > is missing across different nodes in the RBF cluster. The root cause is that > Spark renews the token (via the resource manager) immediately after getting one; since > ZooKeeper does not have a strong consistency guarantee after an update across the > cluster, a ZooKeeper client may read a stale value from followers not yet synced > with the other nodes. > > We applied a patch in Spark, but it is still a problem for RBF. Is it possible > for RBF to replace the delegation token store with some other > datasource (Redis, for example)? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15756) [RBF] Cannot get updated delegation token from zookeeper
[ https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15756: Summary: [RBF] Cannot get updated delegation token from zookeeper (was: [RBF]Cannot get updated delegation token from zookeeper) > [RBF] Cannot get updated delegation token from zookeeper > > > Key: HDFS-15756 > URL: https://issues.apache.org/jira/browse/HDFS-15756 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.0.0 >Reporter: hbprotoss >Priority: Major > > Affected versions: all versions with RBF. > When RBF works with Spark 2.4 in client mode, there is a chance that the token > is missing across different nodes in the RBF cluster. The root cause is that > Spark renews the token (via the resource manager) immediately after getting one; since > ZooKeeper does not have a strong consistency guarantee after an update across the > cluster, a ZooKeeper client may read a stale value from followers not yet synced > with the other nodes. > > We applied a patch in Spark, but it is still a problem for RBF. Is it possible > for RBF to replace the delegation token store with some other > datasource (Redis, for example)? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
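Short of replacing the store discussed in HDFS-15756, one client-side mitigation for follower staleness is ZooKeeper's sync() barrier before the read. A hedged sketch (a mitigation idea, not the RBF fix):

{code:java}
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooKeeper;

public class SyncedRead {
  /**
   * Force the follower we are connected to to catch up with the leader
   * before reading the token znode, reducing the stale-read window.
   */
  static byte[] readAfterSync(ZooKeeper zk, String tokenPath)
      throws Exception {
    CountDownLatch latch = new CountDownLatch(1);
    zk.sync(tokenPath, (rc, path, ctx) -> latch.countDown(), null);
    latch.await();
    return zk.getData(tokenPath, false, null);
  }
}
{code}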
[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md
[ https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256656#comment-17256656 ] Chao Sun commented on HDFS-15751: - I agree with [~hexiaoqiao] and think we can mention that this is currently only implemented for HDFS and others will just throw {{UnsupportedOperationException}}, similar to what we're doing for {{concat}}, {{truncate}} etc. Otherwise I'm +1 on this. Thanks. > Add documentation for msync() API to filesystem.md > -- > > Key: HDFS-15751 > URL: https://issues.apache.org/jira/browse/HDFS-15751 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15751-01.patch > > > HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to > the API definitions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency
[ https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15690: Fix Version/s: 3.3.1 > Add lz4-java as hadoop-hdfs test dependency > --- > > Key: HDFS-15690 > URL: https://issues.apache.org/jira/browse/HDFS-15690 > Project: Hadoop HDFS > Issue Type: Test >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: > net/jpountz/lz4/LZ4Factory": > https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/ > We need to add lz4-java to hadoop-hdfs test dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
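For HDFS-15690 above, the test-scoped dependency would look roughly like this in hadoop-hdfs's pom.xml (the version is typically managed by the hadoop-project parent):

{code:xml}
<dependency>
  <groupId>org.lz4</groupId>
  <artifactId>lz4-java</artifactId>
  <scope>test</scope>
</dependency>
{code}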
[jira] [Resolved] (HDFS-15708) TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and branch-3.2
[ https://issues.apache.org/jira/browse/HDFS-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-15708. - Fix Version/s: 3.3.1 3.2.2 Hadoop Flags: Reviewed Resolution: Fixed > TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and > branch-3.2 > --- > > Key: HDFS-15708 > URL: https://issues.apache.org/jira/browse/HDFS-15708 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Chao Sun >Priority: Blocker > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1 > > Time Spent: 50m > Remaining Estimate: 0h > > TestURLConnectionFactory#testSSLFactoryCleanup fails: > {noformat} > [ERROR] > testSSLFactoryCleanup(org.apache.hadoop.hdfs.web.TestURLConnectionFactory) > Time elapsed: 0.28 s <<< ERROR! > java.lang.NoClassDefFoundError: > org/bouncycastle/x509/X509V1CertificateGenerator > at > org.apache.hadoop.security.ssl.KeyStoreTestUtil.generateCertificate(KeyStoreTestUtil.java:86) > at > org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:273) > at > org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:228) > at > org.apache.hadoop.hdfs.web.TestURLConnectionFactory.testSSLFactoryCleanup(TestURLConnectionFactory.java:83) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.lang.ClassNotFoundException: > org.bouncycastle.x509.X509V1CertificateGenerator > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 29 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15708) TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and branch-3.2
[ https://issues.apache.org/jira/browse/HDFS-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned HDFS-15708: --- Assignee: Chao Sun > TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and > branch-3.2 > --- > > Key: HDFS-15708 > URL: https://issues.apache.org/jira/browse/HDFS-15708 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > TestURLConnectionFactory#testSSLFactoryCleanup fails: > {noformat} > [ERROR] > testSSLFactoryCleanup(org.apache.hadoop.hdfs.web.TestURLConnectionFactory) > Time elapsed: 0.28 s <<< ERROR! > java.lang.NoClassDefFoundError: > org/bouncycastle/x509/X509V1CertificateGenerator > at > org.apache.hadoop.security.ssl.KeyStoreTestUtil.generateCertificate(KeyStoreTestUtil.java:86) > at > org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:273) > at > org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:228) > at > org.apache.hadoop.hdfs.web.TestURLConnectionFactory.testSSLFactoryCleanup(TestURLConnectionFactory.java:83) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.lang.ClassNotFoundException: > org.bouncycastle.x509.X509V1CertificateGenerator > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 29 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency
[ https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236836#comment-17236836 ] Chao Sun commented on HDFS-15690: - [~ayushtkn], [~ste...@apache.org] can you help to add [~viirya] to HDFS contributor list? I can't assign this JIRA to him. Thanks. > Add lz4-java as hadoop-hdfs test dependency > --- > > Key: HDFS-15690 > URL: https://issues.apache.org/jira/browse/HDFS-15690 > Project: Hadoop HDFS > Issue Type: Test >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: > net/jpountz/lz4/LZ4Factory": > https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/ > We need to add lz4-java to hadoop-hdfs test dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency
[ https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-15690. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Committed to trunk. I'll backport this to the 3.3 branch together with HADOOP-17292 later. > Add lz4-java as hadoop-hdfs test dependency > --- > > Key: HDFS-15690 > URL: https://issues.apache.org/jira/browse/HDFS-15690 > Project: Hadoop HDFS > Issue Type: Test >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: > net/jpountz/lz4/LZ4Factory": > https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/ > We need to add lz4-java as a hadoop-hdfs test dependency. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
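For context on the class the test fails to load: below is a minimal, illustrative round trip through lz4-java's public API ({{LZ4Factory}} and friends), showing what TestFSImage needs on its test classpath. It is a sketch, not code from the Hadoop test itself.

{code:java}
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

import java.nio.charset.StandardCharsets;

public class Lz4RoundTrip {
  public static void main(String[] args) {
    byte[] data = "hello lz4".getBytes(StandardCharsets.UTF_8);
    // Loading LZ4Factory is the step that fails with NoClassDefFoundError
    // when lz4-java is absent from the (test) classpath.
    LZ4Factory factory = LZ4Factory.fastestInstance();
    LZ4Compressor compressor = factory.fastCompressor();
    byte[] compressed = compressor.compress(data);
    LZ4FastDecompressor decompressor = factory.fastDecompressor();
    byte[] restored = decompressor.decompress(compressed, data.length);
    System.out.println(new String(restored, StandardCharsets.UTF_8));
  }
}
{code}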
[jira] [Commented] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy
[ https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231721#comment-17231721 ] Chao Sun commented on HDFS-15467: - [~aihuaxu] Yes. ObserverReadProxyProvider does have its own retry logic, but only for contacting observer namenodes. For requests that must go to the active NN, such as {{msync}} or writes, it still relies on the upper-level retry logic. > ObserverReadProxyProvider should skip logging first failover from each proxy > > > Key: HDFS-15467 > URL: https://issues.apache.org/jira/browse/HDFS-15467 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Hanisha Koneru >Assignee: Aihua Xu >Priority: Major > > After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first > failover INFO message from each proxy. But {{ObserverReadProxyProvider}} uses > a {{combinedProxy}} object, which combines all proxies into one and assigns > {{combinedInfo}} as the ProxyInfo.
> {noformat}
> ObserverReadProxyProvider# Lines 197-207:
> for (int i = 0; i < nameNodeProxies.size(); i++) {
>   if (i > 0) {
>     combinedInfo.append(",");
>   }
>   combinedInfo.append(nameNodeProxies.get(i).proxyInfo);
> }
> combinedInfo.append(']');
> T wrappedProxy = (T) Proxy.newProxyInstance(
>     ObserverReadInvocationHandler.class.getClassLoader(),
>     new Class[] {xface}, new ObserverReadInvocationHandler());
> combinedProxy = new ProxyInfo<>(wrappedProxy, combinedInfo.toString())
> {noformat}
> {{RetryInvocationHandler}} depends on the {{ProxyInfo}} to differentiate > between proxies while checking if a failover from that proxy happened before. > And since the combined proxy counts as only 1 proxy, HADOOP-17116 doesn't work on > {{ObserverReadProxyProvider}}. It would need to be handled separately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
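To make the combined-proxy problem concrete, here is a hedged, illustrative sketch of the per-proxy suppression idea; the Set-based bookkeeping and names below are stand-ins, not the actual RetryInvocationHandler code.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for "skip logging the first failover per proxy".
class FailoverLogSuppression {
  private final Set<String> seenProxies = new HashSet<>();

  void onFailover(String proxyInfo) {
    if (seenProxies.add(proxyInfo)) {
      return; // first failover observed for this proxy: stay silent
    }
    System.out.println("INFO: Failing over to " + proxyInfo);
  }
}
{code}

Because ObserverReadProxyProvider hands the retry layer a single combined name such as "[nn1,nn2,nn3]", only the very first failover overall is suppressed, not the first failover per NameNode.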
[jira] [Updated] (HDFS-15469) Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE
[ https://issues.apache.org/jira/browse/HDFS-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15469: Resolution: Fixed Status: Resolved (was: Patch Available) > Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE > > > Key: HDFS-15469 > URL: https://issues.apache.org/jira/browse/HDFS-15469 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.3 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15469.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Currently the value of PacketReceiver#MAX_PACKET_SIZE is fixed at 16M. > This value should be configurable to allow better performance in > different environments. For example, when the network is unreliable, or > the machine or hard disk quality is poor, this value should be set below > 16M, such as 8M, which is more conducive to the stability of the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15469) Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE
[ https://issues.apache.org/jira/browse/HDFS-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15469: Fix Version/s: 3.4.0 > Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE > > > Key: HDFS-15469 > URL: https://issues.apache.org/jira/browse/HDFS-15469 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.3 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15469.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Currently the value of PacketReceiver#MAX_PACKET_SIZE is fixed at 16M. > This value should be configurable to allow better performance in > different environments. For example, when the network is unreliable, or > the machine or hard disk quality is poor, this value should be set below > 16M, such as 8M, which is more conducive to the stability of the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
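A minimal sketch of the kind of change being proposed, assuming a Configuration-backed knob; the key name below is hypothetical and not necessarily the one the patch introduces.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative: replace the hard-coded 16M cap with a configurable value.
class PacketSizeConf {
  // Hypothetical key name, for illustration only.
  static final String MAX_PACKET_SIZE_KEY = "dfs.datanode.max.packet.size";
  static final int MAX_PACKET_SIZE_DEFAULT = 16 * 1024 * 1024;

  static int maxPacketSize(Configuration conf) {
    return conf.getInt(MAX_PACKET_SIZE_KEY, MAX_PACKET_SIZE_DEFAULT);
  }
}
{code}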
[jira] [Commented] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy
[ https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229596#comment-17229596 ] Chao Sun commented on HDFS-15467: - [~aihuaxu] Yes, {{msync}} relies on the upper-level retry logic. It won't fail, though - instead, I think it will retry using the defined retry policies. What issue are you seeing with this? > ObserverReadProxyProvider should skip logging first failover from each proxy > > > Key: HDFS-15467 > URL: https://issues.apache.org/jira/browse/HDFS-15467 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Hanisha Koneru >Assignee: Aihua Xu >Priority: Major > > After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first > failover INFO message from each proxy. But {{ObserverReadProxyProvider}} uses > a {{combinedProxy}} object, which combines all proxies into one and assigns > {{combinedInfo}} as the ProxyInfo.
> {noformat}
> ObserverReadProxyProvider# Lines 197-207:
> for (int i = 0; i < nameNodeProxies.size(); i++) {
>   if (i > 0) {
>     combinedInfo.append(",");
>   }
>   combinedInfo.append(nameNodeProxies.get(i).proxyInfo);
> }
> combinedInfo.append(']');
> T wrappedProxy = (T) Proxy.newProxyInstance(
>     ObserverReadInvocationHandler.class.getClassLoader(),
>     new Class[] {xface}, new ObserverReadInvocationHandler());
> combinedProxy = new ProxyInfo<>(wrappedProxy, combinedInfo.toString())
> {noformat}
> {{RetryInvocationHandler}} depends on the {{ProxyInfo}} to differentiate > between proxies while checking if a failover from that proxy happened before. > And since the combined proxy counts as only 1 proxy, HADOOP-17116 doesn't work on > {{ObserverReadProxyProvider}}. It would need to be handled separately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15664) Prevent Observer NameNode from becoming StandBy NameNode
[ https://issues.apache.org/jira/browse/HDFS-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224933#comment-17224933 ] Chao Sun commented on HDFS-15664: - [~aihuaxu] Seems this is already fixed by HDFS-14961? > Prevent Observer NameNode from becoming StandBy NameNode > > > Key: HDFS-15664 > URL: https://issues.apache.org/jira/browse/HDFS-15664 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: auto-failover >Affects Versions: 2.10.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > When the cluster performs a failover from NN1 to NN2, NN2 asks all the > other NNs to cede active state and transition to standby, including the Observer > NameNodes. > It seems we should block the Observer from becoming standby and participating in > failover. Of course, since we can transition a standby NameNode to Observer, we > can separately support promoting an Observer NameNode to a standby NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15601) Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature
Chao Sun created HDFS-15601: --- Summary: Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature Key: HDFS-15601 URL: https://issues.apache.org/jira/browse/HDFS-15601 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Reporter: Chao Sun HDFS-13616 requires both server- and client-side changes. However, it is common for users to use a newer client to talk to an older HDFS (say 2.10). Currently the client will simply fail in this scenario. A better approach, perhaps, is to have the client fall back to non-batched listing on the input directories. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
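A hedged sketch of the proposed fallback, assuming the batched call surfaces on older NameNodes as an RpcNoSuchMethodException wrapped in a RemoteException; listBatched and listOneByOne are illustrative stand-ins for the real batched and non-batched listing calls.

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.ipc.RemoteException;
import org.apache.hadoop.ipc.RpcNoSuchMethodException;

class ListingWithFallback {
  void list(List<Path> paths) throws IOException {
    try {
      listBatched(paths); // the new batched RPC from HDFS-13616
    } catch (RemoteException e) {
      if (e.unwrapRemoteException() instanceof RpcNoSuchMethodException) {
        // Older NameNode: fall back to one listing call per directory.
        for (Path p : paths) {
          listOneByOne(p);
        }
      } else {
        throw e;
      }
    }
  }

  void listBatched(List<Path> paths) throws IOException { /* batched RPC */ }
  void listOneByOne(Path path) throws IOException { /* classic listing */ }
}
{code}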
[jira] [Commented] (HDFS-15516) Add info for create flags in NameNode audit logs
[ https://issues.apache.org/jira/browse/HDFS-15516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196410#comment-17196410 ] Chao Sun commented on HDFS-15516: - I think this makes sense given that we already record flags for rename. This may break existing parsers that assume there is no flag for the create/append op, but I'm not sure that should be a reason to block this, though. > Add info for create flags in NameNode audit logs > > > Key: HDFS-15516 > URL: https://issues.apache.org/jira/browse/HDFS-15516 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Shashikant Banerjee >Assignee: jianghua zhu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15516.001.patch, HDFS-15516.002.patch, > HDFS-15516.003.patch, HDFS-15516.004.patch > > Time Spent: 1h > Remaining Estimate: 0h > > Currently, if a file create happens with flags like overwrite, the audit logs > don't seem to contain the info regarding the flags. It would be useful to add > info regarding the create options to the audit logs, similar to rename ops. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190483#comment-17190483 ] Chao Sun commented on HDFS-13522: - [~hemanthboyina] Feel free to take this over. I haven't had a chance to work on it, but I think it is an important feature. I may be able to help with code review. > Support observer node from Router-Based Federation > -- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13522.001.patch, HDFS-13522_WIP.patch, RBF_ > Observer support.pdf, Router+Observer RPC clogging.png, > ShortTerm-Routers+Observer.png > > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15502) Implement service-user feature in DecayRPCScheduler
[ https://issues.apache.org/jira/browse/HDFS-15502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167747#comment-17167747 ] Chao Sun commented on HDFS-15502: - [~tasanuma] Seems this JIRA is very similar to HADOOP-15016? This should also be a HADOOP JIRA rather than an HDFS one. > Implement service-user feature in DecayRPCScheduler > --- > > Key: HDFS-15502 > URL: https://issues.apache.org/jira/browse/HDFS-15502 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > > In our cluster, we want to use FairCallQueue to limit heavy users, but we do not > want to restrict certain users who submit important requests. This > JIRA proposes to implement a service-user feature, where such a user is always > scheduled into the high-priority queue. > According to HADOOP-9640, the initial concept of FCQ included this feature, but > it was never implemented in the end. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport
[ https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167374#comment-17167374 ] Chao Sun commented on HDFS-15014: - Thanks [~fengnanli]. Closing this as a duplicate. > RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport > - > > Key: HDFS-15014 > URL: https://issues.apache.org/jira/browse/HDFS-15014 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: Chao Sun >Priority: Major > > Currently the {{chooseDatanode}} call (which is shared by {{open}}, > {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls > {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
> private DatanodeInfo chooseDatanode(final Router router,
>     final String path, final HttpOpParam.Op op, final long openOffset,
>     final String excludeDatanodes) throws IOException {
>   // We need to get the DNs as a privileged user
>   final RouterRpcServer rpcServer = getRPCServer(router);
>   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
>   RouterRpcServer.setCurrentUser(loginUser);
>   DatanodeInfo[] dns = null;
>   try {
>     dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
>   } catch (IOException e) {
>     LOG.error("Cannot get the datanodes from the RPC server", e);
>   } finally {
>     // Reset ugi to remote user for remaining operations.
>     RouterRpcServer.resetCurrentUser();
>   }
>   HashSet excludes = new HashSet();
>   if (excludeDatanodes != null) {
>     Collection collection = getTrimmedStringCollection(excludeDatanodes);
>     for (DatanodeInfo dn : dns) {
>       if (collection.contains(dn.getName())) {
>         excludes.add(dn);
>       }
>     }
>   }
>   ...
> {code}
> The {{getDatanodeReport}} is very expensive (particularly in a large cluster) > as it needs to lock the {{DatanodeManager}}, which is also shared by calls such > as processing heartbeats. Check HDFS-14366 for a similar issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport
[ https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-15014. - Resolution: Duplicate > RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport > - > > Key: HDFS-15014 > URL: https://issues.apache.org/jira/browse/HDFS-15014 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: Chao Sun >Priority: Major > > Currently the {{chooseDatanode}} call (which is shared by {{open}}, > {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls > {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
> private DatanodeInfo chooseDatanode(final Router router,
>     final String path, final HttpOpParam.Op op, final long openOffset,
>     final String excludeDatanodes) throws IOException {
>   // We need to get the DNs as a privileged user
>   final RouterRpcServer rpcServer = getRPCServer(router);
>   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
>   RouterRpcServer.setCurrentUser(loginUser);
>   DatanodeInfo[] dns = null;
>   try {
>     dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
>   } catch (IOException e) {
>     LOG.error("Cannot get the datanodes from the RPC server", e);
>   } finally {
>     // Reset ugi to remote user for remaining operations.
>     RouterRpcServer.resetCurrentUser();
>   }
>   HashSet excludes = new HashSet();
>   if (excludeDatanodes != null) {
>     Collection collection = getTrimmedStringCollection(excludeDatanodes);
>     for (DatanodeInfo dn : dns) {
>       if (collection.contains(dn.getName())) {
>         excludes.add(dn);
>       }
>     }
>   }
>   ...
> {code}
> The {{getDatanodeReport}} is very expensive (particularly in a large cluster) > as it needs to lock the {{DatanodeManager}}, which is also shared by calls such > as processing heartbeats. Check HDFS-14366 for a similar issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15465) Support WebHDFS accesses to the data stored in secure Datanode through insecure Namenode
[ https://issues.apache.org/jira/browse/HDFS-15465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-15465. - Fix Version/s: 3.4.0 Resolution: Fixed > Support WebHDFS accesses to the data stored in secure Datanode through > insecure Namenode > > > Key: HDFS-15465 > URL: https://issues.apache.org/jira/browse/HDFS-15465 > Project: Hadoop HDFS > Issue Type: Wish > Components: federation, webhdfs >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Minor > Fix For: 3.4.0 > > Attachments: webhdfs-federation.pdf > > > We're federating a secure HDFS cluster with an insecure cluster. > Using HDFS RPC, we can access the data managed by the insecure Namenode and > stored in the secure Datanode. > However, it does not work for WebHDFS due to a HadoopIllegalArgumentException.
> {code}
> $ curl -i "http://:/webhdfs/v1/?op=OPEN"
> HTTP/1.1 307 TEMPORARY_REDIRECT
> (omitted)
> Location: http://:/webhdfs/v1/?op=OPEN==0
> $ curl -i "http://:/webhdfs/v1/?op=OPEN==0"
> HTTP/1.1 400 Bad Request
> (omitted)
> {"RemoteException":{"exception":"HadoopIllegalArgumentException","javaClassName":"org.apache.hadoop.HadoopIllegalArgumentException","message":"Invalid argument, newValue is null"}}
> {code}
> This is because the secure Datanode expects a delegation token, but the insecure > Namenode does not return one to the client. > - org.apache.hadoop.security.token.Token.decodeWritable
> {code}
> private static void decodeWritable(Writable obj,
>     String newValue) throws IOException {
>   if (newValue == null) {
>     throw new HadoopIllegalArgumentException(
>         "Invalid argument, newValue is null");
>   }
> {code}
> This issue proposes to support such access for WebHDFS as well. > The attached PDF file [^webhdfs-federation.pdf] depicts our current > architecture and proposal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15423: Component/s: webhdfs > RBF: WebHDFS create shouldn't choose DN from all sub-clusters > - > > Key: HDFS-15423 > URL: https://issues.apache.org/jira/browse/HDFS-15423 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, webhdfs >Reporter: Chao Sun >Priority: Major > > In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} > first gets all DNs via {{getDatanodeReport}}, and then randomly picks one from > the list via {{getRandomDatanode}}. This logic doesn't seem correct as it > should pick a DN from the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
Chao Sun created HDFS-15423: --- Summary: RBF: WebHDFS create shouldn't choose DN from all sub-clusters Key: HDFS-15423 URL: https://issues.apache.org/jira/browse/HDFS-15423 Project: Hadoop HDFS Issue Type: Bug Components: rbf Reporter: Chao Sun In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} first gets all DNs via {{getDatanodeReport}}, and then randomly picks one from the list via {{getRandomDatanode}}. This logic doesn't seem correct as it should pick a DN from the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15417) Lazy get the datanode report for federation WebHDFS operations
[ https://issues.apache.org/jira/browse/HDFS-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139554#comment-17139554 ] Chao Sun commented on HDFS-15417: - I think this addresses the same issue as HDFS-15014. Internally we were trying to use the cached DN reports, but those are tied to Router metrics and the implementation is kind of messy. > Lazy get the datanode report for federation WebHDFS operations > -- > > Key: HDFS-15417 > URL: https://issues.apache.org/jira/browse/HDFS-15417 > Project: Hadoop HDFS > Issue Type: Improvement > Components: federation, rbf, webhdfs >Reporter: Ye Ni >Assignee: Ye Ni >Priority: Minor > > *Why* > For WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, the router or > namenode needs to get the datanodes where the block is located, then redirect > the request to one of the datanodes. > However, this chooseDatanode action in the router is much slower than in the namenode, > which directly affects the WebHDFS operations above. > For namenode WebHDFS, it normally takes tens of milliseconds, while the router > always takes more than 2 seconds. > *How* > Only get the datanode report in the router when necessary. It is a very expensive > operation, and it is where all the time is spent. > It is only needed when we want to exclude some datanodes or find a random > datanode for CREATE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
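Based on the chooseDatanode snippet quoted in HDFS-15014, here is a hedged sketch of the lazy variant; the CREATE/exclude condition mirrors the description above, but the actual condition in the patch may differ.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;
import org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer;
import org.apache.hadoop.security.UserGroupInformation;

class LazyChooseDatanode {
  // Illustrative: fetch the (expensive) live-datanode report only when
  // the result will actually be used.
  DatanodeInfo[] maybeGetReport(RouterRpcServer rpcServer, boolean isCreate,
      String excludeDatanodes) throws IOException {
    if (!isCreate && excludeDatanodes == null) {
      return null; // most operations never look at the report: skip it
    }
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    RouterRpcServer.setCurrentUser(loginUser);
    try {
      return rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
    } finally {
      RouterRpcServer.resetCurrentUser();
    }
  }
}
{code}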
[jira] [Comment Edited] (HDFS-14320) Support skipTrash for WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118235#comment-17118235 ] Chao Sun edited comment on HDFS-14320 at 5/28/20, 10:16 AM: Bumping this up as it seems to be an important feature. Curious what the current status is, [~kpalanisamy], [~weichiu]. was (Author: csun): Bumping this up as it seems to be an important feature. Curious what the current status is, [~weichiu]. > Support skipTrash for WebHDFS > -- > > Key: HDFS-14320 > URL: https://issues.apache.org/jira/browse/HDFS-14320 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode, webhdfs >Affects Versions: 3.2.0 >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > Attachments: HDFS-14320-001.patch, HDFS-14320-002.patch, > HDFS-14320-003.patch, HDFS-14320-004.patch, HDFS-14320-005.patch, > HDFS-14320-006.patch, HDFS-14320-007.patch, HDFS-14320-008.patch > > > Files/directories deleted via the WebHDFS REST call don't use the skiptrash > feature; they are deleted permanently. This feature is very important to us > because one of our users accidentally deleted a large directory. > By default, the skiptrash option is set to true (skiptrash=true); any files deleted > using curl will be permanently deleted. > Example: > curl -iv -X DELETE > "http://:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true" > > Use skiptrash=false to move files to trash instead. > Example: > curl -iv -X DELETE > "http://:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true&skiptrash=false" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14320) Support skipTrash for WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118235#comment-17118235 ] Chao Sun commented on HDFS-14320: - Bumping this up as it seems to be an important feature. Curious what the current status is, [~weichiu]. > Support skipTrash for WebHDFS > -- > > Key: HDFS-14320 > URL: https://issues.apache.org/jira/browse/HDFS-14320 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode, webhdfs >Affects Versions: 3.2.0 >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > Attachments: HDFS-14320-001.patch, HDFS-14320-002.patch, > HDFS-14320-003.patch, HDFS-14320-004.patch, HDFS-14320-005.patch, > HDFS-14320-006.patch, HDFS-14320-007.patch, HDFS-14320-008.patch > > > Files/directories deleted via the WebHDFS REST call don't use the skiptrash > feature; they are deleted permanently. This feature is very important to us > because one of our users accidentally deleted a large directory. > By default, the skiptrash option is set to true (skiptrash=true); any files deleted > using curl will be permanently deleted. > Example: > curl -iv -X DELETE > "http://:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true" > > Use skiptrash=false to move files to trash instead. > Example: > curl -iv -X DELETE > "http://:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true&skiptrash=false" > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger
[ https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15259: Resolution: Won't Fix Status: Resolved (was: Patch Available) > Reduce useless log information in FSNamesystemAuditLogger > - > > Key: HDFS-15259 > URL: https://issues.apache.org/jira/browse/HDFS-15259 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15259.001.patch, HDFS-15259.002.patch > > > For most operations, the 'dst' is null; add a check before logging the 'dst' > information in FSNamesystemAuditLogger:
> {code:java}
> 2020-04-03 16:34:40,021 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null proto=rpc
> 2020-04-03 16:35:16,329 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null proto=rpc
> 2020-04-03 16:35:16,362 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null perm=yang:supergroup:rwxr-xr-x proto=rpc
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger
[ https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111456#comment-17111456 ] Chao Sun commented on HDFS-15259: - Yup. I think it is a won't fix. > Reduce useless log information in FSNamesystemAuditLogger > - > > Key: HDFS-15259 > URL: https://issues.apache.org/jira/browse/HDFS-15259 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15259.001.patch, HDFS-15259.002.patch > > > For most operations, the 'dst' is null; add a check before logging the 'dst' > information in FSNamesystemAuditLogger:
> {code:java}
> 2020-04-03 16:34:40,021 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null proto=rpc
> 2020-04-03 16:35:16,329 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null proto=rpc
> 2020-04-03 16:35:16,362 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null perm=yang:supergroup:rwxr-xr-x proto=rpc
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15335) Report top N metrics for files in get listing ops
Chao Sun created HDFS-15335: --- Summary: Report top N metrics for files in get listing ops Key: HDFS-15335 URL: https://issues.apache.org/jira/browse/HDFS-15335 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs, metrics Reporter: Chao Sun Currently HDFS has the {{filesInGetListingOps}} metric, which reports the total number of files across all listing ops. However, it would be useful to report the top N users who contribute most to this. This can help identify potentially abusive users and stop the abuse of the NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger
[ https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074718#comment-17074718 ] Chao Sun commented on HDFS-15259: - [~hadoop_yangyun] I don't think you can do this, as many applications depend on the tabular format when parsing the audit log, and this would break them badly. > Reduce useless log information in FSNamesystemAuditLogger > - > > Key: HDFS-15259 > URL: https://issues.apache.org/jira/browse/HDFS-15259 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15259.001.patch > > > For most operations, the 'dst' is null; add a check before logging the 'dst' > information in FSNamesystemAuditLogger:
> {code:java}
> 2020-04-03 16:34:40,021 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null proto=rpc
> 2020-04-03 16:35:16,329 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null proto=rpc
> 2020-04-03 16:35:16,362 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null perm=yang:supergroup:rwxr-xr-x proto=rpc
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
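To make the parser-compatibility concern concrete, here is a toy parser of the tab-separated key=value audit format; it is illustrative only, but it shows why code keyed on the dst column breaks if that column disappears for most operations.

{code:java}
import java.util.HashMap;
import java.util.Map;

class AuditLineParser {
  // Assumes every column is a key=value pair separated by tabs, which is
  // exactly the invariant the proposed change would break.
  static Map<String, String> parse(String payload) {
    Map<String, String> fields = new HashMap<>();
    for (String col : payload.split("\t")) {
      int eq = col.indexOf('=');
      fields.put(col.substring(0, eq), col.substring(eq + 1));
    }
    return fields;
  }

  public static void main(String[] args) {
    String payload = "allowed=true\tugi=user (auth:SIMPLE)\tip=/127.0.0.1"
        + "\tcmd=mkdirs\tsrc=/user\tdst=null"
        + "\tperm=yang:supergroup:rwxr-xr-x\tproto=rpc";
    // Downstream code often expects dst to be present, even as "null".
    System.out.println(parse(payload).get("dst"));
  }
}
{code}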
[jira] [Updated] (HDFS-15197) [SBN read] Change ObserverRetryOnActiveException log to debug
[ https://issues.apache.org/jira/browse/HDFS-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15197: Summary: [SBN read] Change ObserverRetryOnActiveException log to debug (was: Change ObserverRetryOnActiveException log to debug) > [SBN read] Change ObserverRetryOnActiveException log to debug > - > > Key: HDFS-15197 > URL: https://issues.apache.org/jira/browse/HDFS-15197 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Attachments: HDFS-15197.001.patch > > > Currently in ObserverReadProxyProvider, when an ObserverRetryOnActiveException > happens, ObserverReadProxyProvider logs a message at INFO level. This can be > a large volume of logs in some scenarios. For example, when some job tries to > access lots of files that haven't been accessed for a long time, all these > accesses may trigger atime updates, which lead to > ObserverRetryOnActiveException. We should change this log to DEBUG. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15197) Change ObserverRetryOnActiveException log to debug
[ https://issues.apache.org/jira/browse/HDFS-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047081#comment-17047081 ] Chao Sun commented on HDFS-15197: - +1 > Change ObserverRetryOnActiveException log to debug > -- > > Key: HDFS-15197 > URL: https://issues.apache.org/jira/browse/HDFS-15197 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Attachments: HDFS-15197.001.patch > > > Currently in ObserverReadProxyProvider, when an ObserverRetryOnActiveException > happens, ObserverReadProxyProvider logs a message at INFO level. This can be > a large volume of logs in some scenarios. For example, when some job tries to > access lots of files that haven't been accessed for a long time, all these > accesses may trigger atime updates, which lead to > ObserverRetryOnActiveException. We should change this log to DEBUG. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly
[ https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047074#comment-17047074 ] Chao Sun edited comment on HDFS-15196 at 2/28/20 12:05 AM: --- +1. Patch LGTM, but it would be great if [~elgoiri] or others who are familiar with RBF could take a look. was (Author: csun): Patch LGTM, but it would be great if [~elgoiri] or others who are familiar with RBF could take a look. > RouterRpcServer getListing cannot list large dirs correctly > --- > > Key: HDFS-15196 > URL: https://issues.apache.org/jira/browse/HDFS-15196 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Critical > Attachments: HDFS-15196.001.patch > > > In RouterRpcServer, the getListing function is handled in two parts: > # Union all partial listings from the destination ns + paths > # Append mount points for the dir to be listed > In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT > (default value 1k), batch listing is used and startAfter > defines the boundary of each batch listing. However, step 2 > here adds existing mount points, which messes up the boundary of > the batch, thus making the next batch's startAfter wrong. > The fix is to append the mount points only when no further batch query is > necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly
[ https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047074#comment-17047074 ] Chao Sun commented on HDFS-15196: - Patch LGTM, but it would be great if [~elgoiri] or others who are familiar with RBF could take a look. > RouterRpcServer getListing cannot list large dirs correctly > --- > > Key: HDFS-15196 > URL: https://issues.apache.org/jira/browse/HDFS-15196 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Critical > Attachments: HDFS-15196.001.patch > > > In RouterRpcServer, the getListing function is handled in two parts: > # Union all partial listings from the destination ns + paths > # Append mount points for the dir to be listed > In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT > (default value 1k), batch listing is used and startAfter > defines the boundary of each batch listing. However, step 2 > here adds existing mount points, which messes up the boundary of > the batch, thus making the next batch's startAfter wrong. > The fix is to append the mount points only when no further batch query is > necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly
[ https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047069#comment-17047069 ] Chao Sun commented on HDFS-15196: - Thanks Fengnan for the patch. Raising this to Critical since it is a correctness issue. > RouterRpcServer getListing cannot list large dirs correctly > --- > > Key: HDFS-15196 > URL: https://issues.apache.org/jira/browse/HDFS-15196 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Critical > Attachments: HDFS-15196.001.patch > > > In RouterRpcServer, the getListing function is handled in two parts: > # Union all partial listings from the destination ns + paths > # Append mount points for the dir to be listed > In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT > (default value 1k), batch listing is used and startAfter > defines the boundary of each batch listing. However, step 2 > here adds existing mount points, which messes up the boundary of > the batch, thus making the next batch's startAfter wrong. > The fix is to append the mount points only when no further batch query is > necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly
[ https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15196: Priority: Critical (was: Major) > RouterRpcServer getListing cannot list large dirs correctly > --- > > Key: HDFS-15196 > URL: https://issues.apache.org/jira/browse/HDFS-15196 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Critical > Attachments: HDFS-15196.001.patch > > > In RouterRpcServer, the getListing function is handled in two parts: > # Union all partial listings from the destination ns + paths > # Append mount points for the dir to be listed > In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT > (default value 1k), batch listing is used and startAfter > defines the boundary of each batch listing. However, step 2 > here adds existing mount points, which messes up the boundary of > the batch, thus making the next batch's startAfter wrong. > The fix is to append the mount points only when no further batch query is > necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
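A hedged sketch of the fix described above; listBatch and getMountPoints are illustrative stand-ins for the RouterRpcServer internals. The key point is that startAfter must only ever be derived from remote entries, with mount points appended once no further batch query is necessary.

{code:java}
import java.util.ArrayList;
import java.util.List;

class RouterListingSketch {
  List<String> getListing(String path, int batchLimit) {
    List<String> results = new ArrayList<>();
    String startAfter = "";
    while (true) {
      List<String> batch = listBatch(path, startAfter, batchLimit);
      results.addAll(batch);
      if (batch.size() < batchLimit) {
        break; // last batch reached
      }
      // The boundary comes from the last remote entry only; mixing in
      // mount points here is what made the next batch's startAfter wrong.
      startAfter = batch.get(batch.size() - 1);
    }
    results.addAll(getMountPoints(path)); // appended once, at the end
    return results;
  }

  List<String> listBatch(String path, String startAfter, int limit) {
    return new ArrayList<>(); // stand-in for the per-namespace listing RPC
  }

  List<String> getMountPoints(String path) {
    return new ArrayList<>(); // stand-in for the mount table lookup
  }
}
{code}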
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992018#comment-16992018 ] Chao Sun commented on HDFS-15036: - [~vagarychen] sorry for grabbing this JIRA too soon :) Since you have done much study on this, do you want to take this JIRA instead? > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Status: Patch Available (was: Open) > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch > > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Attachment: HDFS-15017-branch-2.000.patch > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch > > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990201#comment-16990201 ] Chao Sun commented on HDFS-15017: - Seems like a trivial change - the import was added by HDFS-7073 > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch > > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Attachment: (was: HDFS-15017-branch-2.000.patch) > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Attachment: HDFS-15017-branch-2.000.patch > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned HDFS-15036: --- Assignee: Chao Sun > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14963) Add HDFS Client machine caching active namenode index mechanism.
[ https://issues.apache.org/jira/browse/HDFS-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990087#comment-16990087 ] Chao Sun commented on HDFS-14963: - Seems this and HDFS-15024 are solving very similar problems, and the solution there could be much simpler. Should we instead pursue that approach? I also tend to echo [~shv]'s point, and I'm not sure having clients write to a local file is a good idea. > Add HDFS Client machine caching active namenode index mechanism. > > > Key: HDFS-14963 > URL: https://issues.apache.org/jira/browse/HDFS-14963 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.1.3 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Labels: multi-sbnn > > In a multi-NameNode scenario, a new hdfs client always begins an rpc call at > the 1st namenode, simply polls, and finally determines the current Active > namenode. > This brings at least two problems: > # Extra failover consumption, especially in the case of frequent creation of > clients. > # Unnecessary log printing: suppose there are 3 NNs and the 3rd is the ANN, and > a client starts rpc with the 1st NN. It stays silent when failing over > from the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 3rd > NN, it prints some unnecessary logs; in some scenarios, these logs can be > very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
>   at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>   ...
> {code}
> We can introduce a solution for this problem: on the client machine, for every > hdfs cluster, cache its current Active NameNode index in a separate cache > file named by its uri. *Note these cache files are shared by all hdfs client > processes on this machine*. > For example, suppose there are hdfs://ns1 and hdfs://ns2, and the client > machine cache file directory is /tmp, then: > # the ns1 cluster-related cache file is /tmp/ns1 > # the ns2 cluster-related cache file is /tmp/ns2 > And then: > # When a client starts, it reads the current Active NameNode index from the > corresponding cache file based on the target hdfs uri, and then directly makes > an rpc call toward the right ANN. > # After each failover, the client needs to write the latest Active > NameNode index to the corresponding cache file based on the target hdfs uri. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
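A minimal sketch of the proposed per-nameservice cache file, assuming plain-text contents holding just the index; paths and parsing are illustrative, and a real implementation would also need to cope with concurrent writers, since the file is shared by all client processes on the machine.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ActiveNnIndexCache {
  private final Path cacheFile;

  // One file per nameservice, e.g. /tmp/ns1 for hdfs://ns1.
  ActiveNnIndexCache(String nameservice) {
    this.cacheFile = Paths.get("/tmp", nameservice);
  }

  int read() {
    try {
      String s = new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8);
      return Integer.parseInt(s.trim());
    } catch (IOException | NumberFormatException e) {
      return 0; // no usable cache: start from the first configured NN
    }
  }

  void write(int activeIndex) {
    try {
      Files.write(cacheFile, String.valueOf(activeIndex).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      // Best effort: a stale or missing cache only costs extra failovers.
    }
  }
}
{code}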
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990059#comment-16990059 ] Chao Sun commented on HDFS-15024: - {quote} Chao Sun I think the msync case is just one case; maybe the current problem is a common problem for supporting more than 2 NameNodes? {quote} Yes, you are correct. This is a more general problem for the multi-SBN feature, but I think we could optimize {{msync}} specifically to avoid the retry backoff. Regarding patch v1, it seems it only handles the first few retries; later on, when {{times}} gradually increments past {{numNameNodes - 1}}, it will still do exponential backoff on all the SBNs. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN, there are three NN nodes in the client configuration, > such as: > dfs.ha.namenodes.ns1 = nn2,nn3,nn1 > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an HDFS access operation: > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > The client needs to reach nn1 when it executes the msync method. > It actually connects to nn2 first, and a failover is required. > Connecting to nn3 does not meet the requirement either, so another failover is needed, but this failover only happens after a period of sleep. > Finally, after that sleep, the request successfully connects to nn1. > In FailoverOnNetworkExceptionRetry#getFailoverOrRetrySleepTime, the current default implementation calculates a sleep time whenever more than one failover has been performed. > I think using the number of NameNodes as a condition in the sleep-time calculation is more reasonable. > That is, in the current test, the failover after connecting to nn3 should not need to sleep before directly connecting to the next NN. > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
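A hedged sketch of the idea under discussion, folding the NameNode count into the backoff decision; the names and constants are illustrative, not the actual FailoverOnNetworkExceptionRetry code, and per the comment above a complete fix would also need to avoid backing off across all SBNs on later retries.

{code:java}
import java.util.concurrent.ThreadLocalRandom;

class FailoverSleepSketch {
  private final long delayMillis = 500;
  private final long maxDelayMillis = 15_000;

  // Try each remaining NameNode once with no delay, then start a
  // randomized exponential backoff.
  long getFailoverSleepTime(int failovers, int numNameNodes) {
    if (failovers < numNameNodes - 1) {
      return 0;
    }
    int round = failovers - (numNameNodes - 1) + 1;
    long cap = Math.min(maxDelayMillis, delayMillis * (1L << Math.min(round, 10)));
    return ThreadLocalRandom.current().nextLong(cap + 1);
  }
}
{code}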
[jira] [Updated] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15005: Attachment: HDFS-15005-branch-2.003.patch > Backport HDFS-12300 to branch-2 > --- > > Key: HDFS-15005 > URL: https://issues.apache.org/jira/browse/HDFS-15005 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-15005-branch-2.000.patch, > HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, > HDFS-15005-branch-2.003.patch > > > Having DT-related information in the audit log is very useful. This tracks the > effort to backport HDFS-12300 to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990023#comment-16990023 ] Chao Sun commented on HDFS-15005: - Rebased to the latest branch-2. [~weichiu] pls take a look. > Backport HDFS-12300 to branch-2 > --- > > Key: HDFS-15005 > URL: https://issues.apache.org/jira/browse/HDFS-15005 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-15005-branch-2.000.patch, > HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, > HDFS-15005-branch-2.003.patch > > > Having DT-related information in the audit log is very useful. This tracks the > effort to backport HDFS-12300 to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14998) [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130
[ https://issues.apache.org/jira/browse/HDFS-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989047#comment-16989047 ] Chao Sun commented on HDFS-14998: - +1 on v006 as well. > [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130 > - > > Key: HDFS-14998 > URL: https://issues.apache.org/jira/browse/HDFS-14998 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Minor > Attachments: HDFS-14998.001.patch, HDFS-14998.002.patch, > HDFS-14998.003.patch, HDFS-14998.004.patch, HDFS-14998.005.patch, > HDFS-14998.006.patch > > > After HDFS-14130, we should update the observer namenode doc: the observer namenode > can now run with ZKFC running -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org