[jira] [Resolved] (HDFS-16686) GetJournalEditServlet fails to authorize valid Kerberos request

2022-09-13 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-16686.
-
Fix Version/s: 3.3.9
 Hadoop Flags: Reviewed
   Resolution: Fixed

> GetJournalEditServlet fails to authorize valid Kerberos request
> ---
>
> Key: HDFS-16686
> URL: https://issues.apache.org/jira/browse/HDFS-16686
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running in Kubernetes using Java 11 in an HA 
> configuration.  JournalNodes run on separate pods and have their own Kerberos 
> principal "jn/@".
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> GetJournalEditServlet uses request.getRemoteUser() to determine the 
> remoteShortName for Kerberos authorization, which fails to match when the 
> JournalNode uses its own Kerberos principal (e.g. jn/@).
> This can be fixed by using the UserGroupInformation provided by the base 
> DfsServlet class via its getUGI(request, conf) call.
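> A minimal sketch of the proposed change (simplified; the committed patch may 
> differ):
> {code:java}
> // Before (simplified): the short name comes straight from the HTTP layer,
> // which does not match when the JournalNode authenticates via Kerberos.
> String remoteShortName = request.getRemoteUser();
>
> // After: derive the caller from the Kerberos-aware UGI that the base
> // DfsServlet class already knows how to build from the request.
> UserGroupInformation ugi = getUGI(request, conf);
> String remoteShortName = ugi.getShortUserName();
> {code}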






[jira] [Resolved] (HDFS-4043) Namenode Kerberos Login does not use proper hostname for host qualified hdfs principal name.

2022-08-17 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-4043.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Namenode Kerberos Login does not use proper hostname for host qualified hdfs 
> principal name.
> 
>
> Key: HDFS-4043
> URL: https://issues.apache.org/jira/browse/HDFS-4043
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha, 2.0.3-alpha, 
> 3.4.0, 3.3.9
> Environment: CDH4U1 on Ubuntu 12.04
>Reporter: Ahad Rana
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>   Original Estimate: 24h
>  Time Spent: 50m
>  Remaining Estimate: 23h 10m
>
> The Namenode uses the loginAsNameNodeUser method in NameNode.java to log in 
> using the hdfs principal. This method in turn invokes SecurityUtil.login with 
> a hostname (the last parameter) obtained via a call to InetAddress.getHostName. 
> This call does not always return the fully qualified host name, and thus 
> causes the namenode login to fail due to Kerberos's inability to find a 
> matching hdfs principal in the hdfs.keytab file. Instead it should use 
> InetAddress.getCanonicalHostName, which is consistent with what 
> SecurityUtil.java uses internally to log in other services, such as the 
> DataNode. 
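> A minimal sketch of the difference (the FQDN shown is illustrative):
> {code:java}
> InetAddress addr = InetAddress.getLocalHost();
> String shortName = addr.getHostName();       // may return just "nn1"
> String fqdn = addr.getCanonicalHostName();   // e.g. "nn1.example.com"
> // Using the canonical name lets _HOST expand to a principal that actually
> // exists in hdfs.keytab, e.g. hdfs/nn1.example.com@REALM.
> SecurityUtil.login(conf, DFS_NAMENODE_KEYTAB_FILE_KEY,
>     DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY, fqdn);
> {code}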






[jira] [Resolved] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error

2022-08-11 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-16702.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> MiniDFSCluster should report cause of exception in assertion error
> --
>
> Key: HDFS-16702
> URL: https://issues.apache.org/jira/browse/HDFS-16702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
> Environment: Tests running in the Hadoop dev environment image.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When the MiniDFSCluster detects that an exception caused an exit, it should 
> include that exception as the cause of the AssertionError that it throws.  
> The current AssertionError simply reports the message "Test resulted in an 
> unexpected exit" and provides a stack trace pointing to the location of the 
> check for an exit exception.
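> A minimal sketch of the improvement (the variable name is assumed; the 
> two-argument AssertionError constructor has existed since Java 7):
> {code:java}
> // Before: the cause is swallowed and the stack trace points at the check.
> throw new AssertionError("Test resulted in an unexpected exit");
>
> // After: chain the exit exception so the report shows the real failure.
> throw new AssertionError("Test resulted in an unexpected exit", exitException);
> {code}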






[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-03-31 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16507:

Fix Version/s: 3.2.4

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like an edit log which is still in progress is being purged.
> Based on the analysis, I suspect that the in-progress edit log to be purged 
> (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edit log. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>     
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     
> 

[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-03-31 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16507:

Fix Version/s: 3.3.3

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like an edit log which is still in progress is being purged.
> Based on the analysis, I suspect that the in-progress edit log to be purged 
> (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edit log. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>     
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     
> 

[jira] [Resolved] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-03-31 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-16507.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like an edit log which is still in progress is being purged.
> Based on the analysis, I suspect that the in-progress edit log to be purged 
> (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN 
> rolls its own edit log. 
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>     
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     
> 

[jira] [Updated] (HDFS-16271) RBF: NullPointerException when setQuota through routers with quota disabled

2022-02-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16271:

Fix Version/s: 3.3.2
   (was: 3.3.1)

> RBF: NullPointerException when setQuota through routers with quota disabled
> ---
>
> Key: HDFS-16271
> URL: https://issues.apache.org/jira/browse/HDFS-16271
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.1
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16271.001.patch, HDFS-16271.002.patch
>
>
> When we started routers with *dfs.federation.router.quota.enable=false* and 
> tried to setQuota through them, a NullPointerException was caught.
> The cause of the NPE is that Router#quotaManager is not initialized when 
> dfs.federation.router.quota.enable=false,
>  but when executing a setQuota RPC request inside the router, it is used in 
> the method Quota#isMountEntry without a null check.
> I think it's better to check whether Router#isQuotaEnabled is true before using 
> Router#quotaManager, and throw an IOException with a readable message if 
> needed, as in the sketch below.
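> A minimal sketch of the suggested guard (placement and message are assumed):
> {code:java}
> // Fail fast with a readable message instead of dereferencing an
> // uninitialized quotaManager when quota support is turned off.
> if (!router.isQuotaEnabled()) {
>   throw new IOException("The quota system is disabled in Router.");
> }
> RouterQuotaManager quotaManager = router.getQuotaManager(); // non-null here
> {code}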






[jira] [Updated] (HDFS-16344) Improve DirectoryScanner.Stats#toString

2022-02-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16344:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Improve DirectoryScanner.Stats#toString
> ---
>
> Key: HDFS-16344
> URL: https://issues.apache.org/jira/browse/HDFS-16344
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
> Attachments: image-2021-11-21-19-35-16-838.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Improve DirectoryScanner.Stats#toString.
> !image-2021-11-21-19-35-16-838.png|width=1019,height=71!






[jira] [Updated] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake

2022-02-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16332:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Expired block token causes slow read due to missing handling in sasl handshake
> --
>
> Key: HDFS-16332
> URL: https://issues.apache.org/jira/browse/HDFS-16332
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, dfs, dfsclient
>Affects Versions: 2.8.5, 3.3.1
>Reporter: Shinya Yoshida
>Assignee: Shinya Yoshida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: Screenshot from 2021-11-18 12-11-34.png, Screenshot from 
> 2021-11-18 12-14-29.png, Screenshot from 2021-11-18 13-31-35.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> We're operating an HBase 1.4.x cluster on Hadoop 2.8.5.
> We've recently been evaluating a Kerberos-secured HBase and Hadoop cluster with 
> production load, and we observed HBase responses slowing by several seconds or 
> more, and by several minutes in the worst case (about one to three times a 
> month).
> The following image is a scatter plot of HBase's slow responses; each circle 
> is one slow-response log entry.
> The X-axis is the date and time the log occurred, the Y-axis is the response 
> slowdown.
>  !Screenshot from 2021-11-18 12-14-29.png! 
> We could reproduce this issue by reducing "dfs.block.access.token.lifetime", 
> and we could figure out the cause.
> (We used dfs.block.access.token.lifetime=60, i.e. 1 hour)
> When hedged read is enabled:
>  !Screenshot from 2021-11-18 12-11-34.png! 
> When hedged read is disabled:
>  !Screenshot from 2021-11-18 13-31-35.png! 
> As you can see, it's worst if the hedged read is enabled. However, it happens 
> whether the hedged read is enabled or not.
> This impacts our 99th-percentile response time.
> This happens when the block token is expired, and the root cause is the wrong 
> handling of the InvalidToken exception in the SASL handshake in 
> SaslDataTransferServer.
> I propose adding a new response code to DataTransferEncryptorStatus to 
> request that the client update the block token, like DataTransferProtos does.
> The test code and patch are available in 
> https://github.com/apache/hadoop/pull/3677
> We could reproduce this issue with the following test code on the 2.8.5 branch 
> and on trunk, as I tested:
> {code:java}
> // HDFS is configured as a secure cluster
> try (FileSystem fs = newFileSystem();
>      FSDataInputStream in = fs.open(PATH)) {
>   waitBlockTokenExpired(in);
>   in.read(0, bytes, 0, bytes.length);
> }
>
> private void waitBlockTokenExpired(FSDataInputStream in1) throws Exception {
>   DFSInputStream innerStream = (DFSInputStream) in1.getWrappedStream();
>   for (LocatedBlock block : innerStream.getAllBlocks()) {
>     while (!SecurityTestUtil.isBlockTokenExpired(block.getBlockToken())) {
>       Thread.sleep(100);
>     }
>   }
> }
> {code}
> Here is the log we got; we added a custom log before and after the block 
> token refresh:
> https://github.com/bitterfox/hadoop/commit/173a9f876f2264b76af01d658f624197936fd79c
> {code}
> 2021-11-16 09:40:20,330 WARN  [hedgedRead-247] impl.BlockReaderFactory: I/O 
> error constructing remote block reader.
> java.io.IOException: DIGEST-MD5: IO error acquiring password
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:420)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:475)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:389)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at 
> org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:568)
> at 
> org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2880)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:815)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:740)
> at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385)
> at 
> 

[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully

2022-02-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16350:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Datanode start time should be set after RPC server starts successfully
> --
>
> Key: HDFS-16350
> URL: https://issues.apache.org/jira/browse/HDFS-16350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We set the start time of the Datanode when the class is instantiated, but it 
> should ideally be set only after the RPC server starts and RPC handlers are 
> initialized to serve client requests.
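> An illustrative ordering sketch (method and field names assumed, not the 
> actual patch):
> {code:java}
> void startDataNode(Configuration conf) throws IOException {
>   initIpcServer();                  // build the RPC server and handlers
>   ipcServer.start();                // clients can be served from here on
>   startTime = Time.monotonicNow();  // record readiness, not instantiation
> }
> {code}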






[jira] [Updated] (HDFS-16336) De-flake TestRollingUpgrade#testRollback

2022-02-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16336:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> De-flake TestRollingUpgrade#testRollback
> 
>
> Key: HDFS-16336
> URL: https://issues.apache.org/jira/browse/HDFS-16336
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, test
>Affects Versions: 3.4.0
>Reporter: Kevin Wikant
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This pull request: [https://github.com/apache/hadoop/pull/3675]
> failed the Jenkins pre-commit job due to an unrelated unit test failure: 
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3675/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
> {code:java}
> [ERROR] Failures: 
> [ERROR] 
> org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade)
> [ERROR]   Run 1: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 
> expected null, but 
> was:  createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})>
> [ERROR]   Run 2: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 
> expected null, but 
> was:  createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})>
> [ERROR]   Run 3: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 
> expected null, but 
> was:  createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> {code}
> It seems that "TestRollingUpgrade.testRollback" is a flaky unit test






[jira] [Updated] (HDFS-16171) De-flake testDecommissionStatus

2022-02-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16171:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> De-flake testDecommissionStatus
> ---
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] 
> testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
>   Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> 
> but was:<3>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}






[jira] [Updated] (HDFS-16339) Show the threshold when mover threads quota is exceeded

2022-02-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16339:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Show the threshold when mover threads quota is exceeded
> ---
>
> Key: HDFS-16339
> URL: https://issues.apache.org/jira/browse/HDFS-16339
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-20-17-23-04-924.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Show the threshold when mover threads quota is exceeded in 
> DataXceiver#replaceBlock and DataXceiver#copyBlock.
> !image-2021-11-20-17-23-04-924.png|width=1233,height=124!






[jira] [Commented] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks

2022-01-14 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476281#comment-17476281
 ] 

Chao Sun commented on HDFS-16420:
-

[~tasanuma] done

> Avoid deleting unique data blocks when deleting redundancy striped blocks
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liubingxing
>Assignee: Jackson Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have a similar problem to the one HDFS-16297 described. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and a {color:#de350b}missing block{color} happened. 
> We got the block (blk_-9223372036824119008) info from fsck: only 5 live 
> replicas and multiple redundant replicas. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the logs from all datanodes and found that the internal blocks of 
> blk_-9223372036824119008 were deleted almost at the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The total number of internal blocks deleted during 08:15-08:17 is as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
>  
> During 08:15 to 08:17, we restarted 2 datanodes and triggered 
> full block reports immediately.
>  
> There are 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why was the internal block with only one copy deleted?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanodes to 168 hours.
> 2. We have done a namenode HA operation.
> 3. After the namenode HA operation, the state of the storage became 
> stale, and the state did not change until the next full block 
> report.
> 4. The balancer copied the replica without deleting the replica from the 
> source node, because the source node had the stale storage, and the request 
> was put into postponedMisreplicatedBlocks.
> 5. The balancer continued to copy the replica, eventually resulting in 
> multiple copies of a replica.
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set of rescannedMisreplicatedBlocks has many 
> blocks to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  






[jira] [Updated] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks

2022-01-14 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16420:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Avoid deleting unique data blocks when deleting redundancy striped blocks
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liubingxing
>Assignee: Jackson Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have a similar problem to the one HDFS-16297 described. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and a {color:#de350b}missing block{color} happened. 
> We got the block (blk_-9223372036824119008) info from fsck: only 5 live 
> replicas and multiple redundant replicas. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the logs from all datanodes and found that the internal blocks of 
> blk_-9223372036824119008 were deleted almost at the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The total number of internal blocks deleted during 08:15-08:17 is as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
>  
> During 08:15 to 08:17, we restarted 2 datanodes and triggered 
> full block reports immediately.
>  
> There are 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why was the internal block with only one copy deleted?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanodes to 168 hours.
> 2. We have done a namenode HA operation.
> 3. After the namenode HA operation, the state of the storage became 
> stale, and the state did not change until the next full block 
> report.
> 4. The balancer copied the replica without deleting the replica from the 
> source node, because the source node had the stale storage, and the request 
> was put into postponedMisreplicatedBlocks.
> 5. The balancer continued to copy the replica, eventually resulting in 
> multiple copies of a replica.
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set of rescannedMisreplicatedBlocks has many 
> blocks to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  






[jira] [Resolved] (HDFS-16410) Insecure Xml parsing in OfflineEditsXmlLoader

2022-01-05 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-16410.
-
Fix Version/s: 3.4.0
   3.3.2
   Resolution: Fixed

> Insecure Xml parsing in OfflineEditsXmlLoader 
> --
>
> Key: HDFS-16410
> URL: https://issues.apache.org/jira/browse/HDFS-16410
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.1
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available, security
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Insecure Xml parsing in OfflineEditsXmlLoader 
> [https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/OfflineEditsXmlLoader.java#L88]
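> A minimal sketch of the usual SAX hardening for this class of bug (the 
> feature flags are standard SAX/Xerces ones; this is not necessarily the 
> committed patch):
> {code:java}
> SAXParserFactory spf = SAXParserFactory.newInstance();
> // Refuse DOCTYPE declarations outright; this blocks most XXE payloads.
> spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
> // Defense in depth: also disable external entity resolution.
> spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
> spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
> SAXParser parser = spf.newSAXParser();
> {code}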






[jira] [Updated] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero

2022-01-05 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16408:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Ensure LeaseRecheckIntervalMs is greater than zero
> --
>
> Key: HDFS-16408
> URL: https://issues.apache.org/jira/browse/HDFS-16408
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3, 3.3.1
>Reporter: Jingxuan Fu
>Assignee: Jingxuan Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>   Original Estimate: 1h
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There is a problem with the try-catch statement in the LeaseMonitor daemon 
> (in LeaseManager.java): when an unknown exception is caught, it simply prints 
> a warning message and continues with the next loop. 
> An extreme case is when the configuration item 
> 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative 
> number by the user. As the configuration item is read without checking its 
> range, 'FSNamesystem.getLeaseRecheckIntervalMs()' returns this value and it is 
> used as an argument to Thread.sleep(). A negative argument will cause 
> Thread.sleep() to throw an IllegalArgumentException, which will be caught by 
> 'catch (Throwable e)' and a warning message will be printed. 
> This behavior is repeated on each subsequent loop. This means that a huge 
> number of repetitive messages will be printed to the log file in a short 
> period of time, quickly consuming disk space and affecting the operation of 
> the system.
> As you can see, 178MB of log files are generated in one minute.
>  
> {code:java}
> ll logs/
> total 174456
> drwxrwxr-x  2 hadoop hadoop      4096 1月   3 15:13 ./
> drwxr-xr-x 11 hadoop hadoop      4096 1月   3 15:13 ../
> -rw-rw-r--  1 hadoop hadoop     36342 1月   3 15:14 
> hadoop-hadoop-datanode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      1243 1月   3 15:13 
> hadoop-hadoop-datanode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop 178545466 1月   3 15:14 
> hadoop-hadoop-namenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop       692 1月   3 15:13 
> hadoop-hadoop-namenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop     33201 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      3764 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop         0 1月   3 15:13 SecurityAuth-hadoop.audit
>  
> tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log 
> 2022-01-03 15:14:46,032 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> I think there are two potential solutions. 
> The first is to adjust the position of the try-catch statement in the 
> LeaseMonitor daemon by moving 'catch (Throwable e)' outside of the loop 
> body. This can be done like the NameNodeResourceMonitor daemon, which ends 
> the thread when an unexpected exception is caught. 
> The second is to use Preconditions.checkArgument() to validate the range of 
> the configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is 
> read, so that a wrong configuration value cannot affect the subsequent 
> operation of the program, as in the sketch below.
>  
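> A minimal sketch of the second option, assuming the existing DFSConfigKeys 
> constants for this key and the Guava Preconditions helper already used in the 
> codebase:
> {code:java}
> this.leaseRecheckIntervalMs = conf.getLong(
>     DFSConfigKeys.DFS_NAMENODE_LEASE_RECHECK_INTERVAL_MS_KEY,
>     DFSConfigKeys.DFS_NAMENODE_LEASE_RECHECK_INTERVAL_MS_DEFAULT);
> // Reject non-positive values at startup instead of looping on
> // IllegalArgumentException from Thread.sleep() at runtime.
> Preconditions.checkArgument(leaseRecheckIntervalMs > 0,
>     "%s must be greater than zero",
>     DFSConfigKeys.DFS_NAMENODE_LEASE_RECHECK_INTERVAL_MS_KEY);
> {code}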






[jira] [Updated] (HDFS-16314) Support to make dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16314:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Support to make 
> dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
> -
>
> Key: HDFS-16314
> URL: https://issues.apache.org/jira/browse/HDFS-16314
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Consider making 
> dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable 
> to allow rapid rollback in case the HDFS-16076 feature causes unexpected 
> problems in a production environment.
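> A hypothetical sketch of what reconfigurable support for the key could look 
> like, assuming the ReconfigurableBase hook the NameNode already uses (the 
> setter name is made up for illustration):
> {code:java}
> @Override
> protected String reconfigurePropertyImpl(String property, String newVal)
>     throws ReconfigurationException {
>   if ("dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled"
>       .equals(property)) {
>     boolean enable = (newVal != null) && Boolean.parseBoolean(newVal);
>     blockManager.setExcludeSlowNodesEnabled(enable);  // assumed setter
>     return Boolean.toString(enable);
>   }
>   throw new ReconfigurationException(property, newVal, getConf().get(property));
> }
> {code}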






[jira] [Updated] (HDFS-16287) Support to make dfs.namenode.avoid.read.slow.datanode reconfigurable

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16287:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Support to make dfs.namenode.avoid.read.slow.datanode  reconfigurable
> -
>
> Key: HDFS-16287
> URL: https://issues.apache.org/jira/browse/HDFS-16287
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> 1. Consider making dfs.namenode.avoid.read.slow.datanode reconfigurable 
> to allow rapid rollback in case the 
> [HDFS-16076|https://issues.apache.org/jira/browse/HDFS-16076] feature causes 
> unexpected problems in a production environment.  
> 2. Control DatanodeManager#startSlowPeerCollector via the parameter 
> 'dfs.datanode.peer.stats.enabled'.






[jira] [Updated] (HDFS-16268) Balancer stuck when moving striped blocks due to NPE

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16268:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Balancer stuck when moving striped blocks due to NPE
> 
>
> Key: HDFS-16268
> URL: https://issues.apache.org/jira/browse/HDFS-16268
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, erasure-coding
>Affects Versions: 3.2.2
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code:java}
> 21/10/11 06:11:26 WARN balancer.Dispatcher: Dispatcher thread failed
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.markMovedIfGoodBlock(Dispatcher.java:289)
>         at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.chooseBlockAndProxy(Dispatcher.java:272)
>         at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:236)
>         at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.chooseNextMove(Dispatcher.java:899)
>         at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.dispatchBlocks(Dispatcher.java:958)
>         at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.access$3300(Dispatcher.java:757)
>         at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$2.run(Dispatcher.java:1226)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> Due to the NPE in the middle, there will be pending moves left in the queue, 
> so the balancer will be stuck forever.






[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16293:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Client sleeps and holds 'dataQueue' when DataNodes are congested
> 
>
> Key: HDFS-16293
> URL: https://issues.apache.org/jira/browse/HDFS-16293
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.2.2, 3.3.1, 3.2.3
>Reporter: Yuanxin Zhu
>Assignee: Yuanxin Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, 
> HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, 
> HDFS-16293.05.patch, HDFS-16293.06.patch, HDFS-16293.07.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When I enable ECN and use Terasort (500G data, 8 DataNodes, 76 vcores/DN) for 
> testing, DataNodes become congested (HDFS-8008). The client repeatedly enters 
> the sleep state after receiving ACKs, but does not release the 
> 'dataQueue' lock. The ResponseProcessor thread needs the 'dataQueue' lock to 
> execute 'ackQueue.getFirst()', so the ResponseProcessor waits for the client 
> to release 'dataQueue', which effectively means the ResponseProcessor 
> thread also enters sleep, resulting in ACK delay. MapReduce tasks can be 
> delayed by tens of minutes or even hours.
> The DataStreamer thread can first execute 'one = dataQueue.getFirst()', 
> release 'dataQueue', and then decide whether to execute 'backOffIfNecessary()' 
> according to 'one.isHeartbeatPacket()', as in the sketch below.
>  
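> A minimal sketch of that reordering (heavily simplified from 
> DataStreamer#run; the real locking is more involved):
> {code:java}
> DFSPacket one;
> synchronized (dataQueue) {
>   // ... wait until dataQueue is non-empty ...
>   one = dataQueue.getFirst();  // take the packet while holding the lock
> }
> // Back off for congestion *outside* the dataQueue lock, so the
> // ResponseProcessor can keep consuming ACKs while we sleep.
> if (!one.isHeartbeatPacket()) {
>   backOffIfNecessary();
> }
> {code}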






[jira] [Updated] (HDFS-15862) Make TestViewfsWithNfs3.testNfsRenameSingleNN() idempotent

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15862:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Make TestViewfsWithNfs3.testNfsRenameSingleNN() idempotent
> --
>
> Key: HDFS-15862
> URL: https://issues.apache.org/jira/browse/HDFS-15862
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: nfs
>Reporter: Zhengxi Li
>Assignee: Zhengxi Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: HDFS-15862.001.patch, HDFS-15862.002.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The 
> 'org.apache.hadoop.hdfs.nfs.nfs3.TestViewfsWithNfs3.testNfsRenameSingleNN' 
> test is not idempotent and fails if run twice in the same JVM, because it 
> pollutes state shared among tests. It may be good to clean this state 
> pollution so that some other tests do not fail in the future due to the 
> shared state polluted by this test.
> Running {{TestViewfsWithNfs3.testNfsRenameSingleNN}} twice would result in 
> the second run failing with a NullPointer exception:
> {noformat}
> [ERROR] Errors:
> [ERROR]   TestViewfsWithNfs3.testNfsRenameSingleNN:317 NullPointer
> {noformat}
> The reason for this is that the {{/user1/renameSingleNN}} file is created in 
> {{setup()}}, but gets renamed in {{testNfsRenameSingleNN}}. When the 
> second run of {{testNfsRenameSingleNN}} tries to get info of the file by its 
> original name, it returns a NullPointer since the file no longer exists.
>  
> Link to PR: https://github.com/apache/hadoop/pull/2724
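> One illustrative way to clean up the pollution (the fs field and recreation 
> parameters are assumed, not taken from the patch):
> {code:java}
> // Recreate the pre-test state so a second in-JVM run starts clean.
> @After
> public void restoreRenameSingleNNFile() throws IOException {
>   Path original = new Path("/user1/renameSingleNN");
>   if (!hdfs.exists(original)) {
>     DFSTestUtil.createFile(hdfs, original, 10, (short) 1, 0L);
>   }
> }
> {code}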






[jira] [Updated] (HDFS-16345) Fix test cases fail in TestBlockStoragePolicy

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16345:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Fix test cases fail in TestBlockStoragePolicy
> -
>
> Key: HDFS-16345
> URL: https://issues.apache.org/jira/browse/HDFS-16345
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.3.1
>Reporter: guophilipse
>Assignee: guophilipse
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The test class {{TestBlockStoragePolicy}} fails frequently with a 
> {{BindException}}, which blocks all normal source code builds. We can improve 
> it.
> [ERROR] Tests run: 26, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 49.295 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestBlockStoragePolicy
> [ERROR] testChooseTargetWithTopology(org.apache.hadoop.hdfs.TestBlockStoragePolicy)
>   Time elapsed: 0.551 s <<< ERROR! java.net.BindException: Problem binding to 
> [localhost:43947] java.net.BindException: Address already in use; For more 
> details see: http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:827)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:657)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1352)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:3252)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:468)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:371)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:466)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:860)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:766)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1017)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:992)
>   at org.apache.hadoop.hdfs.TestBlockStoragePolicy.testChooseTargetWithTopology(TestBlockStoragePolicy.java:1275)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>  at 
> 
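> One common, hedged remedy for this class of flakiness (illustrative, not
> necessarily the committed fix) is to bind the test NameNode RPC server to an
> ephemeral port so parallel runs cannot collide on a hard-coded port:
> {code:java}
> // Port 0 asks the OS for a currently-free port instead of a fixed one.
> Configuration conf = new HdfsConfiguration();
> conf.set(DFSConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY, "localhost:0");
> {code}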

[jira] [Updated] (HDFS-16333) fix balancer bug when transfer an EC block

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16333:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> fix balancer bug when transfer an EC block
> --
>
> Key: HDFS-16333
> URL: https://issues.apache.org/jira/browse/HDFS-16333
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-18-17-25-13-089.png, 
> image-2021-11-18-17-25-50-556.png, image-2021-11-18-17-28-03-155.png
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> We set the EC policy to (6+3), and we also had nodes being decommissioned 
> when we executed the balancer.
> With the balancer running, we found many error logs as follows.
> !image-2021-11-18-17-25-13-089.png|width=858,height=135!
> Node A wants to transfer an EC block to node B, but we found that the block 
> is not on node A. The FSCK command shows the block status as follows:
> !image-2021-11-18-17-25-50-556.png|width=607,height=189!
> In the Dispatcher.getBlockList function:
> !image-2021-11-18-17-28-03-155.png!
>  
> Assume that the locations of an EC block in storageGroupMap look like this:
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> node:[a, b, c, d, e, f, g, h, i]
> After the decommission operation, the internal block at indices[1] was 
> decommissioned to another node:
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> node:[a, {color:#FF}j{color}, c, d, e, f, g, h, i]
> the location of indices[1] changed from node {color:#FF}b{color} to node 
> {color:#FF}j{color}.
>  
> When the balancer gets the block locations, it checks them against the 
> locations in storageGroupMap.
> If a node is not found in storageGroupMap, it will not be added to the block 
> locations.
> In this case, node {color:#FF}j{color} will not be added to the block 
> locations, while the indices are not updated.
> Finally, the block locations may look like this: 
> indices:[0, 1, 2, 3, 4, 5, 6, 7, 8]
> {color:#FF}block.location:[a, c, d, e, f, g, h, i]{color}
> the locations of the nodes no longer match their indices
>  
> Solution:
> we should update the indices so that they match the nodes:
> {color:#FF}indices:[0, 2, 3, 4, 5, 6, 7, 8]{color}
> {color:#FF}block.location:[a, c, d, e, f, g, h, i]{color}
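> A hedged sketch of the proposed alignment (field and map names are
> assumptions for illustration): drop an index whenever its location cannot be
> resolved, so the two arrays stay in step:
> {code:java}
> // Keep an index only when its location resolves to a known storage
> // group, so indices and locations remain aligned after a replica moves.
> List<Byte> liveIndices = new ArrayList<>();
> List<StorageGroup> liveLocations = new ArrayList<>();
> for (int i = 0; i < datanodeUuids.length; i++) {
>   StorageGroup g = storageGroupMap.get(datanodeUuids[i], storageIds[i]);
>   if (g != null) {
>     liveIndices.add(indices[i]);
>     liveLocations.add(g);
>   }
> }
> {code}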



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16375) The FBR lease ID should be exposed to the log

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16375:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> The FBR lease ID should be exposed to the log
> -
>
> Key: HDFS-16375
> URL: https://issues.apache.org/jira/browse/HDFS-16375
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Our Hadoop version is 3.1.0. We encountered HDFS-12914 and HDFS-14314 in the 
> production environment.
> When locating the problem, the *fullBrLeaseId* was not exposed in the log, 
> which caused some difficulties. We should expose it to the log.
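> A hedged illustration (not the exact committed log line): the lease ID can
> simply be appended to the existing full block report log, e.g.:
> {code:java}
> // Expose the lease ID so a rejected or delayed FBR can be correlated.
> LOG.info("Processing first storage report for {} from datanode {} "
>     + "(fullBrLeaseId={})", storage, nodeID, fullBrLeaseId);
> {code}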



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16373) Fix MiniDFSCluster restart in case of multiple namenodes

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16373:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Fix MiniDFSCluster restart in case of multiple namenodes
> 
>
> Key: HDFS-16373
> URL: https://issues.apache.org/jira/browse/HDFS-16373
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In the case of multiple namenodes, restarting more than one namenode fails, 
> because restartNamenode waits for all the namenodes to come up. If 2 
> namenodes are down and we restart one, the other namenode won't be up, so 
> the restart fails.
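> A hedged sketch of the fix idea (the helper name is hypothetical): wait only
> for the NameNode that was actually restarted, since its peers may be
> deliberately down in a multi-NameNode test:
> {code:java}
> // Restart without waiting for the whole cluster to become active.
> cluster.restartNameNode(nnIndex, false);
> // Hypothetical helper: block until just this NameNode is serving.
> waitForNameNodeUp(cluster, nnIndex);
> {code}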



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16327) Make dfs.namenode.max.slowpeer.collect.nodes reconfigurable

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16327:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Make dfs.namenode.max.slowpeer.collect.nodes reconfigurable
> ---
>
> Key: HDFS-16327
> URL: https://issues.apache.org/jira/browse/HDFS-16327
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> As the HDFS cluster expands or shrinks, the number of slow nodes to be 
> filtered must be dynamically adjusted. So we should make 
> DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY reconfigurable.
> See HDFS-15879.
>  
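> Once the key is reconfigurable, it could be changed at runtime with the
> standard reconfiguration workflow after editing hdfs-site.xml (host and port
> are placeholders):
> {noformat}
> hdfs dfsadmin -reconfig namenode nn_host:ipc_port start
> hdfs dfsadmin -reconfig namenode nn_host:ipc_port status
> {noformat}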



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16377) Should CheckNotNull before access FsDatasetSpi

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16377:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Should CheckNotNull before access FsDatasetSpi
> --
>
> Key: HDFS-16377
> URL: https://issues.apache.org/jira/browse/HDFS-16377
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-12-10-19-19-22-957.png, 
> image-2021-12-10-19-20-58-022.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When starting the DN, we found an NPE in the starting DN's log, as follows:
> !image-2021-12-10-19-19-22-957.png|width=909,height=126!
> The logs of the upstream DN are as follows:
> !image-2021-12-10-19-20-58-022.png|width=905,height=239!
> This is mainly because *FsDatasetSpi* has not been initialized at the time of 
> access. 
> I noticed that checkNotNull is already done in two methods 
> (*DataNode#getBlockLocalPathInfo* and *DataNode#getVolumeInfo*). So we 
> should add it to other places (interfaces that clients and other DNs can 
> access directly) so that a message is attached when the exception is thrown.
> This way, the client and the upstream DN know that FsDatasetSpi has not been 
> initialized, rather than being left unaware of the specific cause of the NPE.
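> A hedged sketch of the intended guard (the exact message and placement may
> differ in the patch):
> {code:java}
> // Fail fast with a descriptive message instead of a bare NPE when the
> // dataset is accessed before DataNode initialization has completed.
> private FsDatasetSpi<?> checkDataset() {
>   Preconditions.checkNotNull(data, "Storage not yet initialized");
>   return data;
> }
> {code}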



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16391:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16386:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2, 3.2.4
>
> Attachments: monitor.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Our DataNode has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here are some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously, and each volume is allowed 4 
> worker threads, this causes trouble for the DataNode, such as increased CPU 
> and memory usage.
> We should appropriately reduce the total number of threads so that the 
> DataNode can work better.
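> A hedged sketch of the idea ('maxTotalThreads' and 'numVolumes' are assumed
> variables for illustration): derive the per-volume thread count from a
> global cap instead of a fixed 4 threads per volume:
> {code:java}
> // Avoid spawning 36 volumes * 4 threads = 144 deletion workers; cap the
> // total and split it across volumes.
> int threadsPerVolume = Math.max(1, maxTotalThreads / numVolumes);
> ThreadPoolExecutor executor = new ThreadPoolExecutor(
>     1, threadsPerVolume, 60L, TimeUnit.SECONDS,
>     new LinkedBlockingQueue<Runnable>());
> {code}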



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16395) Remove useless NNThroughputBenchmark#dummyActionNoSynch()

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16395:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Remove useless NNThroughputBenchmark#dummyActionNoSynch()
> -
>
> Key: HDFS-16395
> URL: https://issues.apache.org/jira/browse/HDFS-16395
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks, namenode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> NNThroughputBenchmark#dummyActionNoSynch() doesn't seem to be used anywhere, 
> so it is recommended to delete it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-14099:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Unknown frame descriptor when decompressing multiple frames in 
> ZStandardDecompressor
> 
>
> Key: HDFS-14099
> URL: https://issues.apache.org/jira/browse/HDFS-14099
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Hadoop Version: hadoop-3.0.3
> Java Version: 1.8.0_144
>Reporter: xuzq
>Assignee: xuzq
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: HDFS-14099-trunk-001.patch, HDFS-14099-trunk-002.patch, 
> HDFS-14099-trunk-003.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We need to use the ZSTD compression algorithm in Hadoop, so I wrote a simple 
> demo like this for testing.
> {code:java}
> // code placeholder
> while ((size = fsDataInputStream.read(bufferV2)) > 0 ) {
>   countSize += size;
>   if (countSize == 65536 * 8) {
> if(!isFinished) {
>   // finish a frame in zstd
>   cmpOut.finish();
>   isFinished = true;
> }
> fsDataOutputStream.flush();
> fsDataOutputStream.hflush();
>   }
>   if(isFinished) {
> LOG.info("Will resetState. N=" + n);
> // reset the stream and write again
> cmpOut.resetState();
> isFinished = false;
>   }
>   cmpOut.write(bufferV2, 0, size);
>   bufferV2 = new byte[5 * 1024 * 1024];
>   n++;
> }
> {code}
>  
> And I used "*hadoop fs -text*" to read this file, which failed. The error is 
> as below.
> {code:java}
> Exception in thread "main" java.lang.InternalError: Unknown frame descriptor
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native
>  Method)
> at 
> org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
> at java.io.InputStream.read(InputStream.java:101)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
> at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
> at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
> at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {code}
>  
> So I had to look at the code, including the JNI part, and then found this bug.
> The *ZSTD_initDStream(stream)* method may be called twice in the same *Frame*.
> The first is  in *ZStandardDecompressor.c.* 
> {code:java}
> if (size == 0) {
> (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, 
> JNI_TRUE);
> size_t result = dlsym_ZSTD_initDStream(stream);
> if (dlsym_ZSTD_isError(result)) {
> THROW(env, "java/lang/InternalError", 
> dlsym_ZSTD_getErrorName(result));
> return (jint) 0;
> }
> }
> {code}
> This call is correct, but *Finished* is no longer set back to false, even if 
> there is some data (a new frame) in *CompressedBuffer* or *UserBuffer* that 
> needs to be decompressed.
> The second is in *org.apache.hadoop.io.compress.DecompressorStream* via 
> *decompressor.reset()*, because *Finished* is always true after decompressing 
> a *Frame*.
> {code:java}
> if (decompressor.finished()) {
>   // First see if there was any leftover buffered input from previous
>   // stream; if not, attempt to refill buffer.  If refill -> EOF, we're
>   // all done; else reset, fix up input buffer, and get ready for next
>   // concatenated substream/"member".
>   int nRemaining = decompressor.getRemaining();
>   if (nRemaining == 0) {
> int m = getCompressedData();
> if (m == -1) {
>   // apparently the previous end-of-stream was also end-of-file:
>   // return success, as if we had never called getCompressedData()
>   eof = true;
>   return -1;
> }
> decompressor.reset();
> decompressor.setInput(buffer, 0, m);
> 

[jira] [Updated] (HDFS-16409) Fix typo: testHasExeceptionsReturnsCorrectValue -> testHasExceptionsReturnsCorrectValue

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16409:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Fix typo: testHasExeceptionsReturnsCorrectValue -> 
> testHasExceptionsReturnsCorrectValue
> ---
>
> Key: HDFS-16409
> URL: https://issues.apache.org/jira/browse/HDFS-16409
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Fixing typo testHasExeceptionsReturnsCorrectValue to 
> testHasExceptionsReturnsCorrectValue in 
> {code:java}
> hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestAddBlockPoolException.java{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16343) Add some debug logs when the dfsUsed are not used during Datanode startup

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16343:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Add some debug logs when the dfsUsed are not used during Datanode startup
> -
>
> Key: HDFS-16343
> URL: https://issues.apache.org/jira/browse/HDFS-16343
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16337) Show start time of Datanode on Web

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16337:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Show start time of Datanode on Web
> --
>
> Key: HDFS-16337
> URL: https://issues.apache.org/jira/browse/HDFS-16337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: image-2021-11-19-08-55-58-343.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Show _start time_ of Datanode on Web.
> !image-2021-11-19-08-55-58-343.png|width=540,height=155!
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16335) Fix HDFSCommands.md

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16335:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Fix HDFSCommands.md
> ---
>
> Key: HDFS-16335
> URL: https://issues.apache.org/jira/browse/HDFS-16335
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Fix HDFSCommands.md.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16334) Correct NameNode ACL description

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16334:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Correct NameNode ACL description
> 
>
> Key: HDFS-16334
> URL: https://issues.apache.org/jira/browse/HDFS-16334
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.3.1
>Reporter: guophilipse
>Assignee: guophilipse
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> `dfs.namenode.acls.enabled` is set to `true` by default after HDFS-13505, so 
> we can improve the description.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16328) Correct disk balancer param desc

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16328:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Correct disk balancer param desc
> 
>
> Key: HDFS-16328
> URL: https://issues.apache.org/jira/browse/HDFS-16328
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, hdfs
>Affects Versions: 3.3.1
>Reporter: guophilipse
>Assignee: guophilipse
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> `dfs.disk.balancer.enabled` is enabled by default after HDFS-13153, so we can 
> improve the doc to avoid confusion.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16330:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Fix incorrect placeholder for Exception logs in DiskBalancer
> 
>
> Key: HDFS-16330
> URL: https://issues.apache.org/jira/browse/HDFS-16330
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16326) Simplify the code for DiskBalancer

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16326:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Simplify the code for DiskBalancer
> --
>
> Key: HDFS-16326
> URL: https://issues.apache.org/jira/browse/HDFS-16326
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Simplify the code for DiskBalancer.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16329) Fix log format for BlockManager

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16329:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Fix log format for BlockManager
> ---
>
> Key: HDFS-16329
> URL: https://issues.apache.org/jira/browse/HDFS-16329
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Fix log format for BlockManager.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16323:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> DatanodeHttpServer doesn't require handler state map while retrieving filter 
> handlers
> -
>
> Key: HDFS-16323
> URL: https://issues.apache.org/jira/browse/HDFS-16323
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DatanodeHttpServer#getFilterHandlers uses the handler state map just to query 
> whether the given DataNode HttpServer filter handler class exists in the map 
> and, if not, initializes the ChannelHandler by invoking a specific 
> parameterized constructor of the class. However, this handler state map is 
> never used to upsert any data.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2022-01-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-16315:

Fix Version/s: 3.3.2
   (was: 3.3.3)

> Add metrics related to Transfer and NativeCopy for DataNode
> ---
>
> Key: HDFS-16315
> URL: https://issues.apache.org/jira/browse/HDFS-16315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-11-08-26-33-074.png
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Datanodes already have Read, Write, Sync and Flush metrics. We should add 
> NativeCopy and Transfer as well.
> Here is a partial look after the change:
> !image-2021-11-11-08-26-33-074.png|width=205,height=235!
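> A hedged sketch of how such rates are typically declared in DataNodeMetrics,
> following the hadoop-metrics2 pattern of the existing Read/Write/Sync/Flush
> metrics (names are illustrative):
> {code:java}
> // Each MutableRate exposes num-ops and average time through JMX.
> @Metric MutableRate transferIoRate;
> @Metric MutableRate nativeCopyIoRate;
>
> public void addTransferIoLatency(long latencyNanos) {
>   transferIoRate.add(latencyNanos);
> }
> {code}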



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15964) Please update the okhttp version to 4.9.1

2021-11-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15964:

Target Version/s: 3.4.0, 3.3.3  (was: 3.4.0, 3.3.2)

> Please update the okhttp version to 4.9.1
> -
>
> Key: HDFS-15964
> URL: https://issues.apache.org/jira/browse/HDFS-15964
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, dfsclient, security
>Affects Versions: 3.3.0
>Reporter: helen huang
>Priority: Major
>
> Currently the okhttp used by the hdfs client is 2.7.5. Our fortify scan 
> flagged two issues with this version. Please update it to the latest (It is 
> okhttp3 4.9.1 at this point). Thanks!
> <dependency>
>   <groupId>com.squareup.okhttp3</groupId>
>   <artifactId>okhttp</artifactId>
>   <version>4.9.1</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15982) Deleted data using HTTP API should be saved to the trash

2021-11-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15982:

Target Version/s: 3.4.0, 3.3.3  (was: 3.4.0, 3.3.2)

> Deleted data using HTTP API should be saved to the trash
> 
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client, httpfs, webhdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2021-04-23 at 4.19.42 PM.png, Screenshot 
> 2021-04-23 at 4.36.57 PM.png
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and removed only after the trash interval 
> elapses. Currently, data is removed from the system directly. [This behavior 
> should be the same as the CLI command.]
> This can be helpful when the user accidentally deletes data from the Web UI.
> Similarly, we should provide a "Skip Trash" option in the HTTP API as well, 
> which should be accessible through the Web UI.
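> For illustration, the CLI-equivalent behavior over WebHDFS might look like
> this; "skiptrash" is the option being proposed here, not an existing
> parameter (host, port and path are placeholders):
> {noformat}
> # Proposed default: move the path to the caller's trash directory
> curl -X DELETE "http://<HOST>:<PORT>/webhdfs/v1/user/foo/data?op=DELETE&recursive=true"
> # Proposed opt-out, mirroring "hdfs dfs -rm -skipTrash"
> curl -X DELETE "http://<HOST>:<PORT>/webhdfs/v1/user/foo/data?op=DELETE&recursive=true&skiptrash=true"
> {noformat}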



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15715) ReplicatorMonitor performance degrades, when the storagePolicy of many file are not match with their real datanodestorage

2021-11-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15715:

Target Version/s: 3.3.3  (was: 3.3.2)

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --
>
> Key: HDFS-15715
> URL: https://issues.apache.org/jira/browse/HDFS-15715
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.3, 3.2.1
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
> Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum, image-2021-03-26-12-17-45-500.png
>
>
> One of our Namenodes has 300M files and blocks. Ordinarily, this namenode 
> should not be under heavy load, but we found that the RPC processing time 
> stayed high and decommissioning was very slow.
>  
> Searching the metrics, I found that under-replicated blocks stayed high. I 
> then jstack-ed the namenode and found that 'InnerNode.getLoc' was the 
> hot-spot code. I suspected that chooseTarget could not find a block, 
> resulting in performance degradation. Considering HDFS-10453, I guessed some 
> logic was triggering the scenario where chooseTarget can't find a proper 
> block.
> Then I enabled some debugging. (Of course I revised some code so that only 
> isGoodTarget is debugged, because enabling BlockPlacementPolicy's debug log 
> is dangerous.) I found "the rack has too many chosen nodes" was being hit. 
> Then I found some logs like this 
> {code}
> 2020-12-04 12:13:56,345 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], 
> creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For 
> more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], 
> storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], 
> creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more 
> information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then, through some debugging and simulation, I found the reason and 
> reproduced this exception.
> The reason is that some developers use the COLD storage policy and the 
> mover, but setting the storage policy and running the mover are asynchronous 
> operations. So some files' real datanode storages do not match their 
> storagePolicy.
> Let me simulate this process. Suppose /tmp/a is created and has 2 replicas 
> on DISK. Then the storage policy is set to COLD. When some logic (for 
> example decommission) triggers a copy of this block, chooseTarget uses 
> chooseStorageTypes to filter the storages actually needed. Here the size of 
> the variable requiredStorageTypes returned by chooseStorageTypes is 3, but 
> the size of result is 2: 3 means 3 ARCHIVE storages are needed, and 2 means 
> the block has 2 DISK storages. It will then request 3 targets. Choosing the 
> first target works, but when choosing the second target the variable 
> 'counter' is 4, which is larger than maxTargetPerRack (3) in isGoodTarget. 
> So every datanodestorage is skipped, resulting in bad performance.
> I think chooseStorageTypes needs to consider result: when the existing 
> replicas don't meet the storage policy's demand, we need to remove them from 
> result.
> I made the change this way and tested it in my unit test, which solved it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15505) Fix NullPointerException when call getAdditionalDatanode method with null extendedBlock parameter

2021-11-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15505:

Target Version/s: 3.4.0, 3.3.3  (was: 3.4.0, 3.3.2)

> Fix NullPointerException when call getAdditionalDatanode method with null 
> extendedBlock parameter
> -
>
> Key: HDFS-15505
> URL: https://issues.apache.org/jira/browse/HDFS-15505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.0.0, 3.1.0, 3.0.1, 3.0.2, 3.2.0, 3.1.1, 3.0.3, 3.1.2, 
> 3.3.0, 3.2.1, 3.1.3
>Reporter: hang chen
>Priority: Major
>
> When a client calls the getAdditionalDatanode method, it initializes a 
> GetAdditionalDatanodeRequestProto and sends an RPC request to the 
> Router/namenode. However, if we call getAdditionalDatanode with a null 
> extendedBlock parameter, it sets the GetAdditionalDatanodeRequestProto's blk 
> field to null, which causes a NullPointerException. The code is shown below.
> {code:java}
> // code placeholder
> GetAdditionalDatanodeRequestProto req = GetAdditionalDatanodeRequestProto
>  .newBuilder()
>  .setSrc(src)
>  .setFileId(fileId)
>  .setBlk(PBHelperClient.convert(blk))
>  .addAllExistings(PBHelperClient.convert(existings))
>  .addAllExistingStorageUuids(Arrays.asList(existingStorageIDs))
>  .addAllExcludes(PBHelperClient.convert(excludes))
>  .setNumAdditionalNodes(numAdditionalNodes)
>  .setClientName(clientName)
>  .build();{code}
>  
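> A hedged sketch of a client-side guard (illustrative; the actual fix may
> validate elsewhere): only set the optional blk field when a block is
> present, so PBHelperClient.convert() is never handed a null:
> {code:java}
> GetAdditionalDatanodeRequestProto.Builder builder =
>     GetAdditionalDatanodeRequestProto.newBuilder()
>         .setSrc(src)
>         .setFileId(fileId);
> if (blk != null) {
>   // blk is optional in the proto; skip it rather than convert a null.
>   builder.setBlk(PBHelperClient.convert(blk));
> }
> {code}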



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15289) Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table

2021-11-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15289:

Target Version/s: 3.4.0, 3.2.4, 3.3.3  (was: 3.4.0, 3.3.2, 3.2.4)

> Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table
> -
>
> Key: HDFS-15289
> URL: https://issues.apache.org/jira/browse/HDFS-15289
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: ViewFSOverloadScheme - V1.0.pdf, ViewFSOverloadScheme.png
>
>
> ViewFS provides the flexibility to mount different filesystem types via a 
> mount-point configuration table. This approach solves the scalability 
> problems, but users need to reconfigure the filesystem to ViewFS and to its 
> scheme. This is problematic for paths persisted in meta stores, e.g. Hive: 
> systems like Hive store URIs in the meta store, so changing the filesystem 
> scheme creates a burden of upgrading/recreating meta stores. In our 
> experience many users are not ready to change that.
> Router-based federation is another implementation that provides coordinated 
> mount points for HDFS federation clusters. Even though it provides the 
> flexibility to handle mount points easily, it does not allow other 
> (non-HDFS) file systems to be mounted. So it does not serve the purpose when 
> users want to mount external (non-HDFS) filesystems.
> So, the problem here is: even though many users want to adopt the scalable 
> fs options available, the technical challenge of changing schemes (e.g. in 
> meta stores) in deployments is obstructing them.
> So, we propose to allow the hdfs scheme in the ViewFS-like client-side mount 
> system and let users create mount links without changing URI paths.
> I will upload a detailed design doc shortly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15965) Please upgrade the log4j dependency to log4j2

2021-11-08 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15965:

Target Version/s: 3.4.0, 3.3.3  (was: 3.4.0, 3.3.2)

> Please upgrade the log4j dependency to log4j2
> -
>
> Key: HDFS-15965
> URL: https://issues.apache.org/jira/browse/HDFS-15965
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 3.3.0, 3.2.1, 3.2.2
>Reporter: helen huang
>Priority: Major
>
> The log4j dependency being used by hadoop-common is currently version 
> 1.2.17. Our fortify scan picked up a couple of issues with this dependency. 
> Please update it to the latest version of the log4j2 dependencies:
> <dependency>
>   <groupId>org.apache.logging.log4j</groupId>
>   <artifactId>log4j-api</artifactId>
>   <version>2.14.1</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.logging.log4j</groupId>
>   <artifactId>log4j-core</artifactId>
>   <version>2.14.1</version>
> </dependency>
>  
> The slf4j dependency will need to be updated as well after you upgrade log4j 
> to log4j2.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2021-06-18 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-13522:
---

Assignee: (was: Chao Sun)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15754) Create packet metrics for DataNode

2021-01-07 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-15754.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Create packet metrics for DataNode
> --
>
> Key: HDFS-15754
> URL: https://issues.apache.org/jira/browse/HDFS-15754
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In BlockReceiver, right now when there is slowness in writeToMirror, 
> writeToDisk and writeToOsCache, it is dumped in the debug log. In practice 
> we have found these to be quite useful signals for detecting issues in a 
> DataNode, so it would be great if these metrics could be exposed via JMX.
> Also, we introduced a total-packets-received count so that a percentage can 
> be used as a signal to detect potentially underperforming datanodes, since 
> datanodes across one HDFS cluster may receive different total numbers of 
> packets.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md

2021-01-02 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257429#comment-17257429
 ] 

Chao Sun commented on HDFS-15751:
-

+1 on patch v3. Thanks again [~hexiaoqiao] and [~shv]!

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch, 
> HDFS-15751-03.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md

2021-01-02 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257335#comment-17257335
 ] 

Chao Sun commented on HDFS-15751:
-

Thanks [~hexiaoqiao]! IMO the preconditions section should be specific to what 
conditions should be met prior to calling the method, and we should move the 
sentence:
{quote}
It is currently only implemented for HDFS and others will just throw 
UnsupportedOperationException.
{quote}
to the previous section before preconditions (also following other methods such 
as concat).

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15751-01.patch, HDFS-15751-02.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15756) RBF: Cannot get updated delegation token from zookeeper

2020-12-31 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257097#comment-17257097
 ] 

Chao Sun commented on HDFS-15756:
-

cc [~fengnanli]

> RBF: Cannot get updated delegation token from zookeeper
> ---
>
> Key: HDFS-15756
> URL: https://issues.apache.org/jira/browse/HDFS-15756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.0.0
>Reporter: hbprotoss
>Priority: Major
>
> Affected version: all versions with rbf
> When RBF works with Spark 2.4 in client mode, there is a chance that a token 
> is missing across different nodes in the RBF cluster. The root cause is that 
> Spark renews the token (via the resource manager) immediately after getting 
> one; since zookeeper does not have a strong consistency guarantee after an 
> update in the cluster, a zookeeper client may read a stale value from 
> followers not yet synced with the other nodes.
>  
> We applied a patch in Spark, but it is still an RBF problem. Is it possible 
> for RBF to replace the delegation token store with some other datasource 
> (redis, for example)?
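> A hedged mitigation sketch (assuming a Curator client handle and a token
> znode path, as used by the ZooKeeper-backed token store): force a sync with
> the leader before reading, at the cost of an extra round trip:
> {code:java}
> // ZooKeeper followers may lag the leader; sync() brings this client's
> // view up to date before the read, avoiding the stale-token window.
> client.sync().forPath(tokenPath);
> byte[] tokenBytes = client.getData().forPath(tokenPath);
> {code}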



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15756) [RBF] Cannot get updated delegation token from zookeeper

2020-12-30 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15756:

Summary: [RBF] Cannot get updated delegation token from zookeeper  (was: 
[RBF]Cannot get updated delegation token from zookeeper)

> [RBF] Cannot get updated delegation token from zookeeper
> 
>
> Key: HDFS-15756
> URL: https://issues.apache.org/jira/browse/HDFS-15756
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.0.0
>Reporter: hbprotoss
>Priority: Major
>
> Affected version: all versions with rbf
> When RBF works with Spark 2.4 in client mode, there is a chance that a token 
> is missing across different nodes in the RBF cluster. The root cause is that 
> Spark renews the token (via the resource manager) immediately after getting 
> one; since zookeeper does not have a strong consistency guarantee after an 
> update in the cluster, a zookeeper client may read a stale value from 
> followers not yet synced with the other nodes.
>  
> We applied a patch in Spark, but it is still an RBF problem. Is it possible 
> for RBF to replace the delegation token store with some other datasource 
> (redis, for example)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15751) Add documentation for msync() API to filesystem.md

2020-12-30 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256656#comment-17256656
 ] 

Chao Sun commented on HDFS-15751:
-

I agree with [~hexiaoqiao] and think we can mention that this is currently only 
implemented for HDFS and others will just throw 
{{UnsupportedOperationException}}, similar to what we're doing for {{concat}}, 
{{truncate}} etc. 

Otherwise I'm +1 on this. Thanks.

> Add documentation for msync() API to filesystem.md
> --
>
> Key: HDFS-15751
> URL: https://issues.apache.org/jira/browse/HDFS-15751
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15751-01.patch
>
>
> HDFS-15567 introduced new {{FileSystem}} call {{msync()}}. Should add it to 
> the API definitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency

2020-12-29 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15690:

Fix Version/s: 3.3.1

> Add lz4-java as hadoop-hdfs test dependency
> ---
>
> Key: HDFS-15690
> URL: https://issues.apache.org/jira/browse/HDFS-15690
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: 
> net/jpountz/lz4/LZ4Factory":
> https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/
> We need to add lz4-java to hadoop-hdfs test dependency.
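> A hedged illustration of the addition to the hadoop-hdfs pom.xml (the
> version is normally pinned via a property in hadoop-project):
> <dependency>
>   <groupId>org.lz4</groupId>
>   <artifactId>lz4-java</artifactId>
>   <scope>test</scope>
> </dependency>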



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15708) TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and branch-3.2

2020-12-04 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-15708.
-
Fix Version/s: 3.3.1
   3.2.2
 Hadoop Flags: Reviewed
   Resolution: Fixed

> TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and 
> branch-3.2
> ---
>
> Key: HDFS-15708
> URL: https://issues.apache.org/jira/browse/HDFS-15708
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Chao Sun
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.2.2, 3.3.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TestURLConnectionFactory#testSSLFactoryCleanup fails:
> {noformat}
> [ERROR] 
> testSSLFactoryCleanup(org.apache.hadoop.hdfs.web.TestURLConnectionFactory)  
> Time elapsed: 0.28 s  <<< ERROR!
> java.lang.NoClassDefFoundError: 
> org/bouncycastle/x509/X509V1CertificateGenerator
> at 
> org.apache.hadoop.security.ssl.KeyStoreTestUtil.generateCertificate(KeyStoreTestUtil.java:86)
> at 
> org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:273)
> at 
> org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:228)
> at 
> org.apache.hadoop.hdfs.web.TestURLConnectionFactory.testSSLFactoryCleanup(TestURLConnectionFactory.java:83)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Caused by: java.lang.ClassNotFoundException: 
> org.bouncycastle.x509.X509V1CertificateGenerator
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 29 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15708) TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and branch-3.2

2020-12-03 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-15708:
---

Assignee: Chao Sun

> TestURLConnectionFactory fails by NoClassDefFoundError in branch-3.3 and 
> branch-3.2
> ---
>
> Key: HDFS-15708
> URL: https://issues.apache.org/jira/browse/HDFS-15708
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TestURLConnectionFactory#testSSLFactoryCleanup fails:
> {noformat}
> [ERROR] 
> testSSLFactoryCleanup(org.apache.hadoop.hdfs.web.TestURLConnectionFactory)  
> Time elapsed: 0.28 s  <<< ERROR!
> java.lang.NoClassDefFoundError: 
> org/bouncycastle/x509/X509V1CertificateGenerator
> at 
> org.apache.hadoop.security.ssl.KeyStoreTestUtil.generateCertificate(KeyStoreTestUtil.java:86)
> at 
> org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:273)
> at 
> org.apache.hadoop.security.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:228)
> at 
> org.apache.hadoop.hdfs.web.TestURLConnectionFactory.testSSLFactoryCleanup(TestURLConnectionFactory.java:83)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Caused by: java.lang.ClassNotFoundException: 
> org.bouncycastle.x509.X509V1CertificateGenerator
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> ... 29 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency

2020-11-21 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236836#comment-17236836
 ] 

Chao Sun commented on HDFS-15690:
-

[~ayushtkn], [~ste...@apache.org] can you help add [~viirya] to the HDFS 
contributor list? I can't assign this JIRA to him. Thanks.

> Add lz4-java as hadoop-hdfs test dependency
> ---
>
> Key: HDFS-15690
> URL: https://issues.apache.org/jira/browse/HDFS-15690
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: 
> net/jpountz/lz4/LZ4Factory":
> https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/
> We need to add lz4-java as a hadoop-hdfs test dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency

2020-11-21 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-15690.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed to trunk. I'll backport this to the 3.3 branch together with HADOOP-17292 
later.

> Add lz4-java as hadoop-hdfs test dependency
> ---
>
> Key: HDFS-15690
> URL: https://issues.apache.org/jira/browse/HDFS-15690
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: 
> net/jpountz/lz4/LZ4Factory":
> https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/
> We need to add lz4-java as a hadoop-hdfs test dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy

2020-11-13 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231721#comment-17231721
 ] 

Chao Sun commented on HDFS-15467:
-

[~aihuaxu] Yes. ObserverReadProxyProvider does have its own retry logic, but 
only for contacting observer namenodes. For calls that contact the active NN, 
such as {{msync}} or write requests, it still relies on the upper-level retry logic.
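
To illustrate the two layers, a minimal sketch (simplified names, not the actual 
ObserverReadProxyProvider code): the provider retries reads across observers itself, 
while {{msync}} and writes get a single attempt against the active proxy and leave 
retries to the caller's retry policy.
{code:java}
// Simplified sketch of the two retry layers (illustrative, not the real code).
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
  if (observerReadEnabled && isRead(method)) {
    for (ProxyInfo<ClientProtocol> observer : observerProxies) {
      try {
        return method.invoke(observer.proxy, args);  // provider-level retry over observers
      } catch (InvocationTargetException e) {
        // try the next observer; fall through to the active NN if all fail
      }
    }
  }
  // msync and writes: a single attempt here; failover and backoff are handled
  // by the upper-level RetryInvocationHandler according to the retry policy.
  return method.invoke(activeProxy.proxy, args);
}
{code}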

> ObserverReadProxyProvider should skip logging first failover from each proxy
> 
>
> Key: HDFS-15467
> URL: https://issues.apache.org/jira/browse/HDFS-15467
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Hanisha Koneru
>Assignee: Aihua Xu
>Priority: Major
>
> After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first 
> failover INFO message from each proxy. But {{ObserverReadProxyProvider}} uses 
> a {{combinedProxy}} object, which combines all proxies into one and assigns 
> {{combinedInfo}} as the ProxyInfo.
> {noformat}
> ObserverReadProxyProvider# Lines 197-207:
> for (int i = 0; i < nameNodeProxies.size(); i++) {
>   if (i > 0) {
> combinedInfo.append(",");
>   }
>   combinedInfo.append(nameNodeProxies.get(i).proxyInfo);
> }
> combinedInfo.append(']');
> T wrappedProxy = (T) Proxy.newProxyInstance(
> ObserverReadInvocationHandler.class.getClassLoader(),
> new Class[] {xface}, new ObserverReadInvocationHandler());
> combinedProxy = new ProxyInfo<>(wrappedProxy, 
> combinedInfo.toString()){noformat}
> {{RetryInvocationHandler}} depends on the {{ProxyInfo}} to differentiate 
> between proxies while checking if a failover from that proxy happened before. 
> And since the combined proxy presents only one ProxyInfo, HADOOP-17116 doesn't 
> work on {{ObserverReadProxyProvider}}. It would need to be handled separately.
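
To make the failure mode concrete, a small self-contained sketch (a hypothetical 
class, not Hadoop code) of the per-proxy bookkeeping HADOOP-17116 relies on; with 
a single combined ProxyInfo, only one failover in the client's lifetime is ever 
suppressed:
{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for RetryInvocationHandler's bookkeeping: the first
// failover seen for each distinct proxyInfo string is not logged at INFO.
// ObserverReadProxyProvider exposes ONE combined proxyInfo for all NNs, so
// only the very first failover is suppressed and later ones are all logged.
class FailoverLogSuppression {
  private final Set<String> failedOverBefore = new HashSet<>();

  /** Returns true if this failover should be logged at INFO level. */
  boolean shouldLogAtInfo(String proxyInfo) {
    return !failedOverBefore.add(proxyInfo);  // add() is true only on first sighting
  }
}
{code}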



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15469) Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE

2020-11-10 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15469:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE
> 
>
> Key: HDFS-15469
> URL: https://issues.apache.org/jira/browse/HDFS-15469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.3
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15469.001.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Now the value of PacketReceiver#MAX_PACKET_SIZE is fixed at 16 MB. This value 
> should be configurable to facilitate better performance in different 
> environments. For example, when the network is poor, or the machine and hard 
> disk quality is low, setting this value below 16 MB (e.g., 8 MB) will be more 
> conducive to the stability of the cluster.
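
In sketch form, the change is roughly the following (the configuration key name 
is an assumption for illustration, not quoted from the patch):
{code:java}
// Sketch: replace the hard-coded 16 MB constant with a configurable value.
private static final int DEFAULT_MAX_PACKET_SIZE = 16 * 1024 * 1024;

static int getMaxPacketSize(org.apache.hadoop.conf.Configuration conf) {
  // The key below is illustrative; check hdfs-default.xml for the actual name.
  return conf.getInt("dfs.data.transfer.max.packet.size", DEFAULT_MAX_PACKET_SIZE);
}
{code}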



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15469) Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE

2020-11-10 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15469:

Fix Version/s: 3.4.0

> Dynamically configure the size of PacketReceiver#MAX_PACKET_SIZE
> 
>
> Key: HDFS-15469
> URL: https://issues.apache.org/jira/browse/HDFS-15469
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.3
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15469.001.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Now the value of PacketReceiver#MAX_PACKET_SIZE is fixed at 16 MB. This value 
> should be configurable to facilitate better performance in different 
> environments. For example, when the network is poor, or the machine and hard 
> disk quality is low, setting this value below 16 MB (e.g., 8 MB) will be more 
> conducive to the stability of the cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15467) ObserverReadProxyProvider should skip logging first failover from each proxy

2020-11-10 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229596#comment-17229596
 ] 

Chao Sun commented on HDFS-15467:
-

[~aihuaxu] yes, {{msync}} relies on the upper-level retry logic. It won't fail 
though - instead I think it will retry using the defined retry policies. What 
issue are you seeing with this?

> ObserverReadProxyProvider should skip logging first failover from each proxy
> 
>
> Key: HDFS-15467
> URL: https://issues.apache.org/jira/browse/HDFS-15467
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Hanisha Koneru
>Assignee: Aihua Xu
>Priority: Major
>
> After HADOOP-17116, {{RetryInvocationHandler}} skips logging the first 
> failover INFO message from each proxy. But {{ObserverReadProxyProvider}} uses 
> a {{combinedProxy}} object, which combines all proxies into one and assigns 
> {{combinedInfo}} as the ProxyInfo.
> {noformat}
> ObserverReadProxyProvider# Lines 197-207:
> for (int i = 0; i < nameNodeProxies.size(); i++) {
>   if (i > 0) {
> combinedInfo.append(",");
>   }
>   combinedInfo.append(nameNodeProxies.get(i).proxyInfo);
> }
> combinedInfo.append(']');
> T wrappedProxy = (T) Proxy.newProxyInstance(
> ObserverReadInvocationHandler.class.getClassLoader(),
> new Class[] {xface}, new ObserverReadInvocationHandler());
> combinedProxy = new ProxyInfo<>(wrappedProxy, 
> combinedInfo.toString()){noformat}
> {{RetryInvocationHandler}} depends on the {{ProxyInfo}} to differentiate 
> between proxies while checking if a failover from that proxy happened before. 
> And since the combined proxy presents only one ProxyInfo, HADOOP-17116 doesn't 
> work on {{ObserverReadProxyProvider}}. It would need to be handled separately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15664) Prevent Observer NameNode from becoming StandBy NameNode

2020-11-02 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224933#comment-17224933
 ] 

Chao Sun commented on HDFS-15664:
-

[~aihuaxu] seems this is already fixed by HDFS-14961?

> Prevent Observer NameNode from becoming StandBy NameNode
> 
>
> Key: HDFS-15664
> URL: https://issues.apache.org/jira/browse/HDFS-15664
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: auto-failover
>Affects Versions: 2.10.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> When the cluster performs a failover from NN1 to NN2, NN2 asks all the 
> other NNs to cede active state and transition to standby, including the Observer 
> NameNodes.
> It seems we should block Observers from becoming standby and participating in 
> failover. Of course, since we can transition a standby NameNode to Observer, we 
> could separately support promoting an Observer NameNode to standby.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15601) Batch listing: gracefully fallback to use non-batched listing when NameNode doesn't support the feature

2020-09-25 Thread Chao Sun (Jira)
Chao Sun created HDFS-15601:
---

 Summary: Batch listing: gracefully fallback to use non-batched 
listing when NameNode doesn't support the feature
 Key: HDFS-15601
 URL: https://issues.apache.org/jira/browse/HDFS-15601
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Chao Sun


HDFS-13616 requires both server- and client-side changes. However, it is common 
that users use a newer client to talk to an older HDFS (say 2.10). Currently the 
client will simply fail in this scenario. A better approach, perhaps, is to 
have the client fall back to non-batched listing on the input directories.
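
A minimal sketch of such a fallback (the batched API follows HDFS-13616; treating 
the server's rejection as {{RpcNoSuchMethodException}} and the fallback helper 
are assumptions here):
{code:java}
// Sketch: try the batched RPC once; an older NameNode rejects the unknown
// method, in which case we degrade to plain per-directory listing.
static RemoteIterator<PartialListing<FileStatus>> listBatched(
    DistributedFileSystem dfs, List<Path> dirs) throws IOException {
  try {
    return dfs.batchedListStatusIterator(dirs);
  } catch (RemoteException e) {
    if (e.unwrapRemoteException() instanceof RpcNoSuchMethodException) {
      return nonBatchedListing(dfs, dirs);  // illustrative fallback helper
    }
    throw e;
  }
}
{code}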



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15516) Add info for create flags in NameNode audit logs

2020-09-15 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196410#comment-17196410
 ] 

Chao Sun commented on HDFS-15516:
-

I think this makes sense given that we already record flags for rename. This 
may break existing parsers that assume there is no flag for the create/append op, 
but I'm not sure that should be a reason to block this.
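
For concreteness, a sketch of what recording the flags could look like, mirroring 
how rename already folds its options into the {{cmd}} field (illustrative, not 
the attached patch):
{code:java}
// Sketch: encode create flags into the audit "cmd" the way rename does with
// "rename (options=...)". Parsers matching cmd == "create" exactly would then
// need to match on the "create" prefix instead.
void auditCreate(EnumSet<CreateFlag> flag, String src, FileStatus stat) {
  String cmd = "create";
  if (flag != null && !flag.isEmpty()) {
    cmd = "create (options=" + flag + ")";  // e.g. "create (options=[CREATE, OVERWRITE])"
  }
  logAuditEvent(true, cmd, src, null, stat);
}
{code}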

> Add info for create flags in NameNode audit logs
> 
>
> Key: HDFS-15516
> URL: https://issues.apache.org/jira/browse/HDFS-15516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Shashikant Banerjee
>Assignee: jianghua zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15516.001.patch, HDFS-15516.002.patch, 
> HDFS-15516.003.patch, HDFS-15516.004.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, if a file create happens with flags like overwrite, the audit logs 
> don't seem to contain the info regarding the flags. It 
> would be useful to add info regarding the create options in the audit logs, 
> similar to rename ops. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) Support observer node from Router-Based Federation

2020-09-03 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190483#comment-17190483
 ] 

Chao Sun commented on HDFS-13522:
-

[~hemanthboyina] feel free to take over this. I haven't got a chance to work on 
this but I think it is an important feature. I may be able to help on code 
review.

> Support observer node from Router-Based Federation
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13522.001.patch, HDFS-13522_WIP.patch, RBF_ 
> Observer support.pdf, Router+Observer RPC clogging.png, 
> ShortTerm-Routers+Observer.png
>
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15502) Implement service-user feature in DecayRPCScheduler

2020-07-30 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167747#comment-17167747
 ] 

Chao Sun commented on HDFS-15502:
-

[~tasanuma] seems this JIRA is very similar to HADOOP-15016? Also, this should 
be a HADOOP JIRA rather than an HDFS one.

> Implement service-user feature in DecayRPCScheduler
> ---
>
> Key: HDFS-15502
> URL: https://issues.apache.org/jira/browse/HDFS-15502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> In our cluster, we want to use FairCallQueue to limit heavy users, but we do 
> not want to restrict certain users who are submitting important requests. This 
> jira proposes to implement a service-user feature so that such a user is always 
> scheduled into the high-priority queue.
> According to HADOOP-9640, the initial concept of FCQ had this feature, but it 
> was never implemented.
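
A sketch of the core of such a feature (the configuration key and field names 
here are illustrative assumptions):
{code:java}
// Sketch: short-circuit the decay-based priority computation for configured
// service users so their calls always land in the highest-priority queue.
private Set<String> serviceUserNames;  // e.g. from an "ipc.<port>.decay-scheduler.service-users" key

private int priorityLevelFor(String user) {
  if (serviceUserNames.contains(user)) {
    return 0;  // top-priority queue, exempt from decay accounting
  }
  return decayBasedPriority(user);  // stands in for the existing decay computation
}
{code}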



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport

2020-07-29 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167374#comment-17167374
 ] 

Chao Sun commented on HDFS-15014:
-

Thanks [~fengnanli]. Closing this as a duplicate.

> RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport 
> -
>
> Key: HDFS-15014
> URL: https://issues.apache.org/jira/browse/HDFS-15014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chao Sun
>Priority: Major
>
> Currently the {{chooseDatanode}} call (which is shared by {{open}}, 
> {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls 
> {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
>   private DatanodeInfo chooseDatanode(final Router router,
>   final String path, final HttpOpParam.Op op, final long openOffset,
>   final String excludeDatanodes) throws IOException {
> // We need to get the DNs as a privileged user
> final RouterRpcServer rpcServer = getRPCServer(router);
> UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
> RouterRpcServer.setCurrentUser(loginUser);
> DatanodeInfo[] dns = null;
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<DatanodeInfo>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>   getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
> if (collection.contains(dn.getName())) {
>   excludes.add(dn);
> }
>   }
> }
> ...
> {code}
> The {{getDatanodeReport}} is very expensive (particularly in a large cluster) 
> as it needs to lock the {{DatanodeManager}}, which is also shared by calls such 
> as heartbeat processing. See HDFS-14366 for a similar issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport

2020-07-29 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-15014.
-
Resolution: Duplicate

> RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport 
> -
>
> Key: HDFS-15014
> URL: https://issues.apache.org/jira/browse/HDFS-15014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chao Sun
>Priority: Major
>
> Currently the {{chooseDatanode}} call (which is shared by {{open}}, 
> {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls 
> {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
>   private DatanodeInfo chooseDatanode(final Router router,
>   final String path, final HttpOpParam.Op op, final long openOffset,
>   final String excludeDatanodes) throws IOException {
> // We need to get the DNs as a privileged user
> final RouterRpcServer rpcServer = getRPCServer(router);
> UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
> RouterRpcServer.setCurrentUser(loginUser);
> DatanodeInfo[] dns = null;
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<DatanodeInfo>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>   getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
> if (collection.contains(dn.getName())) {
>   excludes.add(dn);
> }
>   }
> }
> ...
> {code}
> The {{getDatanodeReport}} is very expensive (particularly in a large cluster) 
> as it needs to lock the {{DatanodeManager}}, which is also shared by calls such 
> as heartbeat processing. See HDFS-14366 for a similar issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15465) Support WebHDFS accesses to the data stored in secure Datanode through insecure Namenode

2020-07-27 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-15465.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Support WebHDFS accesses to the data stored in secure Datanode through 
> insecure Namenode
> 
>
> Key: HDFS-15465
> URL: https://issues.apache.org/jira/browse/HDFS-15465
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: federation, webhdfs
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: webhdfs-federation.pdf
>
>
> We're federating a secure HDFS cluster with an insecure cluster.
> Using HDFS RPC, we can access the data managed by insecure Namenode and 
> stored in secure Datanode.
> However, it does not work for WebHDFS due to HadoopIllegalArgumentException.
> {code}
> $ curl -i "http://<nn_host>:<nn_http_port>/webhdfs/v1/<path>?op=OPEN"
> HTTP/1.1 307 TEMPORARY_REDIRECT
> (omitted)
> Location: 
> http://<dn_host>:<dn_http_port>/webhdfs/v1/<path>?op=OPEN&namenoderpcaddress=<nn_rpc_address>&offset=0
> $ curl -i 
> "http://<dn_host>:<dn_http_port>/webhdfs/v1/<path>?op=OPEN&namenoderpcaddress=<nn_rpc_address>&offset=0"
> HTTP/1.1 400 Bad Request
> (omitted)
> {"RemoteException":{"exception":"HadoopIllegalArgumentException","javaClassName":"org.apache.hadoop.HadoopIllegalArgumentException","message":"Invalid
>  argument, newValue is null"}}
> {code}
> This is because secure Datanode expects a delegation token, but insecure 
> Namenode does not return it to a client.
> - org.apache.hadoop.security.token.Token.decodeWritable
> {code}
>   private static void decodeWritable(Writable obj,
>  String newValue) throws IOException {
> if (newValue == null) {
>   throw new HadoopIllegalArgumentException(
>   "Invalid argument, newValue is null");
> }
> {code}
> This issue proposes to support such access for WebHDFS as well.
> The attached PDF file [^webhdfs-federation.pdf] depicts our current 
> architecture and proposal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2020-06-19 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15423:

Component/s: webhdfs

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Priority: Major
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2020-06-19 Thread Chao Sun (Jira)
Chao Sun created HDFS-15423:
---

 Summary: RBF: WebHDFS create shouldn't choose DN from all 
sub-clusters
 Key: HDFS-15423
 URL: https://issues.apache.org/jira/browse/HDFS-15423
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Reporter: Chao Sun


In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} first 
gets all DNs via {{getDatanodeReport}} and then randomly picks one from the 
list via {{getRandomDatanode}}. This logic doesn't seem correct, as it should 
pick a DN from the specific sub-cluster(s) of the input {{path}}.
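
In sketch form, the intended behavior would resolve the path first and pick only 
from the matching sub-cluster (the report helper below is illustrative):
{code:java}
// Sketch: restrict candidate datanodes to the nameservice(s) that the mount
// table resolves the path to, instead of the whole federation.
final List<RemoteLocation> locations = rpcServer.getLocationsForPath(path, false);
final String nsId = locations.get(0).getNameserviceId();
DatanodeInfo[] dns = datanodeReportForNameservice(nsId);  // illustrative helper
return dns[ThreadLocalRandom.current().nextInt(dns.length)];
{code}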



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15417) Lazy get the datanode report for federation WebHDFS operations

2020-06-18 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139554#comment-17139554
 ] 

Chao Sun commented on HDFS-15417:
-

I think this addresses the same issue as HDFS-15014. Internally we were trying 
to use the cached DN reports, but those are tied to Router metrics and the 
implementation is kind of messy.
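
For context, a rough self-contained sketch of that caching idea (the TTL and 
class are assumptions, not the internal implementation mentioned above):
{code:java}
// Sketch: serve chooseDatanode from a periodically refreshed snapshot so a
// WebHDFS redirect doesn't trigger getDatanodeReport on every call.
class DatanodeReportCache {
  private static final long TTL_MS = 10_000;  // assumed acceptable staleness
  private DatanodeInfo[] cached;
  private long fetchedAt;

  synchronized DatanodeInfo[] get(RouterRpcServer rpcServer) throws IOException {
    long now = Time.monotonicNow();
    if (cached == null || now - fetchedAt > TTL_MS) {
      cached = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
      fetchedAt = now;
    }
    return cached;
  }
}
{code}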

> Lazy get the datanode report for federation WebHDFS operations
> --
>
> Key: HDFS-15417
> URL: https://issues.apache.org/jira/browse/HDFS-15417
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: federation, rbf, webhdfs
>Reporter: Ye Ni
>Assignee: Ye Ni
>Priority: Minor
>
> *Why*
>  For WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, the router or 
> namenode needs to get the datanodes where the block is located, then redirect 
> the request to one of the datanodes.
> However, this chooseDatanode action is much slower in the router than in the 
> namenode, which directly affects the WebHDFS operations above.
> For namenode WebHDFS, it normally takes tens of milliseconds, while the router 
> always takes more than 2 seconds.
> *How*
>  Only get the datanode report when necessary in the router. It is a very 
> expensive operation and is where all the time is spent.
> It is only needed when we want to exclude some datanodes or find a random 
> datanode for CREATE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14320) Support skipTrash for WebHDFS

2020-05-28 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118235#comment-17118235
 ] 

Chao Sun edited comment on HDFS-14320 at 5/28/20, 10:16 AM:


Bumping this up as it seems to be an important feature. Curious what the 
current status is, [~kpalanisamy], [~weichiu].


was (Author: csun):
Bumping this up as it seems to be an important feature. Curious what the 
current status is, [~weichiu].

> Support skipTrash for WebHDFS 
> --
>
> Key: HDFS-14320
> URL: https://issues.apache.org/jira/browse/HDFS-14320
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode, webhdfs
>Affects Versions: 3.2.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
> Attachments: HDFS-14320-001.patch, HDFS-14320-002.patch, 
> HDFS-14320-003.patch, HDFS-14320-004.patch, HDFS-14320-005.patch, 
> HDFS-14320-006.patch, HDFS-14320-007.patch, HDFS-14320-008.patch
>
>
> Files/directories deleted via the WebHDFS REST call don't use the skiptrash 
> feature; they are deleted permanently. This feature is very important to us 
> because one of our users accidentally deleted a large directory.
> By default, the skiptrash option is set to true (skiptrash=true), so any files 
> deleted using curl will be permanently deleted.
> Example:
> curl -iv -X DELETE 
> "http://<host>:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true"
>  
> Use skiptrash=false to move files to trash instead.
> Example:
> curl -iv -X DELETE 
> "http://<host>:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true&skiptrash=false"
>  
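
The server-side handling reduces to something like the following sketch, using 
the existing {{org.apache.hadoop.fs.Trash}} helper (illustrative, not the 
attached patch):
{code:java}
// Sketch: when skiptrash=false, move the path into the caller's trash via the
// standard Trash helper instead of deleting it outright.
static boolean delete(FileSystem fs, Path path, boolean recursive,
    boolean skipTrash, Configuration conf) throws IOException {
  if (!skipTrash) {
    return Trash.moveToAppropriateTrash(fs, path, conf);
  }
  return fs.delete(path, recursive);
}
{code}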



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14320) Support skipTrash for WebHDFS

2020-05-27 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118235#comment-17118235
 ] 

Chao Sun commented on HDFS-14320:
-

Bumping this up as it seems to be an important feature. Curious what the 
current status is, [~weichiu].

> Support skipTrash for WebHDFS 
> --
>
> Key: HDFS-14320
> URL: https://issues.apache.org/jira/browse/HDFS-14320
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode, webhdfs
>Affects Versions: 3.2.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
> Attachments: HDFS-14320-001.patch, HDFS-14320-002.patch, 
> HDFS-14320-003.patch, HDFS-14320-004.patch, HDFS-14320-005.patch, 
> HDFS-14320-006.patch, HDFS-14320-007.patch, HDFS-14320-008.patch
>
>
> Files/directories deleted via the WebHDFS REST call don't use the skiptrash 
> feature; they are deleted permanently. This feature is very important to us 
> because one of our users accidentally deleted a large directory.
> By default, the skiptrash option is set to true (skiptrash=true), so any files 
> deleted using curl will be permanently deleted.
> Example:
> curl -iv -X DELETE 
> "http://<host>:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true"
>  
> Use skiptrash=false to move files to trash instead.
> Example:
> curl -iv -X DELETE 
> "http://<host>:50070/webhdfs/v1/tmp/sampledata?op=DELETE&user.name=hdfs&recursive=true&skiptrash=false"
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger

2020-05-19 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15259:

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Reduce useless log information in FSNamesystemAuditLogger
> -
>
> Key: HDFS-15259
> URL: https://issues.apache.org/jira/browse/HDFS-15259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15259.001.patch, HDFS-15259.002.patch
>
>
> For most operations, 'dst' is null; add a check before logging the 'dst' 
> information in FSNamesystemAuditLogger
> {code:java}
> 2020-04-03 16:34:40,021 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,329 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,362 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null 
> perm=yang:supergroup:rwxr-xr-x proto=rpc{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger

2020-05-19 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111456#comment-17111456
 ] 

Chao Sun commented on HDFS-15259:
-

Yup. I think it is a won't fix.

> Reduce useless log information in FSNamesystemAuditLogger
> -
>
> Key: HDFS-15259
> URL: https://issues.apache.org/jira/browse/HDFS-15259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15259.001.patch, HDFS-15259.002.patch
>
>
> For most operations, 'dst' is null; add a check before logging the 'dst' 
> information in FSNamesystemAuditLogger
> {code:java}
> 2020-04-03 16:34:40,021 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,329 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,362 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null 
> perm=yang:supergroup:rwxr-xr-x proto=rpc{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15335) Report top N metrics for files in get listing ops

2020-05-05 Thread Chao Sun (Jira)
Chao Sun created HDFS-15335:
---

 Summary: Report top N metrics for files in get listing ops
 Key: HDFS-15335
 URL: https://issues.apache.org/jira/browse/HDFS-15335
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, metrics
Reporter: Chao Sun


Currently HDFS has a {{filesInGetListingOps}} metric which reports the total 
number of files in all listing ops. However, it would be useful to report the 
top N users who contribute most to this. This can help identify potentially 
abusive users and stop the abuse of the NameNode.
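
A rough, self-contained sketch of the accounting this would need (illustrative; 
the existing nntop machinery may be a better fit in practice):
{code:java}
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch: count files returned per user in getListing ops, then report top N.
class ListingTopUsers {
  private final ConcurrentHashMap<String, LongAdder> byUser = new ConcurrentHashMap<>();

  void record(String user, int filesReturned) {
    byUser.computeIfAbsent(user, u -> new LongAdder()).add(filesReturned);
  }

  List<Map.Entry<String, Long>> top(int n) {
    List<Map.Entry<String, Long>> all = new ArrayList<>();
    byUser.forEach((u, c) -> all.add(new AbstractMap.SimpleEntry<>(u, c.sum())));
    all.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
    return all.subList(0, Math.min(n, all.size()));
  }
}
{code}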



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15259) Reduce useless log information in FSNamesystemAuditLogger

2020-04-03 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074718#comment-17074718
 ] 

Chao Sun commented on HDFS-15259:
-

[~hadoop_yangyun] I don't think you can do this, as many applications depend on 
the tabular format for parsing the audit log, and it would break them badly.
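
As a concrete (hypothetical) example of the kind of consumer that would break, 
many audit-log parsers split the tab-separated line into fixed positional fields:
{code:java}
// Sketch of a typical positional parser over the audit message portion
// ("allowed=...\tugi=...\tip=...\tcmd=...\tsrc=...\tdst=...\tperm=...\tproto=...").
// If "dst=null" were omitted for most ops, fields[5] would silently become
// perm=... and the parser would misread every line.
String[] fields = auditMessage.split("\t");
String src = fields[4].substring("src=".length());
String dst = fields[5].substring("dst=".length());  // assumes dst is always present
{code}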

> Reduce useless log information in FSNamesystemAuditLogger
> -
>
> Key: HDFS-15259
> URL: https://issues.apache.org/jira/browse/HDFS-15259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15259.001.patch
>
>
> For most operations, 'dst' is null; add a check before logging the 'dst' 
> information in FSNamesystemAuditLogger
> {code:java}
> 2020-04-03 16:34:40,021 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=listStatus src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,329 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/ dst=null perm=null 
> proto=rpc
> 2020-04-03 16:35:16,362 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true 
> ugi=user (auth:SIMPLE) ip=/127.0.0.1 cmd=mkdirs src=/user dst=null 
> perm=yang:supergroup:rwxr-xr-x proto=rpc{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15197) [SBN read] Change ObserverRetryOnActiveException log to debug

2020-02-27 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15197:

Summary: [SBN read] Change ObserverRetryOnActiveException log to debug  
(was: Change ObserverRetryOnActiveException log to debug)

> [SBN read] Change ObserverRetryOnActiveException log to debug
> -
>
> Key: HDFS-15197
> URL: https://issues.apache.org/jira/browse/HDFS-15197
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-15197.001.patch
>
>
> Currently in ObserverReadProxyProvider, when an ObserverRetryOnActiveException 
> occurs, ObserverReadProxyProvider logs a message at INFO level. This can 
> produce a large volume of logs in some scenarios. For example, when some job 
> tries to access lots of files that haven't been accessed for a long time, all 
> these accesses may trigger atime updates, which leads to 
> ObserverRetryOnActiveException. We should change this log to DEBUG.
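
The change itself is small; in sketch form (not the exact patch):
{code:java}
// Sketch: demote the per-call INFO line to DEBUG so that atime-update storms
// no longer flood client logs.
if (LOG.isDebugEnabled()) {
  LOG.debug("Encountered ObserverRetryOnActiveException from {}."
      + " Retrying on the active namenode.", proxyInfo);
}
{code}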



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15197) Change ObserverRetryOnActiveException log to debug

2020-02-27 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047081#comment-17047081
 ] 

Chao Sun commented on HDFS-15197:
-

+1

> Change ObserverRetryOnActiveException log to debug
> --
>
> Key: HDFS-15197
> URL: https://issues.apache.org/jira/browse/HDFS-15197
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-15197.001.patch
>
>
> Currently in ObserverReadProxyProvider, when an ObserverRetryOnActiveException 
> occurs, ObserverReadProxyProvider logs a message at INFO level. This can 
> produce a large volume of logs in some scenarios. For example, when some job 
> tries to access lots of files that haven't been accessed for a long time, all 
> these accesses may trigger atime updates, which leads to 
> ObserverRetryOnActiveException. We should change this log to DEBUG.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly

2020-02-27 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047074#comment-17047074
 ] 

Chao Sun edited comment on HDFS-15196 at 2/28/20 12:05 AM:
---

+1. Patch LGTM but will be great if [~elgoiri] or others who're familiar with 
RBF can take a look.


was (Author: csun):
Patch LGTM but will be great if [~elgoiri] or others who're familiar with RBF 
can take a look.

> RouterRpcServer getListing cannot list large dirs correctly
> ---
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from the destination ns + paths
>  # Append the mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT 
> (default value 1k), batched listing is used, and startAfter defines the 
> boundary of each batch. However, step 2 here adds the existing mount points, 
> which messes up the batch boundary, making the next batch's startAfter wrong.
> The fix is simply to append the mount points when no further batch queries are 
> necessary.
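
In sketch form, the fix gates the mount-point merge on the batch state (helper 
names are illustrative):
{code:java}
// Sketch: only merge the mount points for this path into the combined listing
// once no downstream namespace has entries left, so batch boundaries (and the
// next startAfter) stay consistent with what the namenodes returned.
if (remainingEntries == 0) {
  combined.addAll(mountPointStatuses(path));  // illustrative helper
}
return new DirectoryListing(
    combined.toArray(new HdfsFileStatus[0]), remainingEntries);
{code}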



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly

2020-02-27 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047074#comment-17047074
 ] 

Chao Sun commented on HDFS-15196:
-

Patch LGTM but will be great if [~elgoiri] or others who're familiar with RBF 
can take a look.

> RouterRpcServer getListing cannot list large dirs correctly
> ---
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from the destination ns + paths
>  # Append the mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT 
> (default value 1k), batched listing is used, and startAfter defines the 
> boundary of each batch. However, step 2 here adds the existing mount points, 
> which messes up the batch boundary, making the next batch's startAfter wrong.
> The fix is simply to append the mount points when no further batch queries are 
> necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly

2020-02-27 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047069#comment-17047069
 ] 

Chao Sun commented on HDFS-15196:
-

Thanks Fengnan for the patch. Raising this to Critical since it is a 
correctness issue.

> RouterRpcServer getListing cannot list large dirs correctly
> ---
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from the destination ns + paths
>  # Append the mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT 
> (default value 1k), batched listing is used, and startAfter defines the 
> boundary of each batch. However, step 2 here adds the existing mount points, 
> which messes up the batch boundary, making the next batch's startAfter wrong.
> The fix is simply to append the mount points when no further batch queries are 
> necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15196) RouterRpcServer getListing cannot list large dirs correctly

2020-02-27 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15196:

Priority: Critical  (was: Major)

> RouterRpcServer getListing cannot list large dirs correctly
> ---
>
> Key: HDFS-15196
> URL: https://issues.apache.org/jira/browse/HDFS-15196
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Critical
> Attachments: HDFS-15196.001.patch
>
>
> In RouterRpcServer, the getListing function is handled in two parts:
>  # Union all partial listings from the destination ns + paths
>  # Append the mount points for the dir being listed
> In the case of a large dir bigger than DFSConfigKeys.DFS_LIST_LIMIT 
> (default value 1k), batched listing is used, and startAfter defines the 
> boundary of each batch. However, step 2 here adds the existing mount points, 
> which messes up the batch boundary, making the next batch's startAfter wrong.
> The fix is simply to append the mount points when no further batch queries are 
> necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992018#comment-16992018
 ] 

Chao Sun commented on HDFS-15036:
-

[~vagarychen] sorry for grabbing this JIRA too soon :) Since you have done much 
study on this, do you want to take this JIRA instead?

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Image transfer from the Standby NameNode to the Active silently fails on the 
> Active, without any logging and without notifying the receiver side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-12-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15017:

Status: Patch Available  (was: Open)

> Remove redundant import of AtomicBoolean in NameNodeConnector.
> --
>
> Key: HDFS-15017
> URL: https://issues.apache.org/jira/browse/HDFS-15017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, hdfs
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-15017-branch-2.000.patch
>
>
> Should remove the redundant import.
> Looks like it is specific to branch 2.10. Trunk and the 3.x branches don't have it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-12-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15017:

Attachment: HDFS-15017-branch-2.000.patch

> Remove redundant import of AtomicBoolean in NameNodeConnector.
> --
>
> Key: HDFS-15017
> URL: https://issues.apache.org/jira/browse/HDFS-15017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, hdfs
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-15017-branch-2.000.patch
>
>
> Should remove the redundant import.
> Looks like it is specific to branch 2.10. Trunk and the 3.x branches don't have it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-12-06 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990201#comment-16990201
 ] 

Chao Sun commented on HDFS-15017:
-

Seems like a trivial change - the import was added by HDFS-7073

> Remove redundant import of AtomicBoolean in NameNodeConnector.
> --
>
> Key: HDFS-15017
> URL: https://issues.apache.org/jira/browse/HDFS-15017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, hdfs
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-15017-branch-2.000.patch
>
>
> Should remove the redundant import.
> Looks like it is specific to branch 2.10. Trunk and the 3.x branches don't have it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-12-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15017:

Attachment: (was: HDFS-15017-branch-2.000.patch)

> Remove redundant import of AtomicBoolean in NameNodeConnector.
> --
>
> Key: HDFS-15017
> URL: https://issues.apache.org/jira/browse/HDFS-15017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, hdfs
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>  Labels: newbie
>
> Should remove the redundant import.
> Looks like it is specific to branch 2.10. Trunk and the 3.x branches don't have it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-12-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15017:

Attachment: HDFS-15017-branch-2.000.patch

> Remove redundant import of AtomicBoolean in NameNodeConnector.
> --
>
> Key: HDFS-15017
> URL: https://issues.apache.org/jira/browse/HDFS-15017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover, hdfs
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>  Labels: newbie
>
> Should remove the redundant import.
> Looks like it is specific to branch 2.10. Trunk and the 3.x branches don't have it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-15036:
---

Assignee: Chao Sun

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Image transfer from the Standby NameNode to the Active silently fails on the 
> Active, without any logging and without notifying the receiver side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14963) Add HDFS Client machine caching active namenode index mechanism.

2019-12-06 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990087#comment-16990087
 ] 

Chao Sun commented on HDFS-14963:
-

Seems this and HDFS-15024 are solving very similar problems, and the solution 
there could be much simpler. Should we instead pursue that approach? I also 
tend to echo [~shv]'s point and am not sure having clients write to a local 
file is a good idea.

> Add HDFS Client machine caching active namenode index mechanism.
> 
>
> Key: HDFS-14963
> URL: https://issues.apache.org/jira/browse/HDFS-14963
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.1.3
>Reporter: Xudong Cao
>Assignee: Xudong Cao
>Priority: Minor
>  Labels: multi-sbnn
>
> In a multi-NameNode scenario, a new hdfs client always begins its rpc calls 
> from the 1st namenode, simply polling until it finally determines the current 
> Active namenode. 
> This brings at least two problems:
>  # Extra failover cost, especially in the case of frequent creation of 
> clients.
>  # Unnecessary log printing: suppose there are 3 NNs and the 3rd is the ANN. 
> A client that starts rpc with the 1st NN is silent when failing over 
> from the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 
> 3rd NN it prints some unnecessary logs; in some scenarios these logs will be 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}
> We can introduce a solution to this problem: on the client machine, for 
> every HDFS cluster, cache its current Active NameNode index in a separate 
> cache file named after its URI. *Note these cache files are shared by all 
> HDFS client processes on the machine*.
> For example, suppose there are hdfs://ns1 and hdfs://ns2, and the client 
> machine's cache file directory is /tmp; then:
>  # the ns1 cluster's cache file is /tmp/ns1
>  # the ns2 cluster's cache file is /tmp/ns2
> And then:
>  # When a client starts, it reads the current Active NameNode index from the 
> corresponding cache file (chosen by the target HDFS URI) and directly makes 
> its RPC call to the right ANN.
>  # After each failover, the client writes the latest Active NameNode index 
> to the corresponding cache file (chosen by the target HDFS URI), as sketched 
> below.
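>
> A minimal sketch of this mechanism, assuming a hypothetical helper class 
> (the class name, plain-text file format, and error handling are 
> illustrative, not part of any actual patch):
> {code:java}
> import java.io.IOException;
> import java.nio.charset.StandardCharsets;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
>
> // Hypothetical per-cluster cache of the Active NameNode index.
> public class ActiveNnIndexCache {
>   private final Path cacheFile;
>
>   public ActiveNnIndexCache(String cacheDir, String nameservice) {
>     // e.g. /tmp/ns1 for hdfs://ns1, shared by all clients on the machine
>     this.cacheFile = Paths.get(cacheDir, nameservice);
>   }
>
>   /** Read the cached index on client startup; fall back to 0 if absent. */
>   public int read() {
>     try {
>       String s = new String(Files.readAllBytes(cacheFile),
>           StandardCharsets.UTF_8).trim();
>       return Integer.parseInt(s);
>     } catch (IOException | NumberFormatException e) {
>       return 0; // no cache yet, or unreadable: start from the 1st NN
>     }
>   }
>
>   /** Write the latest index after each failover. */
>   public void write(int activeIndex) {
>     try {
>       Files.write(cacheFile,
>           Integer.toString(activeIndex).getBytes(StandardCharsets.UTF_8));
>     } catch (IOException e) {
>       // best effort: failing to update the cache must not fail the client
>     }
>   }
> }
> {code}
> Note that multiple client processes sharing one file would need atomic 
> updates (e.g. write-to-temp plus rename), which is part of why writing to a 
> shared local file is debatable.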



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time

2019-12-06 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990059#comment-16990059
 ] 

Chao Sun commented on HDFS-15024:
-

{quote}
Chao Sun I think the msync case is just one case; maybe the current problem 
is a common problem for "Support more than 2 NameNodes"?
{quote}

Yes, you are correct. This is a more general problem for the multi-SBN 
feature, but I think we could optimize {{msync}} specifically to avoid the 
retry backoff.

Regarding patch v1, it seems to only handle the first few retries; later on, 
once {{times}} increments past {{numNameNodes - 1}}, it will still do 
exponential backoff on all the SBNs, as the sketch below illustrates.
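
To make the concern concrete, here is a sketch of patch-v1-style logic (the 
method and parameter names are illustrative stand-ins, not the actual 
RetryPolicies code):

{code:java}
// Hypothetical sketch of the patch-v1 behavior discussed above.
public class FailoverSleepSketch {
  static long getFailoverSleepTime(int times, int numNameNodes,
                                   long baseMillis, long capMillis) {
    if (times < numNameNodes - 1) {
      // First pass through the configured NNs: fail over with no sleep.
      return 0;
    }
    // Once times passes numNameNodes - 1, every failover backs off
    // exponentially -- including failovers that merely land on an SBN.
    int exponent = times - (numNameNodes - 1);
    return Math.min(capMillis, baseMillis * (1L << Math.min(exponent, 16)));
  }

  public static void main(String[] args) {
    // With 3 NNs: failovers 0 and 1 sleep 0 ms, then 1000, 2000, 4000 ms ...
    for (int t = 0; t < 5; t++) {
      System.out.println("failover " + t + " -> sleep "
          + getFailoverSleepTime(t, 3, 1000L, 15000L) + " ms");
    }
  }
}
{code}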

> [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a 
> condition of calculation of sleep time
> ---
>
> Key: HDFS-15024
> URL: https://issues.apache.org/jira/browse/HDFS-15024
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.10.0, 3.3.0, 3.2.1
>Reporter: huhaiyang
>Assignee: huhaiyang
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-15024.001.patch, client_error.log
>
>
> When we enable the ONN, there will be three NN nodes in the client 
> configuration, for example:
> <property>
>   <name>dfs.ha.namenodes.ns1</name>
>   <value>nn2,nn3,nn1</value>
> </property>
> Currently:
> nn2 is in standby state
> nn3 is in observer state
> nn1 is in active state
> When the user performs an HDFS access operation:
> ./bin/hadoop --loglevel debug fs 
> -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
>  -mkdir /user/haiyang1/test8
> the client needs to reach nn1 when it executes the msync method. It actually 
> connects to nn2 first, so a failover is required. Connecting to nn3 does not 
> meet the requirements either, so another failover is needed, but this time 
> the failover is only performed after a period of sleep.
> In the end it took a sleep period before the request successfully connected 
> to nn1.
> In FailoverOnNetworkExceptionRetry#getFailoverOrRetrySleepTime, the current 
> default implementation calculates a sleep time whenever more than one 
> failover operation has been performed.
> I think using the number of NameNodes as a condition in the sleep-time 
> calculation is more reasonable.
> That is, in the current test, failing over from nn3 should connect directly 
> to the next NN node without sleeping.
> See client_error.log for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15005) Backport HDFS-12300 to branch-2

2019-12-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15005:

Attachment: HDFS-15005-branch-2.003.patch

> Backport HDFS-12300 to branch-2
> ---
>
> Key: HDFS-15005
> URL: https://issues.apache.org/jira/browse/HDFS-15005
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-15005-branch-2.000.patch, 
> HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, 
> HDFS-15005-branch-2.003.patch
>
>
> Having DT-related information in the audit log is very useful. This tracks 
> the effort to backport HDFS-12300 to branch-2.
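>
> For context, HDFS-12300 audit-logs delegation token operations 
> (get/renew/cancel). An entry of the general shape below is what the backport 
> brings to branch-2; the concrete line is illustrative, not copied from the 
> patch:
> {code}
> 2019-12-06 10:15:30,123 INFO FSNamesystem.audit: allowed=true ugi=alice@EXAMPLE.COM (auth:KERBEROS) ip=/10.0.0.12 cmd=getDelegationToken src=null dst=null perm=null proto=rpc
> {code}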



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15005) Backport HDFS-12300 to branch-2

2019-12-06 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990023#comment-16990023
 ] 

Chao Sun commented on HDFS-15005:
-

Rebased to the latest branch-2. [~weichiu] pls take a look.

> Backport HDFS-12300 to branch-2
> ---
>
> Key: HDFS-15005
> URL: https://issues.apache.org/jira/browse/HDFS-15005
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-15005-branch-2.000.patch, 
> HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, 
> HDFS-15005-branch-2.003.patch
>
>
> Having DT-related information in the audit log is very useful. This tracks 
> the effort to backport HDFS-12300 to branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14998) [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130

2019-12-05 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989047#comment-16989047
 ] 

Chao Sun commented on HDFS-14998:
-

+1 on v006 as well.

> [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130
> -
>
> Key: HDFS-14998
> URL: https://issues.apache.org/jira/browse/HDFS-14998
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Minor
> Attachments: HDFS-14998.001.patch, HDFS-14998.002.patch, 
> HDFS-14998.003.patch, HDFS-14998.004.patch, HDFS-14998.005.patch, 
> HDFS-14998.006.patch
>
>
> After HDFS-14130, we should update the Observer NameNode doc: the Observer 
> NameNode can now run with ZKFC running.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


