[jira] [Commented] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs

2024-04-24 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840312#comment-17840312
 ] 

Takanobu Asanuma commented on HDFS-15555:
-

[~chuanjie.duan] ConnectException is a subclass of SocketException. 
Therefore, the if statement still matches a ConnectException.
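
The inheritance relationship can be checked directly against the JDK. The following is a minimal sketch; matchesSocketException is a hypothetical helper used only for illustration and is not the actual check in RouterRpcClient.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketException;

class SocketExceptionCheck {
  // Hypothetical helper: stands in for an "instanceof SocketException" test.
  static boolean matchesSocketException(IOException ioe) {
    return ioe instanceof SocketException;
  }

  public static void main(String[] args) {
    // java.net.ConnectException extends java.net.SocketException in the JDK,
    // so a check written against the parent type also matches the subclass.
    System.out.println(matchesSocketException(new ConnectException("refused")));
    System.out.println(matchesSocketException(new SocketException("reset")));
    // A plain IOException is not a SocketException and is not matched.
    System.out.println(matchesSocketException(new IOException("other")));
  }
}
```

A check written against SocketException therefore does not need a separate branch for ConnectException.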

> RBF: Refresh cacheNS when SocketException occurs
> 
>
> Key: HDFS-15555
> URL: https://issues.apache.org/jira/browse/HDFS-15555
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.3.1, 3.4.0
> Environment: HDFS 3.3.0, Java 11
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Problem:
> When the active NameNode is restarted and is loading the fsimage, DFSRouters 
> slow down significantly.
> Investigation:
> While the restarted active NameNode is loading the fsimage, RouterRpcClient 
> receives a SocketException. Since 
> RouterRpcClient#isUnavailableException(IOException) returns false when the 
> argument is a SocketException, MembershipNamenodeResolver#cacheNS is not 
> refreshed. As a result, the order of the NameNodes returned by 
> MembershipNamenodeResolver#getNamenodesForNameserviceId(String) is unchanged 
> and the active NameNode is still returned first, so RouterRpcClient 
> keeps trying to connect to the NameNode that is loading the fsimage.
> After loading the fsimage, the NameNode throws a StandbyException. That 
> exception is treated as an unavailable exception, so the cacheNS is refreshed.
> Workaround:
> Stop the NameNode and wait 1 minute before starting it, instead of 
> restarting it.
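
The effect described in the investigation can be modelled with a small sketch. Everything below is hypothetical and heavily simplified: CacheRefreshSketch, its flag, and the rotation of the list are assumptions made for illustration, and the rotation merely stands in for refreshing the cached NameNode order. It is not the RouterRpcClient or resolver code.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CacheRefreshSketch {
  // Hypothetical stand-in for the cached, ordered list of NameNodes.
  private final List<String> cached = new ArrayList<>(Arrays.asList("nn1", "nn2"));

  // Hypothetical policy flag: does a SocketException count as "unavailable"?
  private final boolean socketExceptionIsUnavailable;

  CacheRefreshSketch(boolean socketExceptionIsUnavailable) {
    this.socketExceptionIsUnavailable = socketExceptionIsUnavailable;
  }

  boolean isUnavailable(IOException ioe) {
    return socketExceptionIsUnavailable && ioe instanceof SocketException;
  }

  // When the first NameNode fails, reorder the cache only if the error is
  // classified as "unavailable"; otherwise the stale order is kept.
  void onFailure(IOException ioe) {
    if (isUnavailable(ioe)) {
      cached.add(cached.remove(0));
    }
  }

  String first() {
    return cached.get(0);
  }
}
```

With the flag set to false, a SocketException leaves "nn1" at the head of the list, which mirrors the reported slowdown. With the flag set to true, the next lookup returns "nn2" first.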



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17468) Update ISA-L to 2.31.0 in the build image

2024-04-15 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17468:
---

 Summary: Update ISA-L to 2.31.0 in the build image
 Key: HDFS-17468
 URL: https://issues.apache.org/jira/browse/HDFS-17468
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


Intel ISA-L has several improvements in version 2.31.0. Let's update ISA-L in 
our build image to this version.






[jira] [Updated] (HDFS-17435) Fix TestRouterRpc failed

2024-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17435:

Fix Version/s: 3.4.1

> Fix TestRouterRpc failed
> 
>
> Key: HDFS-17435
> URL: https://issues.apache.org/jira/browse/HDFS-17435
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> TestRouterRpc and TestRouterRpcMultiDestination are failing with the 
> following error.
> {noformat}
> [ERROR] testProxyGetBlockKeys  Time elapsed: 0.573 s  <<< ERROR!
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: jenkins is not allowed to impersonate jenkins
> {noformat}
> This is caused by testClearStaleNamespacesInRouterStateIdContext(), which was 
> added in HDFS-17354.






[jira] [Updated] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up

2024-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17354:

Fix Version/s: 3.4.1

> Delay invoke  clearStaleNamespacesInRouterStateIdContext during router start 
> up
> ---
>
> Key: HDFS-17354
> URL: https://issues.apache.org/jira/browse/HDFS-17354
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> We should start the thread that clears expired namespaces in the 
> RouterRpcServer RUNNING phase, because StateStoreService is initialized in 
> the initialization phase. Currently, the router throws an IOException during 
> startup.
> {panel:title=Exception}
> 2024-01-09 16:27:06,939 WARN org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not fetch current list of namespaces.
> java.io.IOException: State Store does not have an interface for MembershipStore
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {panel}
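
The ordering problem in this description can be sketched as follows. This is a minimal illustration under stated assumptions: DelayedCleanerSketch, init(), start(), stop() and fetchNamespaceCount() are hypothetical names, and the plain Object field only stands in for the store interface. It is not the RouterRpcServer implementation.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DelayedCleanerSketch {
  // Hypothetical stand-in for the store interface; null until init() has run.
  private volatile Object membershipStore;
  private ScheduledExecutorService cleaner;

  // Initialization phase: dependencies become available here.
  void init() {
    membershipStore = new Object();
  }

  // The periodic task body; it fails if it runs before init().
  int fetchNamespaceCount() throws IOException {
    if (membershipStore == null) {
      throw new IOException("State Store does not have an interface for MembershipStore");
    }
    return 0; // a real implementation would query the store here
  }

  // Running phase: scheduling the task only here means init() has completed,
  // provided the caller invokes init() before start().
  void start() {
    cleaner = Executors.newSingleThreadScheduledExecutor();
    cleaner.scheduleWithFixedDelay(() -> {
      try {
        fetchNamespaceCount();
      } catch (IOException e) {
        System.err.println("Could not fetch current list of namespaces: " + e.getMessage());
      }
    }, 0, 1, TimeUnit.SECONDS);
  }

  void stop() {
    if (cleaner != null) {
      cleaner.shutdownNow();
    }
  }
}
```

If the scheduler were created in a constructor or in init() of a component that is set up before the store, the first run could hit the IOException shown in the panel above. Deferring it to start() avoids that in this model.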






[jira] [Updated] (HDFS-17441) Fix junit dependency by adding missing library in hadoop-hdfs-rbf

2024-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17441:

Fix Version/s: 3.4.1

> Fix junit dependency by adding missing library in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17441
> URL: https://issues.apache.org/jira/browse/HDFS-17441
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> We need to add some missing junit libraries in hadoop-hdfs-rbf.
> See: 
> https://issues.apache.org/jira/browse/HDFS-17370?focusedCommentId=17829747&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829747






[jira] [Updated] (HDFS-17441) Fix junit dependency by adding missing library in hadoop-hdfs-rbf

2024-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17441:

Fix Version/s: 3.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix junit dependency by adding missing library in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17441
> URL: https://issues.apache.org/jira/browse/HDFS-17441
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> We need to add some missing junit libraries in hadoop-hdfs-rbf.
> See: 
> https://issues.apache.org/jira/browse/HDFS-17370?focusedCommentId=17829747&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829747






[jira] [Resolved] (HDFS-17435) Fix TestRouterRpc failed

2024-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17435.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Fix TestRouterRpc failed
> 
>
> Key: HDFS-17435
> URL: https://issues.apache.org/jira/browse/HDFS-17435
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> TestRouterRpc and TestRouterRpcMultiDestination are failing with the 
> following error.
> {noformat}
> [ERROR] testProxyGetBlockKeys  Time elapsed: 0.573 s  <<< ERROR!
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: jenkins is not allowed to impersonate jenkins
> {noformat}
> This is caused by testClearStaleNamespacesInRouterStateIdContext(), which was 
> added in HDFS-17354.






[jira] [Updated] (HDFS-17441) Fix junit dependency by adding missing library in hadoop-hdfs-rbf

2024-03-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17441:

Status: Patch Available  (was: Open)

> Fix junit dependency by adding missing library in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17441
> URL: https://issues.apache.org/jira/browse/HDFS-17441
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> We need to add some missing junit libraries in hadoop-hdfs-rbf.
> See: 
> https://issues.apache.org/jira/browse/HDFS-17370?focusedCommentId=17829747&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829747






[jira] [Created] (HDFS-17441) Fix junit dependency by adding missing library in hadoop-hdfs-rbf

2024-03-25 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17441:
---

 Summary: Fix junit dependency by adding missing library in 
hadoop-hdfs-rbf
 Key: HDFS-17441
 URL: https://issues.apache.org/jira/browse/HDFS-17441
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


We need to add some missing junit libraries in hadoop-hdfs-rbf.

See: 
https://issues.apache.org/jira/browse/HDFS-17370?focusedCommentId=17829747&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17829747






[jira] [Updated] (HDFS-17435) Fix TestRouterRpc failed

2024-03-24 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17435:

Summary: Fix TestRouterRpc failed  (was: Fix 
TestRouterRpc#testClearStaleNamespacesInRouterStateIdContext() failed)

> Fix TestRouterRpc failed
> 
>
> Key: HDFS-17435
> URL: https://issues.apache.org/jira/browse/HDFS-17435
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> TestRouterRpc and TestRouterRpcMultiDestination are failing with the 
> following error.
> {noformat}
> [ERROR] testProxyGetBlockKeys  Time elapsed: 0.573 s  <<< ERROR!
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: jenkins is not allowed to impersonate jenkins
> {noformat}
> This is caused by testClearStaleNamespacesInRouterStateIdContext(), which was 
> added in HDFS-17354.






[jira] [Commented] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-03-22 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829873#comment-17829873
 ] 

Takanobu Asanuma commented on HDFS-17370:
-

Thanks again for sharing the problem, [~ayushtkn]. I will create another jira 
and PR.

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.






[jira] [Commented] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-03-21 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829437#comment-17829437
 ] 

Takanobu Asanuma commented on HDFS-17370:
-

The problem is fixed by HDFS-17432.

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.






[jira] [Resolved] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up

2024-03-21 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17354.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Delay invoke  clearStaleNamespacesInRouterStateIdContext during router start 
> up
> ---
>
> Key: HDFS-17354
> URL: https://issues.apache.org/jira/browse/HDFS-17354
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> We should start the thread that clears expired namespaces in the 
> RouterRpcServer RUNNING phase, because StateStoreService is initialized in 
> the initialization phase. Currently, the router throws an IOException during 
> startup.
> {panel:title=Exception}
> 2024-01-09 16:27:06,939 WARN org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not fetch current list of namespaces.
> java.io.IOException: State Store does not have an interface for MembershipStore
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
> at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {panel}






[jira] [Assigned] (HDFS-17435) Fix TestRouterRpc#testClearStaleNamespacesInRouterStateIdContext() failed

2024-03-21 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reassigned HDFS-17435:
---

Assignee: Takanobu Asanuma

> Fix TestRouterRpc#testClearStaleNamespacesInRouterStateIdContext() failed
> -
>
> Key: HDFS-17435
> URL: https://issues.apache.org/jira/browse/HDFS-17435
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> TestRouterRpc and TestRouterRpcMultiDestination are failing with the 
> following error.
> {noformat}
> [ERROR] testProxyGetBlockKeys  Time elapsed: 0.573 s  <<< ERROR!
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: jenkins is not allowed to impersonate jenkins
> {noformat}
> This is caused by testClearStaleNamespacesInRouterStateIdContext(), which was 
> added in HDFS-17354.






[jira] [Resolved] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-21 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17432.
-
Fix Version/s: 3.4.1
   3.5.0
   Resolution: Fixed

> Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17432
> URL: https://issues.apache.org/jira/browse/HDFS-17432
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
> both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to 
> the hadoop-hdfs-rbf/pom.xml.






[jira] [Created] (HDFS-17435) Fix TestRouterRpc#testClearStaleNamespacesInRouterStateIdContext() failed

2024-03-20 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17435:
---

 Summary: Fix 
TestRouterRpc#testClearStaleNamespacesInRouterStateIdContext() failed
 Key: HDFS-17435
 URL: https://issues.apache.org/jira/browse/HDFS-17435
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma


TestRouterRpc and TestRouterRpcMultiDestination are failing with the following 
error.
{noformat}
[ERROR] testProxyGetBlockKeys  Time elapsed: 0.573 s  <<< ERROR!
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
 User: jenkins is not allowed to impersonate jenkins
{noformat}
This is caused by testClearStaleNamespacesInRouterStateIdContext(), which was 
added in HDFS-17354.






[jira] [Created] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-18 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17432:
---

 Summary: Fix junit dependency to enable JUnit4 tests to run in 
hadoop-hdfs-rbf
 Key: HDFS-17432
 URL: https://issues.apache.org/jira/browse/HDFS-17432
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to the 
hadoop-hdfs-rbf/pom.xml.






[jira] [Commented] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-03-18 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828144#comment-17828144
 ] 

Takanobu Asanuma commented on HDFS-17370:
-

[~ayushtkn] Thank you for investigating. It indeed appears that this change is 
the cause of the issue, my apologies for that.
It seems like we need to add junit-vintage-engine to hadoop-hdfs-rbf/pom.xml. 
I'll create a PR later.

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.






[jira] [Updated] (HDFS-17333) DFSClient supports lazy resolution from hostname to IP.

2024-03-02 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17333:

Fix Version/s: 3.4.1
   3.5.0

> DFSClient supports lazy resolution from hostname to IP.
> ---
>
> Key: HDFS-17333
> URL: https://issues.apache.org/jira/browse/HDFS-17333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: HDFS-17333.001.patch
>
>
> Currently, when a DFSClient is started, it resolves all hosts of all 
> nameservices: 
>   at DFSUtilClient#getAddresses(conf, null, addressKey)
>   at AbstractNNFailoverProxyProvider#getProxyAddresses(URI uri, 
> String addressKey)
> If host->ip resolution is very slow in the environment where the DFSClient 
> runs, the existing logic takes a long time when there are many nameservices.
> However, each DFSClient needs at most the IPs of all NameNodes of one 
> nameservice. In the best case, if the first NameNode selected by the 
> DFSClient can serve requests normally, the client only needs to know the IP 
> of that NameNode. Therefore, it is not necessary to resolve all NameNodes of 
> all nameservices in the configuration file when the DFSClient is started.
> This patch supports lazy resolution of host->ip: a host is resolved only 
> when it needs to be accessed.
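
The idea of lazy host->ip resolution can be illustrated with the JDK alone. The sketch below is an assumption-laden illustration, not the patch: LazyAddressSketch, its Resolver interface and the lookups counter are hypothetical names, while InetSocketAddress.createUnresolved and InetAddress.getByName are standard JDK calls.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class LazyAddressSketch {
  // Hypothetical hook so that the lookup strategy can be swapped out.
  interface Resolver {
    InetAddress resolve(String host) throws UnknownHostException;
  }

  private final String host;
  private final int port;
  private final Resolver resolver;
  private InetSocketAddress resolved; // stays null until the first get()
  int lookups = 0;                    // counts how often resolution ran

  LazyAddressSketch(String host, int port, Resolver resolver) {
    this.host = host;
    this.port = port;
    this.resolver = resolver;
  }

  // Builds an address object without attempting any name lookup.
  InetSocketAddress unresolved() {
    return InetSocketAddress.createUnresolved(host, port);
  }

  // Resolves on first access only and caches the result afterwards.
  synchronized InetSocketAddress get() {
    if (resolved == null) {
      lookups++;
      try {
        resolved = new InetSocketAddress(resolver.resolve(host), port);
      } catch (UnknownHostException e) {
        throw new UncheckedIOException(e);
      }
    }
    return resolved;
  }
}
```

A caller could construct one such object per configured NameNode, for example new LazyAddressSketch("nn1.example.com", 8020, InetAddress::getByName), where the host name and port are placeholders. Only the NameNodes that are actually contacted would then trigger a lookup.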






[jira] [Updated] (HDFS-17333) DFSClient supports lazy resolution from hostname to IP.

2024-03-02 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17333:

Summary: DFSClient supports lazy resolution from hostname to IP.  (was: 
DFSClient support lazy resolve host->ip.)

> DFSClient supports lazy resolution from hostname to IP.
> ---
>
> Key: HDFS-17333
> URL: https://issues.apache.org/jira/browse/HDFS-17333
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-17333.001.patch
>
>
> Currently, when a DFSClient is started, it resolves all hosts of all 
> nameservices: 
>   at DFSUtilClient#getAddresses(conf, null, addressKey)
>   at AbstractNNFailoverProxyProvider#getProxyAddresses(URI uri, 
> String addressKey)
> If host->ip resolution is very slow in the environment where the DFSClient 
> runs, the existing logic takes a long time when there are many nameservices.
> However, each DFSClient needs at most the IPs of all NameNodes of one 
> nameservice. In the best case, if the first NameNode selected by the 
> DFSClient can serve requests normally, the client only needs to know the IP 
> of that NameNode. Therefore, it is not necessary to resolve all NameNodes of 
> all nameservices in the configuration file when the DFSClient is started.
> This patch supports lazy resolution of host->ip: a host is resolved only 
> when it needs to be accessed.






[jira] [Resolved] (HDFS-17361) DiskBalancer: Query command support with multiple nodes

2024-02-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17361.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> DiskBalancer: Query command support with multiple nodes
> ---
>
> Key: HDFS-17361
> URL: https://issues.apache.org/jira/browse/HDFS-17361
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, diskbalancer
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> As mentioned in https://issues.apache.org/jira/browse/HDFS-10821, the query 
> command will support multiple nodes.
> That means we can use the command hdfs diskbalancer -query to print the 
> diskbalancer status of one or more datanodes.






[jira] [Updated] (HDFS-17362) RBF: Implement RouterObserverReadConfiguredFailoverProxyProvider

2024-02-12 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17362:

Fix Version/s: 3.4.1
   3.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> RBF: Implement RouterObserverReadConfiguredFailoverProxyProvider
> 
>
> Key: HDFS-17362
> URL: https://issues.apache.org/jira/browse/HDFS-17362
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> Currently, RouterObserverReadProxyProvider is using IPFailoverProxyProvider, 
> while ObserverReadProxyProvider is using ConfiguredFailoverProxyProvider.  If 
> we are to align RouterObserverReadProxyProvider with 
> ObserverReadProxyProvider, RouterObserverReadProxyProvider should internally 
> use ConfiguredFailoverProxyProvider.  Moreover, IPFailoverProxyProvider has 
> an issue with resolving HA configurations. (For example, 
> IPFailoverProxyProvider cannot resolve hdfs://router-service.)






[jira] [Updated] (HDFS-17362) RBF: Implement RouterObserverReadConfiguredFailoverProxyProvider

2024-02-07 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17362:

Summary: RBF: Implement RouterObserverReadConfiguredFailoverProxyProvider  
(was: RBF: RouterObserverReadProxyProvider should use 
ConfiguredFailoverProxyProvider internally)

> RBF: Implement RouterObserverReadConfiguredFailoverProxyProvider
> 
>
> Key: HDFS-17362
> URL: https://issues.apache.org/jira/browse/HDFS-17362
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> Currently, RouterObserverReadProxyProvider is using IPFailoverProxyProvider, 
> while ObserverReadProxyProvider is using ConfiguredFailoverProxyProvider.  If 
> we are to align RouterObserverReadProxyProvider with 
> ObserverReadProxyProvider, RouterObserverReadProxyProvider should internally 
> use ConfiguredFailoverProxyProvider.  Moreover, IPFailoverProxyProvider has 
> an issue with resolving HA configurations. (For example, 
> IPFailoverProxyProvider cannot resolve hdfs://router-service.)






[jira] [Updated] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-02-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17370:

Fix Version/s: 3.4.1
   3.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.






[jira] [Updated] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-02-02 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17370:

Status: Patch Available  (was: Open)

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.






[jira] [Created] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-02-02 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17370:
---

 Summary: Fix junit dependency for running parameterized tests in 
hadoop-hdfs-rbf
 Key: HDFS-17370
 URL: https://issues.apache.org/jira/browse/HDFS-17370
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


We need to add junit-jupiter-engine dependency for running parameterized tests 
in hadoop-hdfs-rbf.






[jira] [Resolved] (HDFS-17359) EC: recheck failed streamers should only after flushing all packets.

2024-02-01 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17359.
-
Fix Version/s: 3.3.9
   3.4.1
   3.5.0
   Resolution: Fixed

> EC: recheck failed streamers should only after flushing all packets.
> 
>
> Key: HDFS-17359
> URL: https://issues.apache.org/jira/browse/HDFS-17359
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.4.1, 3.5.0
>
>
> In the method DFSStripedOutputStream#checkStreamerFailures, we have the 
> following code:
> {code:java}
>     Set<StripedDataStreamer> newFailed = checkStreamers();
>     if (newFailed.size() == 0) {
>       return;
>     }
>
>     if (isNeedFlushAllPackets) {
>       // for healthy streamers, wait till all of them have fetched the new block
>       // and flushed out all the enqueued packets.
>       flushAllInternals();
>     }
>     // recheck failed streamers again after the flush
>     newFailed = checkStreamers();
> {code}
> The re-check is only needed after a flush, so it should move inside the if 
> block to avoid a useless invocation.
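
The proposal in the description can be sketched as follows. This is a minimal, self-contained illustration and not the actual patch: RecheckSketch, its call counter, the integer streamer IDs, and the stubbed checkStreamers() and flushAllInternals() are hypothetical stand-ins for members of DFSStripedOutputStream. Only the control flow is meant to carry over.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for DFSStripedOutputStream; streamers are plain integer IDs.
final class RecheckSketch {
  int checkCalls = 0;                 // how many times checkStreamers() ran
  private boolean flushed = false;
  private final Set<Integer> failedBeforeFlush;
  private final Set<Integer> failedAfterFlush;

  RecheckSketch(Set<Integer> failedBeforeFlush, Set<Integer> failedAfterFlush) {
    this.failedBeforeFlush = failedBeforeFlush;
    this.failedAfterFlush = failedAfterFlush;
  }

  // Stub: reports a different failure set once a flush has happened.
  Set<Integer> checkStreamers() {
    checkCalls++;
    return new HashSet<>(flushed ? failedAfterFlush : failedBeforeFlush);
  }

  // Stub: in this sketch a flush is the only event that can change the failure set.
  void flushAllInternals() {
    flushed = true;
  }

  Set<Integer> checkStreamerFailures(boolean isNeedFlushAllPackets) {
    Set<Integer> newFailed = checkStreamers();
    if (newFailed.isEmpty()) {
      return newFailed;
    }
    if (isNeedFlushAllPackets) {
      flushAllInternals();
      // Re-check only here, after the flush, instead of unconditionally.
      newFailed = checkStreamers();
    }
    return newFailed;
  }
}
```

With isNeedFlushAllPackets set to false, checkStreamers() runs once instead of twice, which is the saving the issue asks for.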






[jira] [Reopened] (HDFS-17348) Enhance Log when checkLocations in RecoveryTaskStriped

2024-01-30 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reopened HDFS-17348:
-

> Enhance Log when checkLocations in RecoveryTaskStriped
> --
>
> Key: HDFS-17348
> URL: https://issues.apache.org/jira/browse/HDFS-17348
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> Enhance the IOException (IOE) log to make debugging easier.






[jira] [Resolved] (HDFS-17348) Enhance Log when checkLocations in RecoveryTaskStriped

2024-01-30 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17348.
-
Resolution: Duplicate

I'd like to change the status to duplicate if HDFS-17358 fixes the issue.

> Enhance Log when checkLocations in RecoveryTaskStriped
> --
>
> Key: HDFS-17348
> URL: https://issues.apache.org/jira/browse/HDFS-17348
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> Enhance the IOException (IOE) log to make debugging easier.






[jira] [Commented] (HDFS-17356) RBF: Add Configuration dfs.federation.router.ns.name Optimization

2024-01-29 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811863#comment-17811863
 ] 

Takanobu Asanuma commented on HDFS-17356:
-

[~bigdata_zoodev] Thanks for your reply.
In our company, we also run Router and NameNode on the same host, and the error 
does not occur if we manually set dfs.ha.namenode.id. If my understanding is 
correct, 'programmatically' does not mean it prohibits manual configuration 
setting. As stated in hdfs-default.xml, it is automatically determined {*}if 
not configured{*}: 
[https://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]

The RBF docs also state that 'if the local node is in a HA mode, it is 
recommended to configure dfs.ha.namenode.id.': 
[https://apache.github.io/hadoop/hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html]

It appears that the same discussion occurred in HDFS-13214. Please refer to 
that as well.
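
The behaviour described in this comment can be sketched roughly as follows: an explicitly configured dfs.ha.namenode.id wins, and automatic detection (matching the local address against the configured NameNode addresses) applies only when the key is not set. NamenodeIdSketch, resolve, and the map-based configuration are hypothetical names used for illustration; the real logic lives in DFSUtil and is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the DFSUtil implementation.
final class NamenodeIdSketch {
  /**
   * @param configuredId value of dfs.ha.namenode.id, or null if not configured
   * @param addresses    NameNode ID mapped to the host it is configured on
   * @param localHost    host name of the local node
   * @return the NameNode ID to use, or null if nothing matches
   */
  static String resolve(String configuredId, Map<String, String> addresses, String localHost) {
    if (configuredId != null && !configuredId.isEmpty()) {
      return configuredId;            // manual configuration takes precedence
    }
    List<String> matches = new ArrayList<>();
    for (Map.Entry<String, String> e : addresses.entrySet()) {
      if (e.getValue().equals(localHost)) {
        matches.add(e.getKey());
      }
    }
    if (matches.size() > 1) {
      // Mirrors the situation in the reported log: several entries match the local node.
      throw new IllegalArgumentException(
          "Configuration has multiple addresses that match local node's address");
    }
    return matches.isEmpty() ? null : matches.get(0);
  }
}
```

In this sketch, a node that hosts both nn1 and the Router entry r1 fails auto-detection but starts normally once dfs.ha.namenode.id is set to nn1.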

> RBF: Add Configuration dfs.federation.router.ns.name Optimization
> -
>
> Key: HDFS-17356
> URL: https://issues.apache.org/jira/browse/HDFS-17356
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfs, rbf
>Reporter: wangzhihui
>Priority: Minor
> Attachments: image-2024-01-29-18-04-55-391.png, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> When RBF is enabled and the HDFS server (NameNode, ZKFC) shares both its 
> configuration and its node with the RBFClient, the following exception occurs 
> and the NameNode fails to start. The reason is that the nameservice (NS) of 
> the Router service has been added to the dfs.nameservices list. When the 
> NameNode starts, it looks up the NS that the current node belongs to, finds 
> multiple NS that it cannot tell apart, fails the existing validation logic, 
> and aborts startup. Currently, we can only solve this problem by keeping 
> separate hdfs-site.xml files for the RouterClient and the NameNode. However, 
> splitting the configuration makes unified management of the cluster 
> configuration harder. Therefore, we propose a new solution to solve this 
> problem better.
> {code:java}
> // code placeholder
> 2023-10-30 15:53:24,613 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> registered UNIX signal handlers for [TERM, HUP, INT]
> 2023-10-30 15:53:24,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> createNameNode []
> 2023-10-30 15:53:24,760 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
> Loaded properties from hadoop-metrics2.properties
> 2023-10-30 15:53:24,842 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 2023-10-30 15:53:24,842 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system 
> started
> 2023-10-30 15:53:24,868 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
> addresses that match local node's address. Please configure the system with 
> dfs.nameservice.id and dfs.ha.namenode.id
>         at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1257)
>         at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1158)
>         at 
> org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1113)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getNameServiceId(NameNode.java:1822)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1005)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
> 2023-10-30 15:53:24,870 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: org.apache.hadoop.HadoopIllegalArgumentException: Configuration has 
> multiple addresses that match local node's address. Please configure the 
> system with dfs.nameservice.id and dfs.ha.namenode.id
> 2023-10-30 15:53:24,874 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG: {code}
>  
> hdfs-site.xml
> {code:java}
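> <property>
>   <name>dfs.nameservices</name>
>   <value>mycluster1,mycluster2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>node1.com:</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster1</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn1</name>
>   <value>node1.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn2</name>
>   <value>node2.com:50070</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster2</name>
>   <value>nn1,nn2</value>
> </property>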

[jira] [Commented] (HDFS-17356) RBF: Add Configuration dfs.federation.router.ns.name Optimization

2024-01-29 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811768#comment-17811768
 ] 

Takanobu Asanuma commented on HDFS-17356:
-

As the error log shows, dfs.nameservice.id and dfs.ha.namenode.id may not be 
set properly. Did you set dfs.ha.namenode.id=nn1 on node1.com and 
dfs.ha.namenode.id=nn2 on node2.com?

> RBF: Add Configuration dfs.federation.router.ns.name Optimization
> -
>
> Key: HDFS-17356
> URL: https://issues.apache.org/jira/browse/HDFS-17356
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfs, rbf
>Reporter: wangzhihui
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> When RBF is enabled and the HDFS server (NameNode, ZKFC) shares both its 
> configuration and its node with the RBFClient, the following exception occurs 
> and the NameNode fails to start. The reason is that the nameservice (NS) of 
> the Router service has been added to the dfs.nameservices list. When the 
> NameNode starts, it looks up the NS that the current node belongs to, finds 
> multiple NS that it cannot tell apart, fails the existing validation logic, 
> and aborts startup. Currently, we can only solve this problem by keeping 
> separate hdfs-site.xml files for the RouterClient and the NameNode. However, 
> splitting the configuration makes unified management of the cluster 
> configuration harder. Therefore, we propose a new solution to solve this 
> problem better.
> {code:java}
> // code placeholder
> 2023-10-30 15:53:24,613 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> registered UNIX signal handlers for [TERM, HUP, INT]
> 2023-10-30 15:53:24,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> createNameNode []
> 2023-10-30 15:53:24,760 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
> Loaded properties from hadoop-metrics2.properties
> 2023-10-30 15:53:24,842 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 2023-10-30 15:53:24,842 INFO 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system 
> started
> 2023-10-30 15:53:24,868 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
> addresses that match local node's address. Please configure the system with 
> dfs.nameservice.id and dfs.ha.namenode.id
>         at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1257)
>         at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1158)
>         at 
> org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1113)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.getNameServiceId(NameNode.java:1822)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1005)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
> 2023-10-30 15:53:24,870 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1: org.apache.hadoop.HadoopIllegalArgumentException: Configuration has 
> multiple addresses that match local node's address. Please configure the 
> system with dfs.nameservice.id and dfs.ha.namenode.id
> 2023-10-30 15:53:24,874 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG: {code}
>  
> hdfs-site.xml
> {code:java}
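> <property>
>   <name>dfs.nameservices</name>
>   <value>mycluster1,mycluster2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>node1.com:</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster1</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn1</name>
>   <value>node1.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn2</name>
>   <value>node2.com:50070</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster2</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn1</name>
>   <value>node3.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn2</name>
>   <value>node4.com:50070</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.ns-fed</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> <property>
>   <name>dfs.client.failover.random.order</name>
>   <value>true</value>
> </property>
> {code}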
>  
> Solution
> Add a dfs.federation.router.ns.name setting to hdfs-site.xml that marks the 
> Router NS name, and filter out the Router NS during NameNode or ZKFC startup 
> to avoid this issue.
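
The filtering step proposed in the description could look roughly like the sketch below. It illustrates the proposal only and is not code from Hadoop: RouterNsFilterSketch and localNameservices are hypothetical names, and dfs.federation.router.ns.name is the key proposed in this issue, not an existing setting.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed filtering; names are illustrative only.
final class RouterNsFilterSketch {
  /**
   * @param nameservices  comma-separated value of dfs.nameservices
   * @param routerNsNames comma-separated value of the proposed
   *                      dfs.federation.router.ns.name key, or null if unset
   * @return the nameservices a NameNode or ZKFC would consider at startup
   */
  static List<String> localNameservices(String nameservices, String routerNsNames) {
    Set<String> routerNs = new HashSet<>();
    if (routerNsNames != null) {
      for (String ns : routerNsNames.split(",")) {
        if (!ns.trim().isEmpty()) {
          routerNs.add(ns.trim());
        }
      }
    }
    List<String> result = new ArrayList<>();
    for (String ns : nameservices.split(",")) {
      String trimmed = ns.trim();
      if (!trimmed.isEmpty() && !routerNs.contains(trimmed)) {
        result.add(trimmed);          // keep only non-Router nameservices
      }
    }
    return result;
  }
}
```

With the configuration quoted above, filtering ns-fed out of mycluster1,mycluster2,ns-fed leaves only the two real nameservices for the startup check.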




[jira] [Updated] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-01-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17362:

Status: Patch Available  (was: Open)

> RBF: RouterObserverReadProxyProvider should use 
> ConfiguredFailoverProxyProvider internally
> --
>
> Key: HDFS-17362
> URL: https://issues.apache.org/jira/browse/HDFS-17362
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> Currently, RouterObserverReadProxyProvider is using IPFailoverProxyProvider, 
> while ObserverReadProxyProvider is using ConfiguredFailoverProxyProvider.  If 
> we are to align RouterObserverReadProxyProvider with 
> ObserverReadProxyProvider, RouterObserverReadProxyProvider should internally 
> use ConfiguredFailoverProxyProvider.  Moreover, IPFailoverProxyProvider has 
> an issue with resolving HA configurations. (For example, 
> IPFailoverProxyProvider cannot resolve hdfs://router-service.)






[jira] [Created] (HDFS-17362) RBF: RouterObserverReadProxyProvider should use ConfiguredFailoverProxyProvider internally

2024-01-28 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17362:
---

 Summary: RBF: RouterObserverReadProxyProvider should use 
ConfiguredFailoverProxyProvider internally
 Key: HDFS-17362
 URL: https://issues.apache.org/jira/browse/HDFS-17362
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


Currently, RouterObserverReadProxyProvider is using IPFailoverProxyProvider, 
while ObserverReadProxyProvider is using ConfiguredFailoverProxyProvider.  If 
we are to align RouterObserverReadProxyProvider with ObserverReadProxyProvider, 
RouterObserverReadProxyProvider should internally use 
ConfiguredFailoverProxyProvider.  Moreover, IPFailoverProxyProvider has an 
issue with resolving HA configurations. (For example, IPFailoverProxyProvider 
cannot resolve hdfs://router-service.)






[jira] [Updated] (HDFS-17343) Revert HDFS-16016. BPServiceActor to provide new thread to handle IBR

2024-01-20 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17343:

Fix Version/s: 3.5.0

> Revert HDFS-16016. BPServiceActor to provide new thread to handle IBR
> -
>
> Key: HDFS-17343
> URL: https://issues.apache.org/jira/browse/HDFS-17343
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.5.0
>
>
> When preparing for hadoop-3.4.0 release, we found that HDFS-16016 may cause 
> mis-order of ibr and fbr on datanode. After discussion, we decided to revert 
> HDFS-16016.






[jira] [Updated] (HDFS-17312) packetsReceived metric should ignore heartbeat packet

2024-01-14 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17312:

Fix Version/s: 3.4.0
   (was: 3.4.1)
   (was: 3.5.0)

> packetsReceived metric should ignore heartbeat packet
> -
>
> Key: HDFS-17312
> URL: https://issues.apache.org/jira/browse/HDFS-17312
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The packetsReceived metric should ignore heartbeat packets and count only 
> data packets and the last packet in a block.
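
A minimal sketch of the intended counting rule is shown below. PacketCounterSketch and onPacket are hypothetical names, not DataNode code, and the sketch assumes that heartbeat packets are identified by a sequence number of -1.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the DataNode metrics implementation.
final class PacketCounterSketch {
  // Assumption: heartbeat packets carry the sequence number -1.
  static final long HEART_BEAT_SEQNO = -1L;

  private final AtomicLong packetsReceived = new AtomicLong();

  // Called once per received packet; heartbeats are not counted.
  void onPacket(long seqno) {
    if (seqno == HEART_BEAT_SEQNO) {
      return;
    }
    packetsReceived.incrementAndGet();
  }

  long getPacketsReceived() {
    return packetsReceived.get();
  }
}
```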






[jira] [Commented] (HDFS-17312) packetsReceived metric should ignore heartbeat packet

2024-01-14 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806579#comment-17806579
 ] 

Takanobu Asanuma commented on HDFS-17312:
-

There seems to be another release candidate for 3.4.0, and I want to include 
this bug fix in 3.4.0 since this bug only occurs in 3.4.0. So, I'm backporting 
this commit to branch-3.4.0. ( CC: [~slfan1989] ) 

> packetsReceived metric should ignore heartbeat packet
> -
>
> Key: HDFS-17312
> URL: https://issues.apache.org/jira/browse/HDFS-17312
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> The packetsReceived metric should ignore heartbeat packets and count only 
> data packets and the last packet in a block.






[jira] [Updated] (HDFS-17312) packetsReceived metric should ignore heartbeat packet

2024-01-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17312:

Fix Version/s: 3.4.1

> packetsReceived metric should ignore heartbeat packet
> -
>
> Key: HDFS-17312
> URL: https://issues.apache.org/jira/browse/HDFS-17312
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> The packetsReceived metric should ignore heartbeat packets and count only 
> data packets and the last packet in a block.






[jira] [Resolved] (HDFS-17312) packetsReceived metric should ignore heartbeat packet

2024-01-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17312.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> packetsReceived metric should ignore heartbeat packet
> -
>
> Key: HDFS-17312
> URL: https://issues.apache.org/jira/browse/HDFS-17312
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> The packetsReceived metric should ignore heartbeat packets and count only 
> data packets and the last packet in a block.






[jira] [Resolved] (HDFS-17315) Optimize the namenode format code logic.

2024-01-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17315.
-
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> Optimize the namenode format code logic.
> 
>
> Key: HDFS-17315
> URL: https://issues.apache.org/jira/browse/HDFS-17315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> 1. HDFS-17277 (https://issues.apache.org/jira/browse/HDFS-17277) deleted some 
> invalid code, but one line of invalid code still remains.
> 2. Additionally, optimize the resource-closing logic by using 
> try-with-resources.
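
For readers unfamiliar with the second point, the sketch below shows the general try-with-resources pattern. It is a generic illustration, not the NameNode format code: TryWithResourcesSketch and TrackedResource are hypothetical names.

```java
// Hypothetical sketch of the try-with-resources pattern.
final class TryWithResourcesSketch {
  // A resource that records whether close() was called.
  static final class TrackedResource implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  // Simulates a failure while the resource is open and reports whether the
  // resource had already been closed when the catch block ran.
  static boolean closedBeforeCatch(TrackedResource resource) {
    try (TrackedResource r = resource) {
      throw new IllegalStateException("simulated failure while using " + r);
    } catch (IllegalStateException e) {
      // The resource is closed automatically before any catch block executes.
      return resource.closed;
    }
  }
}
```

Compared with a manual close() in a finally block, the resource is released on every path without extra code.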






[jira] [Updated] (HDFS-17277) Delete invalid code logic in namenode format

2023-12-29 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17277:

Fix Version/s: 3.3.9

> Delete invalid code logic in namenode format
> 
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhangzhanchang
>Assignee: zhangzhanchang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> There is invalid logical processing in the namenode format process






[jira] [Assigned] (HDFS-17277) Delete invalid code logic in namenode format

2023-12-29 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reassigned HDFS-17277:
---

Assignee: zhangzhanchang

> Delete invalid code logic in namenode format
> 
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhangzhanchang
>Assignee: zhangzhanchang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There is invalid logical processing in the namenode format process






[jira] [Resolved] (HDFS-17277) Delete invalid code logic in namenode format

2023-12-29 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17277.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Delete invalid code logic in namenode format
> 
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhangzhanchang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There is invalid logical processing in the namenode format process






[jira] [Resolved] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.

2023-12-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17301.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add read and write dataXceiver threads count metrics to datanode.
> -
>
> Key: HDFS-17301
> URL: https://issues.apache.org/jira/browse/HDFS-17301
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> # The DataNodeActiveXceiversCount metric counts the threads of all Op types.
>  # In most cases, we care more about the number of read and write 
> dataXceiver threads, so add read and write dataXceiver thread count metrics 
> to the datanode.
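
The idea can be sketched as a total counter plus separate read and write counters. This is a hypothetical illustration, not the DataNode implementation: XceiverCountSketch, its Op enum, and the method names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the Op values here are illustrative only.
final class XceiverCountSketch {
  enum Op { READ_BLOCK, WRITE_BLOCK, OTHER }

  private final AtomicInteger total = new AtomicInteger();
  private final AtomicInteger readers = new AtomicInteger();
  private final AtomicInteger writers = new AtomicInteger();

  // Called when a thread starts serving an operation.
  void start(Op op) {
    total.incrementAndGet();
    if (op == Op.READ_BLOCK) {
      readers.incrementAndGet();
    } else if (op == Op.WRITE_BLOCK) {
      writers.incrementAndGet();
    }
  }

  // Called when the thread finishes the operation it started with.
  void finish(Op op) {
    total.decrementAndGet();
    if (op == Op.READ_BLOCK) {
      readers.decrementAndGet();
    } else if (op == Op.WRITE_BLOCK) {
      writers.decrementAndGet();
    }
  }

  int getTotal() { return total.get(); }
  int getReaders() { return readers.get(); }
  int getWriters() { return writers.get(); }
}
```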






[jira] [Updated] (HDFS-17150) EC: Fix the bug of failed lease recovery.

2023-12-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17150:

Fix Version/s: 3.3.9

> EC: Fix the bug of failed lease recovery.
> -
>
> Key: HDFS-17150
> URL: https://issues.apache.org/jira/browse/HDFS-17150
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> If the client crashes without writing the minimum number of internal blocks 
> required by the EC policy, lease recovery for the corresponding unclosed file 
> may keep failing. Taking the RS(6,3) policy as an example, the timeline is as 
> follows:
> 1. The client writes some data to only 5 datanodes;
> 2. The client crashes;
> 3. The NN fails over;
> 4. Now the result of `uc.getNumExpectedLocations()` depends entirely on 
> block reports, and 5 datanodes report internal blocks;
> 5. When the lease exceeds the hard limit, the NN issues a block recovery 
> command;
> 6. The datanode checks the command and finds that the number of internal 
> blocks is insufficient, resulting in an error and recovery failure;
> 7. The lease exceeds the hard limit again, and the NN issues another block 
> recovery command, which fails again, and so on.
> When the client has written fewer than 6 internal blocks, the block group is 
> actually unrecoverable. We should treat this situation like the case where 
> the number of replicas is 0 for a replicated file, i.e., directly remove the 
> last block group and close the file.
>  
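
The decision proposed at the end of the description can be sketched as below. EcLeaseRecoverySketch, Action, and decide are hypothetical names and this is not the NameNode code; it only encodes the rule that a block group with fewer reported internal blocks than the policy has data blocks (6 for RS(6,3)) cannot be recovered.

```java
// Hypothetical sketch of the proposed decision; not the NameNode implementation.
final class EcLeaseRecoverySketch {
  enum Action { RECOVER_BLOCK_GROUP, REMOVE_LAST_BLOCK_GROUP_AND_CLOSE }

  /**
   * @param reportedInternalBlocks internal blocks known from block reports
   * @param dataBlocks             data block count of the EC policy, e.g. 6 for RS(6,3)
   */
  static Action decide(int reportedInternalBlocks, int dataBlocks) {
    if (reportedInternalBlocks < dataBlocks) {
      // Too few internal blocks to decode: retrying recovery can never succeed.
      return Action.REMOVE_LAST_BLOCK_GROUP_AND_CLOSE;
    }
    return Action.RECOVER_BLOCK_GROUP;
  }
}
```

For the timeline above, decide(5, 6) selects removal of the last block group, which breaks the loop of repeated failed recoveries.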






[jira] [Commented] (HDFS-17150) EC: Fix the bug of failed lease recovery.

2023-12-28 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801009#comment-17801009
 ] 

Takanobu Asanuma commented on HDFS-17150:
-

Cherry-picked to branch-3.3.

> EC: Fix the bug of failed lease recovery.
> -
>
> Key: HDFS-17150
> URL: https://issues.apache.org/jira/browse/HDFS-17150
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> If the client crashes without writing the minimum number of internal blocks 
> required by the EC policy, lease recovery for the corresponding unclosed file 
> may keep failing. Taking the RS(6,3) policy as an example, the timeline is as 
> follows:
> 1. The client writes some data to only 5 datanodes;
> 2. The client crashes;
> 3. The NN fails over;
> 4. Now the result of `uc.getNumExpectedLocations()` depends entirely on 
> block reports, and 5 datanodes report internal blocks;
> 5. When the lease exceeds the hard limit, the NN issues a block recovery 
> command;
> 6. The datanode checks the command and finds that the number of internal 
> blocks is insufficient, resulting in an error and recovery failure;
> 7. The lease exceeds the hard limit again, and the NN issues another block 
> recovery command, which fails again, and so on.
> When the client has written fewer than 6 internal blocks, the block group is 
> actually unrecoverable. We should treat this situation like the case where 
> the number of replicas is 0 for a replicated file, i.e., directly remove the 
> last block group and close the file.
>  






[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17297:

Fix Version/s: 3.3.9

> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> When the internalReleaseLease method is called:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
>  If the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met during the execution of the 
> UNDER_RECOVERY logic, the block is removed from the block list in the inode 
> file and marked as deleted. 
> However, it is not removed from the BlocksMap, which may cause a memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this 
> point as well.
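
The leak described above can be illustrated with a toy model. Everything here is a simplified assumption: `BlocksMapSketch` and its methods are hypothetical, and the real NameNode structures are far more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: a file's block list plus a global block map. Hypothetical names. */
final class BlocksMapSketch {
  final List<Long> fileBlocks = new ArrayList<>();
  final Map<Long, String> blocksMap = new HashMap<>();

  void addBlock(long blockId, String file) {
    fileBlocks.add(blockId);
    blocksMap.put(blockId, file);
  }

  /** Removes the last block from the file only; the map entry is leaked. */
  void removeLastBlockLeaky() {
    fileBlocks.remove(fileBlocks.size() - 1);
  }

  /** Removes the last block from the file and from the global map. */
  void removeLastBlock() {
    long blockId = fileBlocks.remove(fileBlocks.size() - 1);
    blocksMap.remove(blockId);
  }
}
```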






[jira] [Resolved] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2023-12-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17297.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When the internalReleaseLease method is called:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
>  If the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met during the execution of the 
> UNDER_RECOVERY logic, the block is removed from the block list in the inode 
> file and marked as deleted. 
> However, it is not removed from the BlocksMap, which may cause a memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this 
> point as well.






[jira] [Resolved] (HDFS-17284) Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery

2023-12-26 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17284.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks 
> during block recovery
> --
>
> Key: HDFS-17284
> URL: https://issues.apache.org/jira/browse/HDFS-17284
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks 
> during block recovery
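
The issue text does not show the affected expression, so the following is only a generic illustration of how an int product can overflow when a share of tasks is computed, and how widening to long avoids it. The class, method, and parameter names are hypothetical and are not taken from the patch.

```java
/** Generic int-overflow illustration; names and formula are assumptions. */
final class TaskCountSketch {
  private TaskCountSketch() {}

  /** Buggy: the int product wraps around before it is converted to double. */
  static int shareOverflowing(int pendingOfType, int maxTransfers, int totalPending) {
    return (int) Math.ceil((double) (pendingOfType * maxTransfers) / totalPending);
  }

  /** Safe: one operand is widened to long, so the product is computed in 64 bits. */
  static int shareSafe(int pendingOfType, int maxTransfers, int totalPending) {
    return (int) Math.ceil((double) ((long) pendingOfType * maxTransfers) / totalPending);
  }
}
```

For example, with 1,000,000 pending blocks of one type, a transfer limit of 4,000, and 2,000,000 pending blocks in total, the exact product (4,000,000,000) exceeds `Integer.MAX_VALUE`, so the first variant returns a negative number while the second returns 2000.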






[jira] [Updated] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender

2023-12-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17298:

Fix Version/s: 3.3.9

> Fix NPE in DataNode.handleBadBlock and BlockSender
> --
>
> Key: HDFS-17298
> URL: https://issues.apache.org/jira/browse/HDFS-17298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> There are some NPE issues on the DataNode side of our online environment.
> The detailed exception information is
> {code:java}
> 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending 
> block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK 
> operation  src: /xxx:41452 dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> NPE Code logic:
> {code:java}
> if (!fromScanner && blockScanner.isEnabled()) {
>   // data.getVolume(block) is null
>   blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(),
>   block);
> } 
> {code}
> {code:java}
> 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - 
> xxx:50010:DataXceiver error processing COPY_BLOCK operation  src: /xxx:61052 
> dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> NPE Code logic:
> {code:java}
> // Obtain a reference before reading data
> volumeRef = datanode.data.getVolume(block).obtainReference(); 
> //datanode.data.getVolume(block) is null  
> {code}
> We need to fix it.
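
Both snippets above chain a call directly onto `getVolume(block)`. One possible guard is sketched below. The `Volume` and `Dataset` interfaces and `storageIdOrNull` are hypothetical stand-ins, not the DataNode classes or the committed patch.

```java
/** Null-guard sketch with hypothetical stand-in types. */
final class VolumeLookupSketch {
  private VolumeLookupSketch() {}

  interface Volume {
    String getStorageID();
  }

  interface Dataset {
    /** May return null, e.g. if the block is no longer known to the dataset. */
    Volume getVolume(String blockId);
  }

  /** Returns the storage ID, or null when no volume is found for the block. */
  static String storageIdOrNull(Dataset data, String blockId) {
    Volume volume = data.getVolume(blockId);
    // Check the lookup result before dereferencing it.
    return volume == null ? null : volume.getStorageID();
  }
}
```

A caller would then skip the scanner call, or fail with a descriptive exception, when the result is null.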






[jira] [Resolved] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender

2023-12-25 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17298.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix NPE in DataNode.handleBadBlock and BlockSender
> --
>
> Key: HDFS-17298
> URL: https://issues.apache.org/jira/browse/HDFS-17298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There are some NPE issues on the DataNode side of our online environment.
> The detailed exception information is
> {code:java}
> 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending 
> block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK 
> operation  src: /xxx:41452 dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> NPE Code logic:
> {code:java}
> if (!fromScanner && blockScanner.isEnabled()) {
>   // data.getVolume(block) is null
>   blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(),
>   block);
> } 
> {code}
> {code:java}
> 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - 
> xxx:50010:DataXceiver error processing COPY_BLOCK operation  src: /xxx:61052 
> dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> NPE Code logic:
> {code:java}
> // Obtain a reference before reading data
> volumeRef = datanode.data.getVolume(block).obtainReference(); 
> //datanode.data.getVolume(block) is null  
> {code}
> We need to fix it.






[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-25 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800322#comment-17800322
 ] 

Takanobu Asanuma commented on HDFS-17299:
-

I also agree with the implementation of a bestEffort approach on the client 
side when creating a pipeline. Addressing this issue on the NameNode side would 
likely be difficult due to the complexity involved in managing rack status.

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that a datanode is dead.
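
The 20.5-minute figure is consistent with the usual dead-node interval of 2 x recheck interval + 10 x heartbeat interval, assuming a recheck interval of 600000 ms and a heartbeat interval of 3 s. A small sketch of the arithmetic (the class and method names are hypothetical):

```java
/** Arithmetic sketch for the dead-datanode detection interval. */
final class DeadNodeIntervalSketch {
  private DeadNodeIntervalSketch() {}

  /**
   * @param recheckIntervalMs    dfs.namenode.heartbeat.recheck-interval, in ms
   * @param heartbeatIntervalSec dfs.heartbeat.interval, in seconds
   * @return time in ms after which a silent datanode is considered dead
   */
  static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
    return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
  }
}
```

With the assumed values, 2 x 600000 + 10 x 3000 = 1230000 ms, i.e. 20.5 minutes.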
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are still 
> in the same AZ. See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> 

[jira] [Resolved] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17294.
-
Resolution: Fixed

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2023-12-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17294:

Fix Version/s: 3.4.0

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17042) Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode

2023-12-13 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17042:

Fix Version/s: 3.3.9

> Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode
> 
>
> Key: HDFS-17042
> URL: https://issues.apache.org/jira/browse/HDFS-17042
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> We'd like to add two new types of metrics to the existing NN 
> RpcMetrics/RpcDetailedMetrics. These two metrics can then be used as part of 
> SLA/SLO for the HDFS service.
>  * _RpcCallSuccesses_: it measures the number of RPC requests that are 
> successfully processed by a NN (e.g., with a response with an RpcStatus 
> _RpcStatusProto.SUCCESS_). Then, together with _RpcQueueNumOps_ (which 
> refers to the total number of RPC requests), we can derive the 
> RpcErrorRate for our NN, as (RpcQueueNumOps - RpcCallSuccesses) / 
> RpcQueueNumOps. 
>  * OverallRpcProcessingTime for each RPC method: this metric measures the 
> overall RPC processing time for each RPC method at the NN. It covers the time 
> from when a request arrives at the NN to when a response is sent back. We are 
> already emitting processingTime for each RPC method today in 
> RpcDetailedMetrics. We want to extend it to emit overallRpcProcessingTime for 
> each RPC method, which includes enqueueTime, queueTime, processingTime, 
> responseTime, and handlerTime.
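
The error-rate formula in the first bullet can be written out as follows. This is a minimal sketch: the class and method are hypothetical, and treating an empty sample as a rate of 0 is an assumption, not something the issue specifies.

```java
/** Sketch of RpcErrorRate = (RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps. */
final class RpcErrorRateSketch {
  private RpcErrorRateSketch() {}

  static double errorRate(long rpcQueueNumOps, long rpcCallSuccesses) {
    if (rpcQueueNumOps <= 0) {
      return 0.0; // assumption: no requests means no errors
    }
    return (double) (rpcQueueNumOps - rpcCallSuccesses) / rpcQueueNumOps;
  }
}
```

For instance, 1000 queued requests with 990 successes give an error rate of 0.01.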
>  






[jira] [Updated] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17156:

Fix Version/s: 3.3.9
   (was: 3.3.7)

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.






[jira] [Updated] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17030:

Fix Version/s: 3.3.9

> Limit wait time for getHAServiceState in ObserverReaderProxy
> 
>
> Key: HDFS-17030
> URL: https://issues.apache.org/jira/browse/HDFS-17030
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> When namenode HA is enabled and a standby NN is not responsive, we have 
> observed it would take a long time to serve a request, even though we have a 
> healthy observer or active NN. 
> Basically, when a standby is down, the RPC client would (re)try to create a 
> socket connection to that standby for _ipc.client.connect.timeout_ _* 
> ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a 
> heap dump at a standby, the NN still accepts the socket connection but it 
> won't send responses to these RPC requests and we would time out after 
> _ipc.client.rpc-timeout.ms._ This adds significant latency. For clusters 
> at LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds and thus a 
> request takes more than 2 mins to complete when we take a heap dump at a 
> standby. This has been causing user job failures. 
> We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending 
> getHAServiceState requests in ObserverReaderProxy (for user rpc requests, we 
> still use the original value from the config). However, that would double the 
> number of socket connections between clients and the NN (which is a 
> deal-breaker). 
> The proposal is to add a timeout on getHAServiceState() calls in 
> ObserverReaderProxy so that we only wait up to that timeout for an NN to 
> respond with its HA state. Once we pass that timeout, we will move on to 
> probe the next NN. 
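
A bounded probe of this kind could look roughly like the sketch below. The description only says that a timeout is added. Running each probe on an executor and waiting with `Future.get(timeout)` is an assumption made here for illustration, and `HaStateProbeSketch`, `firstObserver`, and the string states are hypothetical.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch: probe each NN in turn, waiting at most timeoutMs per probe. */
final class HaStateProbeSketch {
  private HaStateProbeSketch() {}

  /** Returns the index of the first probe answering "observer" in time, or -1. */
  static int firstObserver(List<Callable<String>> probes, long timeoutMs) {
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      for (int i = 0; i < probes.size(); i++) {
        Future<String> state = pool.submit(probes.get(i));
        try {
          if ("observer".equals(state.get(timeoutMs, TimeUnit.MILLISECONDS))) {
            return i;
          }
        } catch (TimeoutException | ExecutionException e) {
          // This NN did not answer in time (or failed): give up and try the next.
          state.cancel(true);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return -1;
        }
      }
      return -1;
    } finally {
      pool.shutdownNow(); // interrupt any probe that is still blocked
    }
  }
}
```

In this sketch a hung standby costs at most `timeoutMs` before the next NN is probed, instead of the full RPC timeout.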
>  






[jira] [Updated] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17156:

Fix Version/s: 3.3.7

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.7
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.






[jira] [Resolved] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17156.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.






[jira] [Updated] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17156:

Summary: Client may receive old state ID which will lead to inconsistent 
reads  (was: mapreduce job encounters java.io.IOException)

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.
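
The ordering problem described above can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions: the names StateIdSketch, ActiveNamenode, ObserverNamenode, applyEdit, tailEdits and readMtime are hypothetical and are not Hadoop APIs, and the mtime values are arbitrary. It shows that an observer which has applied only the OP_ADD edit answers a read when the client presents a stale stateId, but has to catch up first when the client presents the stateId of OP_CLOSE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Toy model of stateId-based consistency; all names are hypothetical. */
final class StateIdSketch {

  /** Active namenode: every edit gets the next stateId (1, 2, ...). */
  static final class ActiveNamenode {
    private final List<Long> mtimes = new ArrayList<>();

    /** Records an edit that sets the file mtime and returns its stateId. */
    long applyEdit(long mtime) {
      mtimes.add(mtime);
      return mtimes.size();
    }

    long mtimeAt(long stateId) {
      return mtimes.get((int) stateId - 1);
    }
  }

  /** Observer namenode: sees only the edits it has tailed so far. */
  static final class ObserverNamenode {
    private long appliedStateId = 0;
    private long visibleMtime = -1;

    void tailEdits(ActiveNamenode active, long upToStateId) {
      while (appliedStateId < upToStateId) {
        appliedStateId++;
        visibleMtime = active.mtimeAt(appliedStateId);
      }
    }

    /**
     * Answers only if the observer is at least as fresh as the client;
     * an empty result means "catch up first, then retry".
     */
    OptionalLong readMtime(long clientLastSeenStateId) {
      if (appliedStateId < clientLastSeenStateId) {
        return OptionalLong.empty();
      }
      return OptionalLong.of(visibleMtime);
    }
  }
}
```

In this model, a router that answers the writer before it has learned the stateId of OP_CLOSE leaves the client holding the stateId of OP_ADD, so the observer has no reason to wait and returns the older mtime.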



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) mapreduce job encounters java.io.IOException when dfs.client.rbf.observer.read.enable is true

2023-08-14 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754126#comment-17754126
 ] 

Takanobu Asanuma commented on HDFS-17156:
-

[~chunyiyang] I added you to the contributor role and assigned you to this 
jira. You can assign yourself next time. Thanks!

> mapreduce job encounters java.io.IOException when 
> dfs.client.rbf.observer.read.enable is true
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF
>



[jira] [Assigned] (HDFS-17156) mapreduce job encounters java.io.IOException when dfs.client.rbf.observer.read.enable is true

2023-08-14 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reassigned HDFS-17156:
---

Assignee: Chunyi Yang

> mapreduce job encounters java.io.IOException when 
> dfs.client.rbf.observer.read.enable is true
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF
>



[jira] [Commented] (HDFS-17156) mapreduce job encounters java.io.IOException when dfs.client.rbf.observer.read.enable is true

2023-08-11 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753235#comment-17753235
 ] 

Takanobu Asanuma commented on HDFS-17156:
-

It looked like a bug in RBF SBN, so I moved this jira from HADOOP to HDFS.

> mapreduce job encounters java.io.IOException when 
> dfs.client.rbf.observer.read.enable is true
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF
>



[jira] [Moved] (HDFS-17156) mapreduce job encounters java.io.IOException when dfs.client.rbf.observer.read.enable is true

2023-08-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma moved HADOOP-18847 to HDFS-17156:
--

Component/s: rbf
 (was: common)
Key: HDFS-17156  (was: HADOOP-18847)
Project: Hadoop HDFS  (was: Hadoop Common)

> mapreduce job encounters java.io.IOException when 
> dfs.client.rbf.observer.read.enable is true
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF
>



[jira] [Resolved] (HDFS-16967) RBF: File based state stores should allow concurrent access to the records

2023-04-04 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16967.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> RBF: File based state stores should allow concurrent access to the records
> --
>
> Key: HDFS-16967
> URL: https://issues.apache.org/jira/browse/HDFS-16967
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> File based state store implementations (StateStoreFileImpl and 
> StateStoreFileSystemImpl) should allow the state store records to be updated 
> and read concurrently rather than serially. Concurrent access to the record 
> files on the HDFS-based store appears to improve the state store cache 
> loading performance by more than 10x.
> For instance, in order to maintain data integrity, the cache is reloaded 
> whenever any mount table record is updated. This reload operation appears to 
> gain a significant performance improvement from concurrent access to the 
> mount table records.
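
The difference between serial and concurrent record access can be sketched in plain Java. This is not the StateStoreFileImpl code: ConcurrentRecordLoader and loadAll are hypothetical names, records are assumed to be one small UTF-8 file each, and the local file system stands in for the real store. The sketch submits one read task per record file to a fixed thread pool and collects the results in submission order.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: reads every record file in parallel. */
final class ConcurrentRecordLoader {

  static List<String> loadAll(List<Path> recordFiles, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // One task per record file; each task reads the whole file.
      List<Callable<String>> tasks = new ArrayList<>();
      for (Path file : recordFiles) {
        tasks.add(() ->
            new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
      }
      // invokeAll returns the futures in the same order as the tasks.
      List<String> records = new ArrayList<>();
      for (Future<String> result : pool.invokeAll(tasks)) {
        records.add(result.get());
      }
      return records;
    } finally {
      pool.shutdown();
    }
  }
}
```

Whether a speed-up of the size reported above is reached depends on the latency of the underlying store; the sketch only shows the access pattern.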






[jira] [Resolved] (HDFS-16958) EC: Fix bug in processing EC excess redundancy

2023-03-27 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16958.
-
Resolution: Not A Problem

> EC: Fix bug in processing EC excess redundancy 
> ---
>
> Key: HDFS-16958
> URL: https://issues.apache.org/jira/browse/HDFS-16958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When processing excess redundancy, the number of internal blocks is computed 
> by traversing `nonExcess`. This approach is not accurate, because `nonExcess` 
> excludes replicas in abnormal states, such as corrupt or maintenance 
> replicas. `numOfTarget` may be smaller than the actual value, which results 
> in an inaccurate `excessTypes`.






[jira] [Commented] (HDFS-16310) RBF: Add client port to CallerContext for Router

2023-02-23 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692558#comment-17692558
 ] 

Takanobu Asanuma commented on HDFS-16310:
-

Hi [~omalley], I have found several JIRAs that you cherry-picked to 
branch-3.3 without updating the fix versions. Please take care to set the 
correct fix versions, since they affect the Hadoop changelog.

> RBF: Add client port to CallerContext for Router
> 
>
> Key: HDFS-16310
> URL: https://issues.apache.org/jira/browse/HDFS-16310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> As mentioned in [HDFS-16266|https://issues.apache.org/jira/browse/HDFS-16266], 
> this adds the client port to the CallerContext of the Router.






[jira] [Updated] (HDFS-16310) RBF: Add client port to CallerContext for Router

2023-02-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16310:

Fix Version/s: 3.3.5

> RBF: Add client port to CallerContext for Router
> 
>
> Key: HDFS-16310
> URL: https://issues.apache.org/jira/browse/HDFS-16310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>



[jira] [Commented] (HDFS-16310) RBF: Add client port to CallerContext for Router

2023-02-23 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692555#comment-17692555
 ] 

Takanobu Asanuma commented on HDFS-16310:
-

I added 3.3.5 to fix versions.

> RBF: Add client port to CallerContext for Router
> 
>
> Key: HDFS-16310
> URL: https://issues.apache.org/jira/browse/HDFS-16310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>



[jira] [Updated] (HDFS-16266) Add remote port information to HDFS audit log

2023-02-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16266:

Fix Version/s: 3.3.5

> Add remote port information to HDFS audit log
> -
>
> Key: HDFS-16266
> URL: https://issues.apache.org/jira/browse/HDFS-16266
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> In our production environment, we occasionally encounter a problem where a 
> user submits an abnormal computation task that triggers a sudden flood of 
> requests. The queueTime and processingTime of the Namenode then rise very 
> high, which creates a large backlog of tasks.
> We usually locate and kill specific Spark, Flink, or MapReduce tasks based on 
> metrics and audit logs. Currently, IP and UGI are recorded in audit logs, but 
> there is no port information, so it is sometimes difficult to locate specific 
> processes. Therefore, I propose that we add the port information to the audit 
> log, so that we can easily track the upstream process.
> Some projects, such as HBase and Alluxio, already include port information in 
> their audit logs. I think it is also necessary to add port information to 
> HDFS audit logs.






[jira] [Commented] (HDFS-16266) Add remote port information to HDFS audit log

2023-02-23 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692553#comment-17692553
 ] 

Takanobu Asanuma commented on HDFS-16266:
-

I added 3.3.5 to fix versions.

> Add remote port information to HDFS audit log
> -
>
> Key: HDFS-16266
> URL: https://issues.apache.org/jira/browse/HDFS-16266
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>



[jira] [Commented] (HDFS-15630) RBF: Fix wrong client IP info in CallerContext when requests mount points with multi-destinations.

2023-02-22 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692552#comment-17692552
 ] 

Takanobu Asanuma commented on HDFS-15630:
-

I added 3.3.5 to fix versions.

> RBF: Fix wrong client IP info in CallerContext when requests mount points 
> with multi-destinations.
> --
>
> Key: HDFS-15630
> URL: https://issues.apache.org/jira/browse/HDFS-15630
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-15630.001.patch, HDFS-15630.002.patch, 
> HDFS-15630.003.patch, HDFS-15630.004.patch, HDFS-15630.005.patch, 
> HDFS-15630.006.patch, HDFS-15630.test.patch
>
>
> There are two issues with the client IP info in CallerContext when we 
> request mount points with multiple destinations.
>  # the clientIp is duplicated in CallerContext when 
> RouterRpcClient#invokeSequential is used.
>  # the clientIp is missing from CallerContext when 
> RouterRpcClient#invokeConcurrent is used.
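
The first of the two issues above amounts to appending the client IP more than once. A minimal sketch of an idempotent append is given here; it is not the HDFS-15630 patch. CallerContextSketch and withClientIp are hypothetical names, and the "clientIp:" key and "," separator are assumptions made for illustration.

```java
/** Hypothetical helper: adds the client IP to a context string at most once. */
final class CallerContextSketch {

  // Assumed field key and separator, chosen for illustration only.
  static final String CLIENT_IP_KEY = "clientIp";
  static final String SEPARATOR = ",";

  static String withClientIp(String context, String ip) {
    String field = CLIENT_IP_KEY + ":" + ip;
    if (context == null || context.isEmpty()) {
      return field;
    }
    // If any existing field already carries a client IP, leave it alone.
    for (String part : context.split(SEPARATOR)) {
      if (part.startsWith(CLIENT_IP_KEY + ":")) {
        return context;
      }
    }
    return context + SEPARATOR + field;
  }
}
```

Calling withClientIp once per destination, as a sequential invocation over several destinations might do, then leaves a single clientIp field in the context.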






[jira] [Updated] (HDFS-15630) RBF: Fix wrong client IP info in CallerContext when requests mount points with multi-destinations.

2023-02-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15630:

Fix Version/s: 3.3.5

> RBF: Fix wrong client IP info in CallerContext when requests mount points 
> with multi-destinations.
> --
>
> Key: HDFS-15630
> URL: https://issues.apache.org/jira/browse/HDFS-15630
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-15630.001.patch, HDFS-15630.002.patch, 
> HDFS-15630.003.patch, HDFS-15630.004.patch, HDFS-15630.005.patch, 
> HDFS-15630.006.patch, HDFS-15630.test.patch
>
>



[jira] [Updated] (HDFS-13293) RBF: The RouterRPCServer should transfer client IP via CallerContext to NamenodeRpcServer

2023-02-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-13293:

Fix Version/s: 3.3.5

> RBF: The RouterRPCServer should transfer client IP via CallerContext to 
> NamenodeRpcServer
> -
>
> Key: HDFS-13293
> URL: https://issues.apache.org/jira/browse/HDFS-13293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Baolong Mao
>Assignee: Hui Fei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13293.001.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Otherwise, the namenode doesn't know the client's callerContext.
> This jira focuses on the audit log, which logs the real client IP. Locality 
> is left to HDFS-13248.






[jira] [Commented] (HDFS-13293) RBF: The RouterRPCServer should transfer client IP via CallerContext to NamenodeRpcServer

2023-02-22 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692551#comment-17692551
 ] 

Takanobu Asanuma commented on HDFS-13293:
-

I added 3.3.5 to fix versions.

> RBF: The RouterRPCServer should transfer client IP via CallerContext to 
> NamenodeRpcServer
> -
>
> Key: HDFS-13293
> URL: https://issues.apache.org/jira/browse/HDFS-13293
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Baolong Mao
>Assignee: Hui Fei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13293.001.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>



[jira] [Commented] (HDFS-16845) Add configuration flag to enable observer reads on routers without using ObserverReadProxyProvider

2023-02-22 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692547#comment-17692547
 ] 

Takanobu Asanuma commented on HDFS-16845:
-

It is only in trunk. I corrected the fix versions.

> Add configuration flag to enable observer reads on routers without using 
> ObserverReadProxyProvider
> --
>
> Key: HDFS-16845
> URL: https://issues.apache.org/jira/browse/HDFS-16845
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In order for clients to have routers forward their reads to observers, the 
> clients must use a proxy with an alignment context. This is currently 
> achieved by using the ObserverReadProxyProvider.
> Using ObserverReadProxyProvider preserves backward compatibility for client 
> configurations.
> However, the ObserverReadProxyProvider forces an msync on initialization, 
> which is not required with routers.
> Performing msync calls is more expensive with routers because the router fans 
> out the call to all namespaces, so we'd like to avoid this.





[jira] [Updated] (HDFS-16845) Add configuration flag to enable observer reads on routers without using ObserverReadProxyProvider

2023-02-22 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16845:

Fix Version/s: (was: 3.3.5)
   (was: 2.10.3)

> Add configuration flag to enable observer reads on routers without using 
> ObserverReadProxyProvider
> --
>
> Key: HDFS-16845
> URL: https://issues.apache.org/jira/browse/HDFS-16845
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>



[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2023-02-21 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691455#comment-17691455
 ] 

Takanobu Asanuma commented on HDFS-13522:
-

[~simbadzina] Thanks for your comment.

For clientC (2.7.x) in my test case, the changes aren't small because it needs 
both the latest changes and HDFS-12943. The changes may be small for clientB 
(3.3.4), but we would still have to apply the latest patches to every client 
that wants to send read requests to observers behind routers. Upgrading all 
clients is painful in most environments.

In this jira, there are Design A and Design B. Design A is efficient because it 
saves msync calls, but it requires upgrading the client side since it extends 
the RPC header. In Design B, the router always calls msync for each read, which 
is expensive, but old clients can send read requests to the observer. So, the 
combination of Design A (for new clients) and Design B (for old clients) was 
chosen. Am I right?

I'm not sure why Design B was not implemented in the end (was there any 
discussion about it?). I think most users still need Design B, as others 
mentioned in this jira. Is it possible to implement a configuration option on 
the Router side to switch the behavior between Design A and Design B for old 
clients?
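
The Router-side switch asked about above could look roughly like the following decision function. This is a sketch of the question only, not of any existing Router code: ObserverReadPolicySketch, chooseTarget, the Target values and the msyncForOldClients flag are all hypothetical, and the presence of a client state id stands in for "the client sent the extended RPC header".

```java
import java.util.OptionalLong;

/** Hypothetical policy: how a router might treat a read request. */
final class ObserverReadPolicySketch {

  enum Target {
    OBSERVER_WITH_CLIENT_STATE, // Design A: trust the state id from the client
    OBSERVER_AFTER_MSYNC,       // Design B: router calls msync before the read
    ACTIVE                      // fallback: forward the read to the active
  }

  /**
   * @param clientStateId state id from the RPC header, empty for old clients
   * @param msyncForOldClients hypothetical router-side configuration flag
   */
  static Target chooseTarget(OptionalLong clientStateId,
      boolean msyncForOldClients) {
    if (clientStateId.isPresent()) {
      return Target.OBSERVER_WITH_CLIENT_STATE;
    }
    return msyncForOldClients ? Target.OBSERVER_AFTER_MSYNC : Target.ACTIVE;
  }
}
```

With the flag off, old clients behave as observed in the test (reads go to the active); with it on, they pay one msync per read in exchange for observer reads.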

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2023-02-20 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691247#comment-17691247
 ] 

Takanobu Asanuma commented on HDFS-13522:
-

[~simbadzina] Thanks again for implementing this feature.
I deployed RBF SBN with the latest trunk in my local environment and submitted 
READ requests using several clients.
 - Router and NameNodes: the latest trunk (including HDFS-13522 and HDFS-16767 
and other bug fixes)
 - clientA: the latest trunk (use ObserverReadProxyProvider)
 - clientB: 3.3.4 (use ObserverReadProxyProvider)
 - clientC: 2.7.x (which doesn't have ObserverReadProxyProvider)

As a result, only clientA was able to read from Observer. Router always 
forwarded read requests from clientB and clientC to Active.

I looked into the design docs and discussions in this jira, and if I understand 
correctly, the design intends that not only new clients but also old clients 
can submit read requests to Observer. But it doesn't seem old clients can do it 
now.
Am I missing some configuration to do that? Or is it still a work in progress?

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Resolved] (HDFS-16903) Fix javadoc of Class LightWeightResizableGSet

2023-02-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16903.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix javadoc of Class LightWeightResizableGSet
> -
>
> Key: HDFS-16903
> URL: https://issues.apache.org/jira/browse/HDFS-16903
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Assignee: ZhangHB
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> After HDFS-16429 (Add DataSetLockManager to manage fine-grain locks for 
> FsDataSetImpl), the Class LightWeightResizableGSet is thread-safe. So we 
> should fix the docs of it.






[jira] [Assigned] (HDFS-16903) Fix javadoc of Class LightWeightResizableGSet

2023-02-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reassigned HDFS-16903:
---

Assignee: ZhangHB

> Fix javadoc of Class LightWeightResizableGSet
> -
>
> Key: HDFS-16903
> URL: https://issues.apache.org/jira/browse/HDFS-16903
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Assignee: ZhangHB
>Priority: Trivial
>  Labels: pull-request-available
>
> After HDFS-16429 (Add DataSetLockManager to manage fine-grain locks for 
> FsDataSetImpl), the Class LightWeightResizableGSet is thread-safe. So we 
> should fix the docs of it.






[jira] [Updated] (HDFS-16903) Fix javadoc of Class LightWeightResizableGSet

2023-02-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16903:

Description: After HDFS-16429 (Add DataSetLockManager to manage fine-grain 
locks for FsDataSetImpl), the Class LightWeightResizableGSet is thread-safe. So 
we should fix the docs of it.  (was: After [HDFS-16249. Add DataSetLockManager 
to manage fine-grain locks for FsDataSetImpl.], the Class 
LightWeightResizableGSet is thread-safe. So we should fix the docs of it.)

> Fix javadoc of Class LightWeightResizableGSet
> -
>
> Key: HDFS-16903
> URL: https://issues.apache.org/jira/browse/HDFS-16903
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Trivial
>  Labels: pull-request-available
>
> After HDFS-16429 (Add DataSetLockManager to manage fine-grain locks for 
> FsDataSetImpl), the Class LightWeightResizableGSet is thread-safe. So we 
> should fix the docs of it.






[jira] [Resolved] (HDFS-16821) Fix regression in HDFS-13522 that enables observer reads by default.

2023-01-31 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16821.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix regression in HDFS-13522 that enables observer reads by default.
> 
>
> Key: HDFS-16821
> URL: https://issues.apache.org/jira/browse/HDFS-16821
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Serving reads consistently from Observer Namenodes is a feature that was 
> introduced in HDFS-12943.
> Clients opt into this feature by configuring the ObserverReadProxyProvider. 
> It is important that the opt-in is explicit because for third-party reads to 
> remain consistent, these clients then need to perform an msync before reads.
> In HDFS-13522, the ClientGSIContext is implicitly added to the DFSClient thus 
> enabling Observer reads for all clients by default. This breaks consistency 
> guarantees for clients that haven't opted into observer reads.
> [https://github.com/apache/hadoop/pull/4883/files#diff-a627e2c1f3e68235520d3c28092f4ae8a41aa4557cc530e4e6862c318be7e898R352-R354]
> We need to return to the old behavior of only using the ClientGSIContext when 
> users have explicitly opted into Observer reads.
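
To illustrate the explicit opt-in described above, here is a minimal sketch. It 
is not the actual patch: the class ObserverReadOptIn and its method are 
hypothetical, and a plain Map stands in for the client configuration. It 
assumes the opt-in is detected from the failover proxy provider configured for 
the nameservice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: decides whether a client has
// explicitly opted into observer reads for a given nameservice.
final class ObserverReadOptIn {
  // Per-nameservice key used to select the client failover proxy provider.
  static final String PROVIDER_KEY_PREFIX =
      "dfs.client.failover.proxy.provider.";
  static final String OBSERVER_PROVIDER =
      "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider";

  // Returns true only when the observer provider is configured explicitly.
  // An unset key means "not opted in", so such clients keep reading from
  // the active NameNode with the usual consistency guarantees.
  static boolean optedIn(Map<String, String> conf, String nameservice) {
    return OBSERVER_PROVIDER.equals(conf.get(PROVIDER_KEY_PREFIX + nameservice));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(optedIn(conf, "ns0"));
    conf.put(PROVIDER_KEY_PREFIX + "ns0", OBSERVER_PROVIDER);
    System.out.println(optedIn(conf, "ns0"));
  }
}
```

The point of the sketch is the default: absence of configuration must never 
enable observer reads.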






[jira] [Updated] (HDFS-16888) BlockManager#maxReplicationStreams, replicationStreamsHardLimit, blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be volatile

2023-01-31 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16888:

Fix Version/s: 3.3.9

> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be 
> volatile
> 
>
> Key: HDFS-16888
> URL: https://issues.apache.org/jira/browse/HDFS-16888
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout may be 
> written by NameNode#reconfReplicationParameters while being read by other 
> threads. 
> Thus they should be declared volatile to guarantee "happens-before" 
> consistency.
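
As a rough illustration of the visibility issue described above, the following 
sketch uses a hypothetical class, ReplicationLimits, as a stand-in for a 
component with a runtime-reconfigurable field. It is not the BlockManager code. 
Without volatile, the reader thread would not be guaranteed to ever observe the 
new value.

```java
// Hypothetical stand-in for a component whose limit can be reconfigured
// at runtime by one thread while other threads keep reading it.
final class ReplicationLimits {
  // volatile: a write to this field happens-before every subsequent read
  // of it, so readers are guaranteed to see the reconfigured value.
  private volatile int maxReplicationStreams;

  ReplicationLimits(int initial) {
    this.maxReplicationStreams = initial;
  }

  // Called by the reconfiguration thread.
  void reconfigure(int newValue) {
    this.maxReplicationStreams = newValue;
  }

  // Called by worker threads.
  int getMaxReplicationStreams() {
    return maxReplicationStreams;
  }

  public static void main(String[] args) throws InterruptedException {
    ReplicationLimits limits = new ReplicationLimits(2);
    Thread reader = new Thread(() -> {
      // Spins until the reconfigured value becomes visible.
      while (limits.getMaxReplicationStreams() == 2) {
        Thread.yield();
      }
    });
    reader.start();
    limits.reconfigure(4);
    reader.join();
    System.out.println(limits.getMaxReplicationStreams());
  }
}
```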






[jira] [Resolved] (HDFS-16888) BlockManager#maxReplicationStreams, replicationStreamsHardLimit, blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be volatile

2023-01-31 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16888.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be 
> volatile
> 
>
> Key: HDFS-16888
> URL: https://issues.apache.org/jira/browse/HDFS-16888
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout may be 
> written by NameNode#reconfReplicationParameters while being read by other 
> threads. 
> Thus they should be declared volatile to guarantee "happens-before" 
> consistency.






[jira] [Commented] (HDFS-16889) Backport JIRAs related to RBF SBN to branch-3.3

2023-01-24 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680423#comment-17680423
 ] 

Takanobu Asanuma commented on HDFS-16889:
-

It will be from trunk. When we plan to release 3.4.0, we cut branch-3.4 from 
trunk. Then trunk will target 3.5.

> Backport JIRAs related to RBF SBN to branch-3.3
> ---
>
> Key: HDFS-16889
> URL: https://issues.apache.org/jira/browse/HDFS-16889
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> This is an umbrella JIRA to backport RBF SBN to branch-3.3. There are some 
> conflicts when trying to backport HDFS-13522 and HDFS-16767, the main 
> implementations of RBF SBN. Currently, to solve the conflicts, we need to 
> backport the following JIRAs sequentially. (Thanks [~simbadzina] for the 
> information.)
>  # HDFS-14090
>  # HDFS-15417
>  # HDFS-16296
>  # HDFS-16302
>  # HDFS-15757
>  # HDFS-13274
>  # Then HDFS-13522
>  # HDFS-16065
>  # HDFS-16313
>  # HDFS-16273
>  # Then HDFS-16767 + other bug fixes.






[jira] [Updated] (HDFS-16886) Fix documentation for StateStoreRecordOperations#get(Class ..., Query ...)

2023-01-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16886:

Fix Version/s: 3.3.9
   (was: 3.3.5)

> Fix documentation for StateStoreRecordOperations#get(Class ..., Query ...)
> --
>
> Key: HDFS-16886
> URL: https://issues.apache.org/jira/browse/HDFS-16886
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> For {*}StateStoreRecordOperations#get(Class ..., Query ...){*}, when multiple 
> records match, the documentation says a null value should be returned and an 
> IOException should be thrown. Both can't happen.
> I believe the intended behavior is that an IOException is thrown. This is the 
> implementation in {*}StateStoreBaseImpl{*}.
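
The intended contract described above (throw when more than one record matches) 
can be sketched as follows. SingleRecordLookup and getSingle are hypothetical 
names, a List and a Predicate stand in for the State Store and its Query type, 
and returning null when nothing matches is an assumption made for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of a "get exactly one record" lookup.
final class SingleRecordLookup {
  // Returns the only matching record, or null if none matches (assumed).
  // Throws IOException if more than one record matches, so a caller never
  // receives both a null result and an exception for the same call.
  static <T> T getSingle(List<T> records, Predicate<T> query)
      throws IOException {
    List<T> matches = new ArrayList<>();
    for (T record : records) {
      if (query.test(record)) {
        matches.add(record);
      }
    }
    if (matches.size() > 1) {
      throw new IOException(
          "Found " + matches.size() + " records matching the query");
    }
    return matches.isEmpty() ? null : matches.get(0);
  }

  public static void main(String[] args) throws IOException {
    List<String> records = Arrays.asList("ns0", "ns1", "ns1");
    System.out.println(getSingle(records, r -> r.equals("ns0")));
  }
}
```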






[jira] [Resolved] (HDFS-16876) Garbage collect map entries in shared RouterStateIdContext using information from namenodeResolver instead of the map of active connectionPools.

2023-01-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16876.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Garbage collect map entries in shared RouterStateIdContext using information 
> from namenodeResolver instead of the map of active connectionPools.
> 
>
> Key: HDFS-16876
> URL: https://issues.apache.org/jira/browse/HDFS-16876
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> An element in RouterStateIdContext#namespaceIdMap is deleted when there is no 
> connectionPool referencing the namespace. This is done by a thread in 
> ConnectionManager that cleans up stale connectionPools. I propose a less 
> aggressive approach, that is, cleaning up an entry when the router cannot 
> resolve a namenode belonging to the namespace.
> Some benefits of this approach are:
>  * Even when there are no active connections, the router still tracks a 
> recent state of the namenode. This will be beneficial for debugging.
>  * Simpler lifecycle for the map entries. The entries are long-lived.
>  * Few operations under the writeLock in ConnectionManager.
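
The proposal above can be sketched as follows. This is only an illustration 
under stated assumptions: NamespaceStateIds and its methods are hypothetical, 
and the set of resolvable namespaces is passed in directly rather than obtained 
from a namenodeResolver.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical model of a shared per-namespace state id map.
final class NamespaceStateIds {
  private final Map<String, LongAccumulator> namespaceIdMap =
      new ConcurrentHashMap<>();

  // Records the highest state id seen so far for a namespace.
  void update(String nsId, long stateId) {
    namespaceIdMap
        .computeIfAbsent(nsId, k -> new LongAccumulator(Long::max, Long.MIN_VALUE))
        .accumulate(stateId);
  }

  // Returns the last known state id, or null if the namespace is untracked.
  Long get(String nsId) {
    LongAccumulator acc = namespaceIdMap.get(nsId);
    return acc == null ? null : acc.get();
  }

  // Less aggressive cleanup: drop an entry only when the namespace can no
  // longer be resolved, regardless of whether connections are active.
  void removeUnresolvable(Set<String> resolvableNamespaces) {
    namespaceIdMap.keySet().retainAll(resolvableNamespaces);
  }

  public static void main(String[] args) {
    NamespaceStateIds ids = new NamespaceStateIds();
    ids.update("ns0", 10L);
    ids.update("ns1", 3L);
    ids.removeUnresolvable(Collections.singleton("ns0"));
    System.out.println(ids.get("ns0") + " " + ids.get("ns1"));
  }
}
```

Because entries outlive idle connections, the last known state of a namespace 
stays available for debugging, which matches the first benefit listed above.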






[jira] [Commented] (HDFS-16889) Backport JIRAs related to RBF SBN to branch-3.3

2023-01-20 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679157#comment-17679157
 ] 

Takanobu Asanuma commented on HDFS-16889:
-

After looking deeper into the related JIRAs, and thinking again, it doesn't 
seem easy to backport these patches to branch-3.3. Since branch-3.3 has already 
been released five times and is mature, significant backporting changes like 
HDFS-14090, HDFS-13522 and HDFS-16767 would have too much impact, even though 
they keep backward compatibility.

> Backport JIRAs related to RBF SBN to branch-3.3
> ---
>
> Key: HDFS-16889
> URL: https://issues.apache.org/jira/browse/HDFS-16889
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> This is an umbrella JIRA to backport RBF SBN to branch-3.3. There are some 
> conflicts when trying to backport HDFS-13522 and HDFS-16767, the main 
> implementations of RBF SBN. Currently, to solve the conflicts, we need to 
> backport the following JIRAs sequentially. (Thanks [~simbadzina] for the 
> information.)
>  # HDFS-14090
>  # HDFS-15417
>  # HDFS-16296
>  # HDFS-16302
>  # HDFS-15757
>  # HDFS-13274
>  # Then HDFS-13522
>  # HDFS-16065
>  # HDFS-16313
>  # HDFS-16273
>  # Then HDFS-16767 + other bug fixes.






[jira] [Commented] (HDFS-16889) Backport JIRAs related to RBF SBN to branch-3.3

2023-01-19 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678606#comment-17678606
 ] 

Takanobu Asanuma commented on HDFS-16889:
-

Thanks for your comment, [~simbadzina].
 - I think it would be better to create a backport PR under each existing JIRA 
if there are not many conflicts.
 - HDFS-14090 is a relatively large change. I want to get a consensus on 
whether we can backport to branch-3.3 in the JIRA.
 - Yes, I will do that. I can also review them if you do it.

> Backport JIRAs related to RBF SBN to branch-3.3
> ---
>
> Key: HDFS-16889
> URL: https://issues.apache.org/jira/browse/HDFS-16889
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> This is an umbrella JIRA to backport RBF SBN to branch-3.3. There are some 
> conflicts when trying to backport HDFS-13522 and HDFS-16767, the main 
> implementations of RBF SBN. Currently, to solve the conflicts, we need to 
> backport the following JIRAs sequentially. (Thanks [~simbadzina] for the 
> information.)
>  # HDFS-14090
>  # HDFS-15417
>  # HDFS-16296
>  # HDFS-16302
>  # HDFS-15757
>  # HDFS-13274
>  # Then HDFS-13522
>  # HDFS-16065
>  # HDFS-16313
>  # HDFS-16273
>  # Then HDFS-16767 + other bug fixes.






[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2023-01-12 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676240#comment-17676240
 ] 

Takanobu Asanuma commented on HDFS-13522:
-

Created the umbrella Jira: HDFS-16889.

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Created] (HDFS-16889) Backport JIRAs related to RBF SBN to branch-3.3

2023-01-12 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-16889:
---

 Summary: Backport JIRAs related to RBF SBN to branch-3.3
 Key: HDFS-16889
 URL: https://issues.apache.org/jira/browse/HDFS-16889
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


This is an umbrella JIRA to backport RBF SBN to branch-3.3. There are some 
conflicts when trying to backport HDFS-13522 and HDFS-16767, the main 
implementations of RBF SBN. Currently, to solve the conflicts, we need to 
backport the following JIRAs sequentially. (Thanks [~simbadzina] for the 
information.)
 # HDFS-14090
 # HDFS-15417
 # HDFS-16296
 # HDFS-16302
 # HDFS-15757
 # HDFS-13274
 # Then HDFS-13522
 # HDFS-16065
 # HDFS-16313
 # HDFS-16273
 # Then HDFS-16767 + other bug fixes.






[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2023-01-06 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655438#comment-17655438
 ] 

Takanobu Asanuma commented on HDFS-13522:
-

Thanks for sharing them, [~simbadzina]. I looked them over, and each Jira seems 
important while still keeping backward compatibility. So we may be able to 
backport them to branch-3.3. I'd like to create the parent Jira next week if 
there are no objections.

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Updated] (HDFS-16767) RBF: Support observer node from Router-Based Federation

2023-01-04 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16767:

Fix Version/s: (was: 3.3.5)

> RBF: Support observer node from Router-Based Federation 
> 
>
> Key: HDFS-16767
> URL: https://issues.apache.org/jira/browse/HDFS-16767
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Enable routers to direct read calls to observer namenodes.






[jira] [Updated] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2023-01-04 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-13522:

Fix Version/s: (was: 3.3.5)

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2023-01-04 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654706#comment-17654706
 ] 

Takanobu Asanuma commented on HDFS-13522:
-

Thanks for your reply, [~simbadzina]. If I understand correctly, the fix 
versions should be updated only once the corresponding branch includes the 
feature. So I'll remove 3.3.5 from the fix versions for now. I hope branch-3.3 
will include this great feature. Maybe I can help with it.

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2022-12-27 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652362#comment-17652362
 ] 

Takanobu Asanuma commented on HDFS-13522:
-

Hi, [~simbadzina] [~omalley]. The fix versions of HDFS-13522 and HDFS-16767 are 
3.3.5, but there don't seem to be any commits for this feature in branch-3.3. 
Am I missing something?

> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{{}FederationNamenodeServiceState{}}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow up patch will change router behavior to direct 
> requests to the observer.
>  
>  






[jira] [Updated] (HDFS-16809) EC striped block is not sufficient when doing in maintenance

2022-12-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16809:

Fix Version/s: 3.2.5

> EC striped block is not sufficient when doing in maintenance
> 
>
> Key: HDFS-16809
> URL: https://issues.apache.org/jira/browse/HDFS-16809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Reporter: dingshun
>Assignee: dingshun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> When doing maintenance, EC striped blocks are not sufficient, which can lead 
> to missing blocks.






[jira] [Resolved] (HDFS-16809) EC striped block is not sufficient when doing in maintenance

2022-12-05 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16809.
-
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> EC striped block is not sufficient when doing in maintenance
> 
>
> Key: HDFS-16809
> URL: https://issues.apache.org/jira/browse/HDFS-16809
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Reporter: dingshun
>Assignee: dingshun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> When doing maintenance, EC striped blocks are not sufficient, which can lead 
> to missing blocks.






[jira] [Updated] (HDFS-16846) EC: Only EC blocks should be effected by max-streams-hard-limit configuration

2022-11-28 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16846:

Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> EC: Only EC blocks should be effected by max-streams-hard-limit configuration
> -
>
> Key: HDFS-16846
> URL: https://issues.apache.org/jira/browse/HDFS-16846
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In [HDFS-16613|https://issues.apache.org/jira/browse/HDFS-16613], the 
> dfs.namenode.replication.max-streams-hard-limit configuration only affects 
> decommissioning DataNodes, but it does not distinguish between replicated 
> blocks and EC blocks. Even DataNodes that hold only replicated files will 
> always generate high network traffic. So this configuration should affect 
> only EC blocks.
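
The rule requested above can be sketched as follows. This is only an illustration under stated assumptions, not the NameNode's actual code: the class `StreamLimitSketch`, the enum `BlockType`, and the method `effectiveLimit` are hypothetical names, and the numeric values passed in `main` are illustrative stand-ins for dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit.

```java
public class StreamLimitSketch {
    // Hypothetical block classification used only for this sketch.
    enum BlockType { REPLICATED, STRIPED }

    /**
     * Returns the number of outgoing streams allowed for a source DataNode.
     * Assumption: the hard limit is used only when the node is being
     * decommissioned AND the block is an EC (striped) block; every other
     * case falls back to the regular max-streams value.
     */
    static int effectiveLimit(BlockType type, boolean decommissioning,
                              int maxStreams, int hardLimit) {
        if (decommissioning && type == BlockType.STRIPED) {
            return hardLimit;
        }
        return maxStreams;
    }

    public static void main(String[] args) {
        int maxStreams = 2; // illustrative value
        int hardLimit = 4;  // illustrative value
        System.out.println("striped, decommissioning: "
            + effectiveLimit(BlockType.STRIPED, true, maxStreams, hardLimit));
        System.out.println("replicated, decommissioning: "
            + effectiveLimit(BlockType.REPLICATED, true, maxStreams, hardLimit));
        System.out.println("striped, in service: "
            + effectiveLimit(BlockType.STRIPED, false, maxStreams, hardLimit));
    }
}
```

With this shape, a decommissioning DataNode that holds only replicated files keeps the regular limit, which is the behaviour the description asks for.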






[jira] [Commented] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

2022-11-16 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17635068#comment-17635068
 ] 

Takanobu Asanuma commented on HDFS-16613:
-

Thanks for your reply and the suggestion, [~caozhiqiang].

Given that the speed is not much improved in the case of replication, I would 
prefer that this setting only affect EC blocks. Could you please create another 
issue to address it?

> EC: Improve performance of decommissioning dn with many ec blocks
> -
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ec, erasure-coding, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-06-07-11-46-42-389.png, 
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, 
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, 
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a dn is very slow. 
> The reason is that, unlike replicated blocks, which can be copied from any dn 
> that holds a replica of the same block, EC blocks have to be replicated from 
> the decommissioning dn.
> The configurations dfs.namenode.replication.max-streams and 
> dfs.namenode.replication.max-streams-hard-limit limit the replication 
> speed, but increasing them puts the whole cluster's network at risk. 
> So a new configuration should be added to limit the decommissioning dn, 
> distinct from the cluster-wide max-streams limit.
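
A minimal sketch of why the decommissioning dn becomes the bottleneck, based only on the description above. The class `SourceSelectionSketch`, the method `candidateSources`, and the node names are hypothetical and do not correspond to real NameNode code.

```java
import java.util.List;

public class SourceSelectionSketch {
    /**
     * Returns the DataNodes that could serve as the copy source for a block
     * stored on a decommissioning node.
     * Assumption (taken from the issue description): a replicated block can
     * be copied from any node holding a replica, while an EC block on the
     * decommissioning node has to be copied from that node itself.
     */
    static List<String> candidateSources(boolean striped,
                                         String decommissioningNode,
                                         List<String> replicaHolders) {
        if (striped) {
            // Only one possible source, so all copy traffic for the node's
            // EC blocks is bounded by that node's stream limit.
            return List.of(decommissioningNode);
        }
        // The load can be spread over every node that holds a replica.
        return List.copyOf(replicaHolders);
    }

    public static void main(String[] args) {
        List<String> holders = List.of("dn1", "dn2", "dn3"); // hypothetical nodes
        System.out.println(candidateSources(false, "dn1", holders));
        System.out.println(candidateSources(true, "dn1", holders));
    }
}
```

Because the striped case always yields a single source, raising the cluster-wide limits is the only way to speed it up without a separate limit for the decommissioning dn, which is the motivation given in the description.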





