date:20240126

[jira] [Commented] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up

2024-01-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811425#comment-17811425
 ] 

ASF GitHub Bot commented on HDFS-17354:
---

simbadzina commented on PR #6498:
URL: https://github.com/apache/hadoop/pull/6498#issuecomment-1912849626

   Changes generally look okay to me. Is this just an optimization to avoid 
clearing a map which is empty, or can there be an error if we clear before the 
router is in the RUNNING state?
   
   Can you please add a test case.




> Delay invoke  clearStaleNamespacesInRouterStateIdContext during router start 
> up
> ---
>
> Key: HDFS-17354
> URL: https://issues.apache.org/jira/browse/HDFS-17354
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Priority: Major
>  Labels: pull-request-available
>
> We should start the thread that clears expired namespaces in the 
> RouterRpcServer RUNNING phase, because StateStoreService is initialized in 
> the initialization phase. Currently, the router throws an IOException at 
> startup.
> {panel:title=Exception}
> 2024-01-09 16:27:06,939 WARN org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not fetch current list of namespaces.
> java.io.IOException: State Store does not have an interface for MembershipStore
>     at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121)
>     at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102)
>     at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {panel}
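
The ordering problem described above can be illustrated with a small, self-contained sketch. It is not the RouterRpcServer code or the change in the pull request; the class DelayedCleanerSketch, its methods, and the one-minute interval are hypothetical names and values chosen for illustration. The only point is that the periodic task is scheduled in start(), after init() has prepared what the task depends on.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a service with separate init and start phases.
final class DelayedCleanerSketch {
  private boolean storeReady;               // set in init(), like a dependency prepared during initialization
  private ScheduledExecutorService cleaner; // created only in start()

  synchronized void init() {
    storeReady = true;
  }

  synchronized void start() {
    if (!storeReady) {
      throw new IllegalStateException("init() must complete before start()");
    }
    // Daemon thread so the sketch never keeps a JVM alive on its own.
    cleaner = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "stale-namespace-cleaner");
      t.setDaemon(true);
      return t;
    });
    // Assumed interval of one minute; the real interval is configuration-dependent.
    cleaner.scheduleWithFixedDelay(this::clearStaleNamespaces, 1, 1, TimeUnit.MINUTES);
  }

  synchronized void clearStaleNamespaces() {
    // Placeholder for the periodic cleanup; by construction it can only run after init().
  }

  synchronized boolean isCleanerScheduled() {
    return cleaner != null && !cleaner.isShutdown();
  }

  synchronized void stop() {
    if (cleaner != null) {
      cleaner.shutdownNow();
    }
  }
}
```

With this ordering the cleanup task cannot observe a half-initialized dependency, which is the situation the stack trace above shows.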



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17325:
--
Affects Version/s: 3.4.0

> Doc: Fix the documentation of fs expunge command in FileSystemShell.md
> --
>
> Key: HDFS-17325
> URL: https://issues.apache.org/jira/browse/HDFS-17325
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix doc in FileSystemShell.md.
> hadoop fs -expunge --immediate   should be hadoop fs -expunge -immediate
>  
> Usage: hadoop fs [generic options] -expunge [-immediate] [-fs ]




[jira] [Updated] (HDFS-17309) RBF: Fix Router Safemode check contidition error

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17309:
--
Component/s: rbf

> RBF: Fix Router Safemode check contidition error
> 
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> With HDFS-17116, the Router safemode check condition uses monotonicNow(). 
> The code in RouterSafemodeService.periodicInvoke() is:
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>  
> The function monotonicNow() is implemented with System.nanoTime(). 
> The javadoc for System.nanoTime() says:
> This method can only be used to measure elapsed time and is not related to 
> any other notion of system or wall-clock time. The value returned represents 
> nanoseconds since some fixed but arbitrary origin time (perhaps in the 
> future, so values may be negative). 
>  
> The following situation may occur:
> If refreshCaches does not succeed at the beginning, cacheUpdateTime will be 0, 
> and now - cacheUpdateTime is measured from an arbitrary origin time, so 
> isCacheStale may be true or false.
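
The effect described above can be reproduced with plain arithmetic. The sketch below is illustrative only: StaleCheckSketch and both method names are hypothetical, and the guard that treats 0 as "never refreshed" is one possible approach, not necessarily the fix that was merged for this issue.

```java
// Hypothetical helper; times are in milliseconds, as a monotonic clock would report them.
final class StaleCheckSketch {
  // The check as quoted in the description: a monotonic reading minus a
  // timestamp that may still be 0 if the cache was never refreshed.
  static boolean naiveIsStale(long nowMs, long cacheUpdateTimeMs, long staleIntervalMs) {
    return (nowMs - cacheUpdateTimeMs) > staleIntervalMs;
  }

  // Assumed guard: a cache that was never refreshed (timestamp 0) counts as stale,
  // regardless of where the monotonic clock's origin happens to be.
  static boolean guardedIsStale(long nowMs, long cacheUpdateTimeMs, long staleIntervalMs) {
    if (cacheUpdateTimeMs == 0) {
      return true;
    }
    return (nowMs - cacheUpdateTimeMs) > staleIntervalMs;
  }
}
```

For a stale interval of 30000 ms and cacheUpdateTime 0, the naive check returns false when the monotonic reading is 5000 or negative, and true when it is 10000000, so the outcome depends only on the clock's arbitrary origin. The guarded variate returns true in all three cases.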




[jira] [Updated] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17306:
--
Component/s: rdf
 router

> RBF:Router should not return nameservices that does not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rdf, router
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
>       Suppose a cluster has 3 nameservices (ns1, ns2, ns3), ns1 has observer 
> nodes, and a client communicates with the NNs via DFSRouter.
>       If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will 
> receive all nameservices in RpcResponseHeaderProto. 
>       We should reduce the RPC response size by leaving out nameservices that 
> don't enable observer nodes.
>  
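
The idea in the description can be sketched as a simple filter. This is not the Router implementation: ObserverNamespaceFilterSketch, the map from nameservice to state id, and the set of observer-enabled nameservices are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: keep only entries for nameservices that have observer reads enabled.
final class ObserverNamespaceFilterSketch {
  static Map<String, Long> filterObserverEnabled(
      Map<String, Long> stateIdsByNameservice, Set<String> observerEnabled) {
    Map<String, Long> filtered = new TreeMap<>();
    for (Map.Entry<String, Long> entry : stateIdsByNameservice.entrySet()) {
      if (observerEnabled.contains(entry.getKey())) {
        filtered.put(entry.getKey(), entry.getValue());
      }
    }
    return filtered;
  }
}
```

In the example from the description, a map with entries for ns1, ns2 and ns3 and an observer-enabled set containing only ns1 would be reduced to the single ns1 entry before being attached to the response.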




[jira] [Updated] (HDFS-17309) RBF: Fix Router Safemode check contidition error

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17309:
--
Affects Version/s: 3.4.0

> RBF: Fix Router Safemode check contidition error
> 
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> With HDFS-17116, the Router safemode check condition uses monotonicNow(). 
> The code in RouterSafemodeService.periodicInvoke() is:
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>  
> The function monotonicNow() is implemented with System.nanoTime(). 
> The javadoc for System.nanoTime() says:
> This method can only be used to measure elapsed time and is not related to 
> any other notion of system or wall-clock time. The value returned represents 
> nanoseconds since some fixed but arbitrary origin time (perhaps in the 
> future, so values may be negative). 
>  
> The following situation may occur:
> If refreshCaches does not succeed at the beginning, cacheUpdateTime will be 0, 
> and now - cacheUpdateTime is measured from an arbitrary origin time, so 
> isCacheStale may be true or false.




[jira] [Updated] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17310:
--
Component/s: datanode

> DiskBalancer: Enhance the log message for submitPlan
> 
>
> Key: HDFS-17310
> URL: https://issues.apache.org/jira/browse/HDFS-17310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> To make troubleshooting easier, enhance the log message for submitPlan.




[jira] [Updated] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17310:
--
Affects Version/s: 3.4.0

> DiskBalancer: Enhance the log message for submitPlan
> 
>
> Key: HDFS-17310
> URL: https://issues.apache.org/jira/browse/HDFS-17310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> To make troubleshooting easier, enhance the log message for submitPlan.




[jira] [Updated] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17306:
--
Affects Version/s: 3.4.0

> RBF:Router should not return nameservices that does not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
>       Suppose a cluster has 3 nameservices (ns1, ns2, ns3), ns1 has observer 
> nodes, and a client communicates with the NNs via DFSRouter.
>       If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will 
> receive all nameservices in RpcResponseHeaderProto. 
>       We should reduce the RPC response size by leaving out nameservices that 
> don't enable observer nodes.
>  




[jira] [Updated] (HDFS-17312) packetsReceived metric should ignore heartbeat packet

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17312:
--
Hadoop Flags: Reviewed

> packetsReceived metric should ignore heartbeat packet
> -
>
> Key: HDFS-17312
> URL: https://issues.apache.org/jira/browse/HDFS-17312
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The packetsReceived metric should ignore heartbeat packets and only count 
> data packets and the last packet in a block.
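
A minimal sketch of the counting rule, under stated assumptions: PacketMetricSketch and its methods are hypothetical, and a heartbeat is identified here by a sequence number of -1, which is an assumption of this sketch rather than something stated in the issue.

```java
// Hypothetical counter; not the DataNode metrics code.
final class PacketMetricSketch {
  // Assumed marker for heartbeat packets in this sketch.
  static final long HEART_BEAT_SEQNO = -1L;

  private long packetsReceived;

  // Counts data packets and the last packet in a block; skips heartbeats.
  void onPacket(long seqno, boolean lastPacketInBlock) {
    boolean heartbeat = (seqno == HEART_BEAT_SEQNO) && !lastPacketInBlock;
    if (!heartbeat) {
      packetsReceived++;
    }
  }

  long getPacketsReceived() {
    return packetsReceived;
  }
}
```

Feeding one data packet, one heartbeat and one final packet into onPacket leaves the counter at 2, so idle pipelines that only exchange heartbeats no longer inflate the metric.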




[jira] [Updated] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17325:
--
Component/s: documentation
 fs

> Doc: Fix the documentation of fs expunge command in FileSystemShell.md
> --
>
> Key: HDFS-17325
> URL: https://issues.apache.org/jira/browse/HDFS-17325
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, fs
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix doc in FileSystemShell.md.
> hadoop fs -expunge --immediate   should be hadoop fs -expunge -immediate
>  
> Usage: hadoop fs [generic options] -expunge [-immediate] [-fs ]




[jira] [Updated] (HDFS-16716) Improve appendToFile command: support appending on file with new block

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16716:
--
Component/s: fs

> Improve appendToFile command: support appending on file with new block
> --
>
> Key: HDFS-16716
> URL: https://issues.apache.org/jira/browse/HDFS-16716
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.4.0, 3.3.6
>Reporter: guojunhao
>Assignee: M1eyu2018
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The HDFS client method DistributedFileSystem#append supports appending to a 
> file with optional create flags.
> However, the appendToFile command only supports the default create flag 
> APPEND, so appending to an EC file is not supported, because that requires 
> the NEW_BLOCK create flag.
> Thus, appendToFile should be improved by adding an option n, which makes the 
> command use the NEW_BLOCK create flag while appending to a file.




[jira] [Updated] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16652:
--
Component/s: ui

> Upgrade jquery datatable version references to v1.10.19
> ---
>
> Key: HDFS-16652
> URL: https://issues.apache.org/jira/browse/HDFS-16652
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 3.4.0
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-16652.001.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Upgrade jquery datatable version references in hdfs webapp to v1.10.19




[jira] [Updated] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16422:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.3, 3.4.0  (was: 3.4.0, 3.3.3)

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading an erasure-coded file with missing replicas (internal blocks of a 
> block group) triggers online reconstruction: the client reads dataUnits 
> parts of the data and decodes them into the missing target data. Each 
> DFSStripedInputStream object has a RawErasureDecoder object, so when preads 
> run concurrently, RawErasureDecoder.decode is invoked concurrently too. 
> RawErasureDecoder.decode is not thread safe, and as a result pread 
> occasionally returns wrong data.
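
The hazard above is the usual one for an object with internal scratch state shared by several callers. The sketch below is not the Hadoop decoder and does not claim to be the fix that was applied: ScratchDecoder is a made-up stand-in with a shared buffer, and serializing calls with a lock is just one way to make concurrent use safe.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedDecoderSketch {
  // Hypothetical decoder with mutable scratch state; unsafe if decode() runs concurrently.
  static final class ScratchDecoder {
    private final byte[] scratch = new byte[16];

    byte[] decode(byte[] input) {
      if (input.length > scratch.length) {
        throw new IllegalArgumentException("input too long for this sketch");
      }
      System.arraycopy(input, 0, scratch, 0, input.length);
      byte[] out = new byte[input.length];
      for (int i = 0; i < input.length; i++) {
        out[i] = (byte) (scratch[i] ^ 0x5A);
      }
      return out;
    }
  }

  private final ScratchDecoder decoder = new ScratchDecoder();

  // Serialize access so the shared scratch buffer is never used by two threads at once.
  byte[] safeDecode(byte[] input) {
    synchronized (decoder) {
      return decoder.decode(input);
    }
  }

  // Runs several threads through safeDecode and counts outputs that do not match the input.
  static int countMismatches(int threads, int rounds) {
    final SharedDecoderSketch shared = new SharedDecoderSketch();
    final AtomicInteger mismatches = new AtomicInteger();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final byte fill = (byte) (t + 1);
      workers[t] = new Thread(() -> {
        byte[] input = new byte[16];
        Arrays.fill(input, fill);
        for (int r = 0; r < rounds; r++) {
          byte[] out = shared.safeDecode(input);
          for (byte b : out) {
            if (b != (byte) (fill ^ 0x5A)) {
              mismatches.incrementAndGet();
              break;
            }
          }
        }
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return mismatches.get();
  }
}
```

Because every call goes through the lock, countMismatches reports no corrupted outputs. Calling decode directly from several threads would allow one thread to overwrite the scratch buffer while another is still reading it.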




[jira] [Updated] (HDFS-16384) Upgrade Netty to 4.1.72.Final

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16384:
--
Fix Version/s: (was: 3.4.0)

> Upgrade Netty to 4.1.72.Final
> -
>
> Key: HDFS-16384
> URL: https://issues.apache.org/jira/browse/HDFS-16384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.1
>Reporter: Tamas Penzes
>Assignee: Tamas Penzes
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> New fixes for netty; nothing else changed, just the netty version bumped and 
> two more exclusions in hdfs-client because of the new netty.
> No new tests added, as none are needed.




[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16252:
--
Hadoop Flags: Reviewed

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
>
> // In RetryPolicies.java, we can see it gets the timeout as the first in the pair
> /**
>  * Parse the given string as a MultipleLinearRandomRetry object.
>  * The format of the string is "t_1, n_1, t_2, n_2, ...",
>  * where t_i and n_i are the i-th pair of sleep time and number of retries.
>  * Note that the white spaces in the string are ignored.
>  *
>  * @return the parsed object, or null if the parsing fails.
>  */
> public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) {
>   final String[] elements = s.split(",");
>   if (elements.length == 0) {
>     LOG.warn("Illegal value: there is no element in \"" + s + "\".");
>     return null;
>   }
>   if (elements.length % 2 != 0) {
>     LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
>         + elements.length + " but an even number of elements is expected.");
>     return null;
>   }
>   final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>       = new ArrayList<RetryPolicies.MultipleLinearRandomRetry.Pair>();
>
>   for(int i = 0; i < elements.length; ) {
>     //parse the i-th sleep-time
>     final int sleep = parsePositiveInt(elements, i++, s);
>     if (sleep == -1) {
>       return null; //parse fails
>     }
>     //parse the i-th number-of-retries
>     final int retries = parsePositiveInt(elements, i++, s);
>     if (retries == -1) {
>       return null; //parse fails
>     }
>     pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep));
>   }
>   return new RetryPolicies.MultipleLinearRandomRetry(pairs);
> }
> {code}
> This change simply updates the docs.
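
To make the ordering concrete, here is a small standalone sketch of how a spec string such as "1,6,6,10" splits into (sleep time, retries) pairs. It mirrors the quoted parsing logic but is not the Hadoop class: RetrySpecSketch, its Pair type and parsePositive are hypothetical names, and logging is omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class RetrySpecSketch {
  // One (sleep time, number of retries) pair, in the order they appear in the spec.
  static final class Pair {
    final int sleepTime;
    final int retries;

    Pair(int sleepTime, int retries) {
      this.sleepTime = sleepTime;
      this.retries = retries;
    }
  }

  // Returns the parsed pairs, or null if the spec is malformed (as the quoted code does).
  static List<Pair> parse(String spec) {
    String[] elements = spec.split(",");
    if (elements.length == 0 || elements.length % 2 != 0) {
      return null;
    }
    List<Pair> pairs = new ArrayList<>();
    for (int i = 0; i < elements.length; i += 2) {
      int sleepTime = parsePositive(elements[i]);   // t_i comes first
      int retries = parsePositive(elements[i + 1]); // n_i comes second
      if (sleepTime == -1 || retries == -1) {
        return null;
      }
      pairs.add(new Pair(sleepTime, retries));
    }
    return pairs;
  }

  // Parses a positive int, ignoring surrounding white space; -1 signals failure.
  static int parsePositive(String text) {
    try {
      int value = Integer.parseInt(text.trim());
      return value > 0 ? value : -1;
    } catch (NumberFormatException e) {
      return -1;
    }
  }
}
```

With the default quoted above, parse("1,6,6,10") yields two pairs: sleep time 1 with 6 retries, then sleep time 6 with 10 retries. A spec with an odd number of elements, such as "1,6,6", is rejected.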




[jira] [Updated] (HDFS-16227) testMoverWithStripedFile fails intermittently

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16227:
--
Hadoop Flags: Reviewed

> testMoverWithStripedFile fails intermittently
> -
>
> Key: HDFS-16227
> URL: https://issues.apache.org/jira/browse/HDFS-16227
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> TestMover#testMoverWithStripedFile fails intermittently with stacktrace:
> {code:java}
> [ERROR] testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover)  Time elapsed: 48.439 s  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>  at org.junit.Assert.fail(Assert.java:89)
>  at org.junit.Assert.failNotEquals(Assert.java:835)
>  at org.junit.Assert.assertEquals(Assert.java:120)
>  at org.junit.Assert.assertEquals(Assert.java:146)
>  at org.apache.hadoop.hdfs.server.mover.TestMover.testMoverWithStripedFile(TestMover.java:965)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> e.g 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3386/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt




[jira] [Updated] (HDFS-16227) testMoverWithStripedFile fails intermittently

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16227:
--
Affects Version/s: 3.4.0

> testMoverWithStripedFile fails intermittently
> -
>
> Key: HDFS-16227
> URL: https://issues.apache.org/jira/browse/HDFS-16227
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> TestMover#testMoverWithStripedFile fails intermittently with stacktrace:
> {code:java}
> [ERROR] testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover)  Time elapsed: 48.439 s  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>  at org.junit.Assert.fail(Assert.java:89)
>  at org.junit.Assert.failNotEquals(Assert.java:835)
>  at org.junit.Assert.assertEquals(Assert.java:120)
>  at org.junit.Assert.assertEquals(Assert.java:146)
>  at org.apache.hadoop.hdfs.server.mover.TestMover.testMoverWithStripedFile(TestMover.java:965)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> e.g 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3386/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt




[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16080:
--
Component/s: rbf

> RBF: Invoking method in all locations should break the loop after successful 
> result
> ---
>
> Key: HDFS-16080
> URL: https://issues.apache.org/jira/browse/HDFS-16080
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> rename, delete and mkdir calls issued by a Router client usually go to 
> multiple locations if the path is present in multiple sub-clusters. After 
> invoking multiple concurrent proxy calls to multiple clients, we iterate 
> through all results and mark anyResult true if at least one of them was 
> successful. We should break the loop as soon as one proxy call result is 
> successful rather than iterating over the remaining results.
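
The proposed change amounts to an early exit from the result loop. The sketch below shows the pattern on a plain list of Boolean results; AnyResultSketch and its methods are hypothetical, and the real Router code iterates over proxy call results rather than a List<Boolean>.

```java
import java.util.List;

final class AnyResultSketch {
  // Returns true if at least one result is successful, stopping at the first success.
  static boolean anySuccessful(List<Boolean> results) {
    boolean anyResult = false;
    for (Boolean result : results) {
      if (Boolean.TRUE.equals(result)) {
        anyResult = true;
        break; // no need to look at the remaining results
      }
    }
    return anyResult;
  }

  // Helper for illustration: how many results were examined before the loop stopped.
  static int inspectedBeforeStop(List<Boolean> results) {
    int inspected = 0;
    for (Boolean result : results) {
      inspected++;
      if (Boolean.TRUE.equals(result)) {
        break;
      }
    }
    return inspected;
  }
}
```

For results false, true, false, false the loop examines two entries instead of four, and the outcome is unchanged because one success is enough to set anyResult.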




[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16075:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Use empty array constants present in StorageType and DatanodeInfo to avoid 
> creating redundant objects
> -
>
> Key: HDFS-16075
> URL: https://issues.apache.org/jira/browse/HDFS-16075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> StorageType and DatanodeInfo already provide empty array constants. We 
> should use them where possible in order to avoid creating unnecessary new 
> empty array objects.
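
The pattern can be shown with a generic example. The sketch uses a hypothetical String constant rather than the StorageType or DatanodeInfo constants mentioned above; EmptyArraySketch and namesOrEmpty are illustrative names only.

```java
import java.util.List;

final class EmptyArraySketch {
  // Shared, immutable-by-nature empty array: a zero-length array has no elements to modify.
  static final String[] EMPTY_ARRAY = {};

  // Returns the shared constant for null or empty input instead of allocating a new empty array.
  static String[] namesOrEmpty(List<String> names) {
    if (names == null || names.isEmpty()) {
      return EMPTY_ARRAY;
    }
    // For a non-empty list, toArray allocates an array of the right size.
    return names.toArray(EMPTY_ARRAY);
  }
}
```

Every empty result is the same object, so callers that frequently produce empty arrays no longer create a new one each time.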




[jira] [Updated] (HDFS-16050) Some dynamometer tests fail

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16050:
--
Affects Version/s: 3.3.2
   3.4.0

> Some dynamometer tests fail
> ---
>
> Key: HDFS-16050
> URL: https://issues.apache.org/jira/browse/HDFS-16050
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following tests failed:
> {quote}hadoop.tools.dynamometer.TestDynamometerInfra
>  hadoop.tools.dynamometer.blockgenerator.TestBlockGen
> hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt]
> {quote}[ERROR] testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator) Time elapsed: 1.353 s <<< ERROR!
>  java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer
>  at org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:576)
>  at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
> {quote}




[jira] [Updated] (HDFS-16050) Some dynamometer tests fail

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16050:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Some dynamometer tests fail
> ---
>
> Key: HDFS-16050
> URL: https://issues.apache.org/jira/browse/HDFS-16050
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following tests failed:
> {quote}hadoop.tools.dynamometer.TestDynamometerInfra
>  hadoop.tools.dynamometer.blockgenerator.TestBlockGen
> hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt]
> {quote}[ERROR] 
> testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator)
>  Time elapsed: 1.353 s <<< ERROR!
>  java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:576)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
> {quote}




[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16046:
--
Affects Version/s: 3.4.0

> TestBalanceProcedureScheduler and TestDistCpProcedure timeout
> -
>
> Key: HDFS-16046
> URL: https://issues.apache.org/jira/browse/HDFS-16046
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, test
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, 
> screenshot-2.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following two tests timed out frequently in the qbt job.
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331)
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121)
> {quote}




[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16046:
--
Hadoop Flags: Reviewed

> TestBalanceProcedureScheduler and TestDistCpProcedure timeout
> -
>
> Key: HDFS-16046
> URL: https://issues.apache.org/jira/browse/HDFS-16046
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, test
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, 
> screenshot-2.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following two tests timed out frequently in the qbt job.
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331)
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121)
> {quote}




[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16075:
--
Component/s: hdfs

> Use empty array constants present in StorageType and DatanodeInfo to avoid 
> creating redundant objects
> -
>
> Key: HDFS-16075
> URL: https://issues.apache.org/jira/browse/HDFS-16075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> StorageType and DatanodeInfo already provide empty array constants. We 
> should use them where possible to avoid creating unnecessary new 
> empty array objects.
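
A minimal sketch of the pattern described above, using hypothetical names (`Storage`, `EMPTY_ARRAY`, `toArray`) rather than the actual Hadoop classes. It assumes only that a zero-length array is immutable in practice and can therefore be shared safely:

```java
import java.util.List;

// Hypothetical stand-in for a type such as StorageType; not the Hadoop class.
enum Storage {
  DISK, SSD, ARCHIVE;

  // One shared zero-length array. Sharing is safe because an empty array
  // has no elements that a caller could modify.
  static final Storage[] EMPTY_ARRAY = {};

  // Returns the shared constant for empty input instead of allocating
  // a fresh `new Storage[0]` on every call.
  static Storage[] toArray(List<Storage> types) {
    if (types == null || types.isEmpty()) {
      return EMPTY_ARRAY;
    }
    // For a non-empty list, toArray allocates an array of the right size.
    return types.toArray(EMPTY_ARRAY);
  }
}
```

Callers that previously wrote `new Storage[0]` or `new Storage[]{}` would reference `Storage.EMPTY_ARRAY` instead.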




[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16007:
--
Component/s: hdfs

> Deserialization of ReplicaState should avoid throwing 
> ArrayIndexOutOfBoundsException
> 
>
> Key: HDFS-16007
> URL: https://issues.apache.org/jira/browse/HDFS-16007
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: junwen yang
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The ReplicaState enum uses its ordinal for serialization and 
> deserialization, which makes it sensitive to the order of its constants and 
> can cause issues similar to HDFS-15624.
> To avoid this, either add comments warning later developers not to reorder 
> this enum, or add index checking in the read and getState functions to avoid 
> an index-out-of-bounds error.
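
The index-checking option could look roughly like the sketch below. `ReplicaStateSketch` and `fromIndex` are hypothetical names for illustration, and throwing `IllegalArgumentException` is an assumption; the actual patch may report the error differently.

```java
// Hypothetical enum for illustration; not the Hadoop ReplicaState class.
enum ReplicaStateSketch {
  FINALIZED, RBW, RWR, RUR, TEMPORARY;

  // Cache values() once, because each call to values() clones the array.
  private static final ReplicaStateSketch[] CACHED = values();

  // Maps a serialized index back to a constant, rejecting values outside
  // the valid range instead of letting ArrayIndexOutOfBoundsException escape.
  static ReplicaStateSketch fromIndex(int v) {
    if (v < 0 || v >= CACHED.length) {
      throw new IllegalArgumentException("Unknown replica state index: " + v);
    }
    return CACHED[v];
  }
}
```

The check only guards against out-of-range values. It does not protect against constants being reordered, which is why the description also suggests a warning comment on the enum.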




[jira] [Updated] (HDFS-16014) Fix an issue in checking native pmdk lib by 'hadoop checknative' command

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16014:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.4, 3.4.0  (was: 3.4.0, 3.2.4)

> Fix an issue in checking native pmdk lib by 'hadoop checknative' command
> 
>
> Key: HDFS-16014
> URL: https://issues.apache.org/jira/browse/HDFS-16014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: native
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-16014-01.patch, HDFS-16014-02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In HDFS-14818, we proposed a patch to support checking the native pmdk lib. 
> The goal was to show the user whether the pmdk lib was loaded. Recently, it 
> was found that the `hadoop checknative` command still reports the pmdk lib 
> as loaded even when loading actually failed. This issue can be reproduced by 
> moving libpmem.so* from the specified install path to another location, or 
> by deleting these libs directly, after the project is built.




[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16007:
--
Affects Version/s: 3.3.1
   3.4.0

> Deserialization of ReplicaState should avoid throwing 
> ArrayIndexOutOfBoundsException
> 
>
> Key: HDFS-16007
> URL: https://issues.apache.org/jira/browse/HDFS-16007
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1, 3.4.0
>Reporter: junwen yang
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The ReplicaState enum uses its ordinal for serialization and 
> deserialization, which makes it sensitive to the order of its constants and 
> can cause issues similar to HDFS-15624.
> To avoid this, either add comments warning later developers not to reorder 
> this enum, or add index checking in the read and getState functions to avoid 
> an index-out-of-bounds error.




[jira] [Updated] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16001:
--
Affects Version/s: 3.3.1
   3.4.0

> TestOfflineEditsViewer.testStored() fails reading negative value of 
> FSEditLogOpCodes
> 
>
> Key: HDFS-16001
> URL: https://issues.apache.org/jira/browse/HDFS-16001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Konstantin Shvachko
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception
> {noformat}
> java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 
> 17
> {noformat}
> It seems there is a corrupt record in the {{editsStored}} file.




[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16075:
--
Affects Version/s: 3.3.2
   3.4.0

> Use empty array constants present in StorageType and DatanodeInfo to avoid 
> creating redundant objects
> -
>
> Key: HDFS-16075
> URL: https://issues.apache.org/jira/browse/HDFS-16075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> StorageType and DatanodeInfo already provide empty array constants. We 
> should use them where possible to avoid creating unnecessary new 
> empty array objects.




[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15790:
--
Affects Version/s: 3.3.1
   3.4.0

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1, 3.4.0
>Reporter: David Mollitor
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive 
> project.  This was not an awesome thing to do between minor versions in 
> regards to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements, they have 
> some differences.  Also, Protobuf 2 is not deprecated or anything so let us 
> have both protocols available at the same time.  In Hadoop 4.x Protobuf 2 
> support can be dropped.




[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15790:
--
Component/s: ipc

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.3.1, 3.4.0
>Reporter: David Mollitor
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive 
> project.  This was not an awesome thing to do between minor versions in 
> regards to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements, they have 
> some differences.  Also, Protobuf 2 is not deprecated or anything so let us 
> have both protocols available at the same time.  In Hadoop 4.x Protobuf 2 
> support can be dropped.




[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15790:
--
Hadoop Flags: Reviewed

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.3.1, 3.4.0
>Reporter: David Mollitor
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive 
> project.  This was not an awesome thing to do between minor versions in 
> regards to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements, they have 
> some differences.  Also, Protobuf 2 is not deprecated or anything so let us 
> have both protocols available at the same time.  In Hadoop 4.x Protobuf 2 
> support can be dropped.




[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15788:
--
Hadoop Flags: Reviewed

> Correct the statement for pmem cache to reflect cache persistence support
> -
>
> Key: HDFS-15788
> URL: https://issues.apache.org/jira/browse/HDFS-15788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Correct the statement for pmem cache to reflect cache persistence support.




[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15725:
--
Hadoop Flags: Reviewed

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
> Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, 
> HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, 
> HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the 
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea is that if lease recovery occurs, and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the 
> file and move the replicas to FINALIZED.




[jira] [Updated] (HDFS-15749) Make size of editPendingQ can be configurable

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15749:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.3, 3.3.0, 3.4.0  (was: 3.3.0, 3.4.0, 3.2.3)

> Make size of editPendingQ can be configurable
> -
>
> Key: HDFS-15749
> URL: https://issues.apache.org/jira/browse/HDFS-15749
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Baolong Mao
>Assignee: Baolong Mao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16595:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.4.0  (was: 3.4.0, 3.3.5)

> Slow peer metrics - add median, mad and upper latency limits
> 
>
> Key: HDFS-16595
> URL: https://issues.apache.org/jira/browse/HDFS-16595
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: metrics
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Slow datanode metrics include slow node and its reporting node details. With 
> HDFS-16582, we added the aggregate latency that is perceived by the reporting 
> nodes.
> In order to get more insights into how the outlier slow node's latencies 
> differ from the rest of the nodes, we should also expose median, median 
> absolute deviation and the calculated upper latency limit details.
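
For readers unfamiliar with these statistics, here is a minimal sketch of how the three values relate. The class name `LatencyStatsSketch` is hypothetical, and the multiplier of 3 in the upper limit is an assumption chosen for illustration, not a value taken from this issue or from the Hadoop outlier detector.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Hadoop.
final class LatencyStatsSketch {
  // Assumed number of deviations above the median; illustrative only.
  static final double DEVIATIONS = 3.0;

  static double median(double[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("no latency samples");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  }

  // Median absolute deviation: the median of |x - median(x)|.
  static double mad(double[] values) {
    double m = median(values);
    double[] deviations = new double[values.length];
    for (int i = 0; i < values.length; i++) {
      deviations[i] = Math.abs(values[i] - m);
    }
    return median(deviations);
  }

  // A node whose latency exceeds this limit would be treated as an outlier.
  static double upperLimit(double[] values) {
    return median(values) + DEVIATIONS * mad(values);
  }
}
```

For the samples 10, 12, 11, 13 and 100 ms, the median is 12, the median absolute deviation is 1, and the limit under the assumed multiplier is 15, so only the 100 ms node stands out.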




[jira] [Updated] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16595:
--
Affects Version/s: 3.3.5
   3.4.0

> Slow peer metrics - add median, mad and upper latency limits
> 
>
> Key: HDFS-16595
> URL: https://issues.apache.org/jira/browse/HDFS-16595
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Slow datanode metrics include slow node and its reporting node details. With 
> HDFS-16582, we added the aggregate latency that is perceived by the reporting 
> nodes.
> In order to get more insights into how the outlier slow node's latencies 
> differ from the rest of the nodes, we should also expose median, median 
> absolute deviation and the calculated upper latency limit details.




[jira] [Updated] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16595:
--
Component/s: metrics

> Slow peer metrics - add median, mad and upper latency limits
> 
>
> Key: HDFS-16595
> URL: https://issues.apache.org/jira/browse/HDFS-16595
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: metrics
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Slow datanode metrics include slow node and its reporting node details. With 
> HDFS-16582, we added the aggregate latency that is perceived by the reporting 
> nodes.
> In order to get more insights into how the outlier slow node's latencies 
> differ from the rest of the nodes, we should also expose median, median 
> absolute deviation and the calculated upper latency limit details.




[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16330:
--
Component/s: datanode

> Fix incorrect placeholder for Exception logs in DiskBalancer
> 
>
> Key: HDFS-16330
> URL: https://issues.apache.org/jira/browse/HDFS-16330
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16330:
--
Affects Version/s: 3.3.2
   3.4.0

> Fix incorrect placeholder for Exception logs in DiskBalancer
> 
>
> Key: HDFS-16330
> URL: https://issues.apache.org/jira/browse/HDFS-16330
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16330:
--
Hadoop Flags: Reviewed

> Fix incorrect placeholder for Exception logs in DiskBalancer
> 
>
> Key: HDFS-16330
> URL: https://issues.apache.org/jira/browse/HDFS-16330
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (HDFS-17285) RBF: Add a safe mode check period configuration

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17285:
--
Component/s: rbf

> RBF: Add a safe mode check period configuration
> ---
>
> Key: HDFS-17285
> URL: https://issues.apache.org/jira/browse/HDFS-17285
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When dfsrouter starts, it enters safe mode, and it takes 1 min to leave.
> The log is below:
> 14:35:23,717 INFO 
> org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave 
> startup safe mode after 3 ms
> 14:35:23,717 INFO 
> org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter 
> safe mode after 18 ms without reaching the State Store
> 14:35:23,717 INFO 
> org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: 
> Entering safe mode
> 14:35:24,996 INFO 
> org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: 
> Delaying safemode exit for 28721 milliseconds...
> 14:36:25,037 INFO 
> org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: 
> Leaving safe mode after 61319 milliseconds
> It depends on these configs:
> DFS_ROUTER_SAFEMODE_EXTENSION 30s
> DFS_ROUTER_SAFEMODE_EXPIRATION 3min
> DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min  (this is the period for checking safe mode)
> Because dfsrouter rejects write requests while in safe mode, the check 
> period should be shorter once refreshCaches is done. We should also remove 
> DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService.
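
The timing in the log can be reproduced with a simplified model. The sketch assumes the router can only leave safe mode at a periodic check, and that it leaves at the first check at or after the extension has elapsed. `SafeModeExitSketch` and `exitTimeMs` are hypothetical names, and the first check time of 1279 ms is inferred from the log (30000 ms extension minus the 28721 ms remaining delay).

```java
// Simplified timing model; hypothetical names, not RouterSafemodeService code.
final class SafeModeExitSketch {
  // Returns the time, in ms since entering safe mode, of the first periodic
  // check at or after the extension has elapsed.
  static long exitTimeMs(long firstCheckMs, long periodMs, long extensionMs) {
    if (periodMs <= 0) {
      throw new IllegalArgumentException("period must be positive");
    }
    if (firstCheckMs >= extensionMs) {
      return firstCheckMs;
    }
    long remaining = extensionMs - firstCheckMs;
    long checks = (remaining + periodMs - 1) / periodMs; // ceiling division
    return firstCheckMs + checks * periodMs;
  }
}
```

With a 60000 ms period the model gives 61279 ms, close to the 61319 ms in the log. With a hypothetical 5000 ms period the same router would leave after 31279 ms, which is the motivation for a separate, shorter check period.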




[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17290:
--
Component/s: metrics

> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Clients back off when RPCs cannot be enqueued. However, there are different 
> scenarios in which backoff can happen. Currently there is no way to 
> differentiate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher priority queues when the connection between client and 
> namenode remains open. Currently the IPC server just emits a single metric for 
> all the backoffs.
> Example:
>  # Clients are directly enqueued into the lowest priority queue and back off when 
> the lowest queue is full. Clients are expected to disconnect from the namenode.
>  # Clients are enqueued into a non-lowest priority queue, overflow all the 
> way down to the lowest priority queue, and back off. In this case, the connection 
> between client and namenode remains open.
> We would like to add metrics for #1
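
One way to picture the requested metric is a second counter that is bumped only for scenario #1. The sketch below is an illustration under assumptions: the class, method, and counter names are hypothetical and are not the Hadoop IPC metrics API.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counters that separate scenario #1 from the overall backoff count. */
class BackoffMetrics {
  private final LongAdder totalBackoffs = new LongAdder();
  private final LongAdder lowestPriorityDisconnectBackoffs = new LongAdder();

  /**
   * @param enqueuedDirectlyIntoLowest true for scenario #1: the call went straight to
   *        the lowest priority queue, which was full, so the client disconnects
   */
  void recordBackoff(boolean enqueuedDirectlyIntoLowest) {
    totalBackoffs.increment();
    if (enqueuedDirectlyIntoLowest) {
      lowestPriorityDisconnectBackoffs.increment();
    }
  }

  long total() {
    return totalBackoffs.sum();
  }

  long lowestPriorityDisconnect() {
    return lowestPriorityDisconnectBackoffs.sum();
  }

  public static void main(String[] args) {
    BackoffMetrics m = new BackoffMetrics();
    m.recordBackoff(true);  // scenario #1
    m.recordBackoff(false); // scenario #2
    m.recordBackoff(false); // scenario #2
    System.out.println(m.total() + " total, " + m.lowestPriorityDisconnect() + " from scenario #1");
  }
}
```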




[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16323:
--
Component/s: datanode

> DatanodeHttpServer doesn't require handler state map while retrieving filter 
> handlers
> -
>
> Key: HDFS-16323
> URL: https://issues.apache.org/jira/browse/HDFS-16323
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DatanodeHttpServer#getFilterHandlers uses the handler state map just to query 
> whether the given datanode httpserver filter handler class exists in the map 
> and, if not, to initialize the Channel handler by invoking a specific 
> parameterized constructor of the class. However, this handler state map is 
> never used to upsert any data.




[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16323:
--
Hadoop Flags: Reviewed

> DatanodeHttpServer doesn't require handler state map while retrieving filter 
> handlers
> -
>
> Key: HDFS-16323
> URL: https://issues.apache.org/jira/browse/HDFS-16323
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DatanodeHttpServer#getFilterHandlers uses the handler state map just to query 
> whether the given datanode httpserver filter handler class exists in the map 
> and, if not, to initialize the Channel handler by invoking a specific 
> parameterized constructor of the class. However, this handler state map is 
> never used to upsert any data.




[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16323:
--
Affects Version/s: 3.3.2
   3.4.0

> DatanodeHttpServer doesn't require handler state map while retrieving filter 
> handlers
> -
>
> Key: HDFS-16323
> URL: https://issues.apache.org/jira/browse/HDFS-16323
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DatanodeHttpServer#getFilterHandlers uses the handler state map just to query 
> whether the given datanode httpserver filter handler class exists in the map 
> and, if not, to initialize the Channel handler by invoking a specific 
> parameterized constructor of the class. However, this handler state map is 
> never used to upsert any data.




[jira] [Updated] (HDFS-16255) RBF: Fix dead link to fedbalance document

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16255:
--
Hadoop Flags: Reviewed

> RBF: Fix dead link to fedbalance document
> -
>
> Key: HDFS-16255
> URL: https://issues.apache.org/jira/browse/HDFS-16255
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a dead link in HDFSRouterFederation.md 
> (https://github.com/apache/hadoop/blob/e90c41af34ada9d7b61e4d5a8b88c2f62c7fea25/hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md?plain=1#L517)
> {{../../../hadoop-federation-balance/HDFSFederationBalance.md}} should be 
> {{../../hadoop-federation-balance/HDFSFederationBalance.md}}.




[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16252:
--
Affects Version/s: 3.3.2
   3.4.0

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
> // In RetryPolicies.java, we can see it gets the timeout as the first in the pair
>     /**
>      * Parse the given string as a MultipleLinearRandomRetry object.
>      * The format of the string is "t_1, n_1, t_2, n_2, ...",
>      * where t_i and n_i are the i-th pair of sleep time and number of retries.
>      * Note that the white spaces in the string are ignored.
>      *
>      * @return the parsed object, or null if the parsing fails.
>      */
>     public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) {
>       final String[] elements = s.split(",");
>       if (elements.length == 0) {
>         LOG.warn("Illegal value: there is no element in \"" + s + "\".");
>         return null;
>       }
>       if (elements.length % 2 != 0) {
>         LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
>             + elements.length + " but an even number of elements is expected.");
>         return null;
>       }
>       final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>           = new ArrayList<RetryPolicies.MultipleLinearRandomRetry.Pair>();
>
>       for(int i = 0; i < elements.length; ) {
>         //parse the i-th sleep-time
>         final int sleep = parsePositiveInt(elements, i++, s);
>         if (sleep == -1) {
>           return null; //parse fails
>         }
>         //parse the i-th number-of-retries
>         final int retries = parsePositiveInt(elements, i++, s);
>         if (retries == -1) {
>           return null; //parse fails
>         }
>         pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep));
>       }
>       return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>     }
> {code}
> This change simply updates the docs.
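
The ordering described above can be checked with a stand-alone sketch that parses the default value "1,6,6,10". The class and method here are hypothetical and deliberately independent of Hadoop's RetryPolicies; they only mirror the "sleep time first, then number of retries" convention from the quoted Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone parser for a "t1,n1,t2,n2,..." retry spec. */
class RetrySpecDemo {
  /** Returns {sleepTime, retries} pairs, or null if the spec is malformed. */
  static List<int[]> parse(String spec) {
    String[] elements = spec.split(",");
    if (elements.length % 2 != 0) {
      return null; // an even number of elements is expected
    }
    List<int[]> pairs = new ArrayList<>();
    for (int i = 0; i < elements.length; i += 2) {
      try {
        int sleep = Integer.parseInt(elements[i].trim());       // t_i comes first
        int retries = Integer.parseInt(elements[i + 1].trim()); // n_i comes second
        if (sleep <= 0 || retries <= 0) {
          return null; // only positive values are accepted
        }
        pairs.add(new int[] {sleep, retries});
      } catch (NumberFormatException e) {
        return null;
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    for (int[] p : parse("1,6,6,10")) {
      System.out.println("sleep time " + p[0] + ", retries " + p[1]);
    }
  }
}
```

For the default value this yields the pairs (1, 6) and (6, 10), with the sleep time first in each pair.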




[jira] [Updated] (HDFS-16256) Minor fixes in HDFS Fedbalance document

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16256:
--
Hadoop Flags: Reviewed

> Minor fixes in HDFS Fedbalance document
> ---
>
> Key: HDFS-16256
> URL: https://issues.apache.org/jira/browse/HDFS-16256
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 1. "Command submit has 4 options:" is not true. Now it has actually 6 
> options. It should be updated to something like "Command submit has the 
> following options".
> 2. 
> {code}
> ### Configuration Options
> 
> {code}
> In the above code, the "" is not needed.




[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16252:
--
Component/s: documentation

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
> // In RetryPolicies.java, we can see it gets the timeout as the first in the pair
>     /**
>      * Parse the given string as a MultipleLinearRandomRetry object.
>      * The format of the string is "t_1, n_1, t_2, n_2, ...",
>      * where t_i and n_i are the i-th pair of sleep time and number of retries.
>      * Note that the white spaces in the string are ignored.
>      *
>      * @return the parsed object, or null if the parsing fails.
>      */
>     public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) {
>       final String[] elements = s.split(",");
>       if (elements.length == 0) {
>         LOG.warn("Illegal value: there is no element in \"" + s + "\".");
>         return null;
>       }
>       if (elements.length % 2 != 0) {
>         LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
>             + elements.length + " but an even number of elements is expected.");
>         return null;
>       }
>       final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>           = new ArrayList<RetryPolicies.MultipleLinearRandomRetry.Pair>();
>
>       for(int i = 0; i < elements.length; ) {
>         //parse the i-th sleep-time
>         final int sleep = parsePositiveInt(elements, i++, s);
>         if (sleep == -1) {
>           return null; //parse fails
>         }
>         //parse the i-th number-of-retries
>         final int retries = parsePositiveInt(elements, i++, s);
>         if (retries == -1) {
>           return null; //parse fails
>         }
>         pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep));
>       }
>       return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>     }
> {code}
> This change simply updates the docs.




[jira] [Updated] (HDFS-16255) RBF: Fix dead link to fedbalance document

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16255:
--
Affects Version/s: 3.4.0

> RBF: Fix dead link to fedbalance document
> -
>
> Key: HDFS-16255
> URL: https://issues.apache.org/jira/browse/HDFS-16255
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a dead link in HDFSRouterFederation.md 
> (https://github.com/apache/hadoop/blob/e90c41af34ada9d7b61e4d5a8b88c2f62c7fea25/hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md?plain=1#L517)
> {{../../../hadoop-federation-balance/HDFSFederationBalance.md}} should be 
> {{../../hadoop-federation-balance/HDFSFederationBalance.md}}.




[jira] [Updated] (HDFS-16256) Minor fixes in HDFS Fedbalance document

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16256:
--
Affects Version/s: 3.4.0

> Minor fixes in HDFS Fedbalance document
> ---
>
> Key: HDFS-16256
> URL: https://issues.apache.org/jira/browse/HDFS-16256
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 1. "Command submit has 4 options:" is not true. Now it has actually 6 
> options. It should be updated to something like "Command submit has the 
> following options".
> 2. 
> {code}
> ### Configuration Options
> 
> {code}
> In the above code, the "" is not needed.




[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16127:
--
Component/s: hdfs

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the 
> final ACK to be delivered. If an exception is received during this wait, the 
> close is retried. This assumption has become invalid by HDFS-15813, resulting 
> in permanent write failures in some close error cases involving slow nodes. 
> There are also less frequent cases of data loss.




[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16092:
--
Component/s: hdfs

> Avoid creating LayoutFlags redundant objects
> 
>
> Key: HDFS-16092
> URL: https://issues.apache.org/jira/browse/HDFS-16092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We use LayoutFlags to represent features that EditLog/FSImage can support. 
> The utility helps write int (0) to given OutputStream and if EditLog/FSImage 
> supports Layout flags, they read the value from InputStream to confirm 
> whether there are unsupported feature flags (non zero int). However, we also 
> create and return new object of LayoutFlags, which is not used anywhere 
> because it's just a utility to read/write to/from given stream. We should 
> remove such redundant objects from getting created while reading from 
> InputStream using LayoutFlags#read utility.
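
The idea of a read utility that only validates, without allocating a result object, can be sketched as follows. This is a hypothetical stand-in written for illustration; the class name, method signatures, and exception message are assumptions and are not taken from the Hadoop source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical stand-in for a flags utility whose read() returns nothing. */
class LayoutFlagsSketch {
  /** Writes the (currently empty) flag set as the int 0. */
  static void write(DataOutputStream out) throws IOException {
    out.writeInt(0);
  }

  /** Validates the flags in place; no object is created or returned. */
  static void read(DataInputStream in) throws IOException {
    int flags = in.readInt();
    if (flags != 0) {
      throw new IOException("Unsupported feature flags: " + flags);
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    write(new DataOutputStream(bytes));
    read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    System.out.println("flags accepted");
  }
}
```

Callers that only need the validation side effect lose nothing when the return type is void, which is the point of the proposed cleanup.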




[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16092:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.2.3, 3.4.0  (was: 3.4.0, 3.2.3, 3.3.2)

> Avoid creating LayoutFlags redundant objects
> 
>
> Key: HDFS-16092
> URL: https://issues.apache.org/jira/browse/HDFS-16092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We use LayoutFlags to represent features that EditLog/FSImage can support. 
> The utility helps write int (0) to given OutputStream and if EditLog/FSImage 
> supports Layout flags, they read the value from InputStream to confirm 
> whether there are unsupported feature flags (non zero int). However, we also 
> create and return new object of LayoutFlags, which is not used anywhere 
> because it's just a utility to read/write to/from given stream. We should 
> remove such redundant objects from getting created while reading from 
> InputStream using LayoutFlags#read utility.




[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16092:
--
Affects Version/s: 3.3.2
   3.4.0

> Avoid creating LayoutFlags redundant objects
> 
>
> Key: HDFS-16092
> URL: https://issues.apache.org/jira/browse/HDFS-16092
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We use LayoutFlags to represent features that EditLog/FSImage can support. 
> The utility helps write int (0) to given OutputStream and if EditLog/FSImage 
> supports Layout flags, they read the value from InputStream to confirm 
> whether there are unsupported feature flags (non zero int). However, we also 
> create and return new object of LayoutFlags, which is not used anywhere 
> because it's just a utility to read/write to/from given stream. We should 
> remove such redundant objects from getting created while reading from 
> InputStream using LayoutFlags#read utility.




[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16090:
--
Affects Version/s: 3.3.2
   3.4.0

> Fine grained locking for datanodeNetworkCounts
> --
>
> Key: HDFS-16090
> URL: https://issues.apache.org/jira/browse/HDFS-16090
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> While incrementing DataNode network error count, we lock entire LoadingCache 
> in order to increment network count of specific host. We should provide fine 
> grained concurrency for this update because locking entire cache is redundant 
> and could impact performance while incrementing network count for multiple 
> hosts.
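
Per-host updates without a lock over the whole structure can be sketched with JDK classes. Note the substitution: the issue concerns a LoadingCache, while this illustration swaps in ConcurrentHashMap plus LongAdder, and the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-host error counter; callers never lock the whole map. */
class NetworkErrorCounts {
  private final ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();

  void incrementNetworkErrors(String host) {
    // computeIfAbsent is atomic per key, and LongAdder.increment is thread-safe.
    counts.computeIfAbsent(host, h -> new LongAdder()).increment();
  }

  long get(String host) {
    LongAdder adder = counts.get(host);
    return adder == null ? 0 : adder.sum();
  }

  public static void main(String[] args) throws InterruptedException {
    NetworkErrorCounts c = new NetworkErrorCounts();
    Runnable work = () -> {
      for (int i = 0; i < 1000; i++) {
        c.incrementNetworkErrors("host-a");
        c.incrementNetworkErrors("host-b");
      }
    };
    Thread t1 = new Thread(work);
    Thread t2 = new Thread(work);
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println(c.get("host-a") + " " + c.get("host-b"));
  }
}
```

Threads incrementing counts for different hosts do not serialize on a single monitor here, and no increments are lost.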




[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16082:
--
Affects Version/s: 3.3.2
   3.4.0

> Avoid non-atomic operations on exceptionsSinceLastBalance and 
> failedTimesSinceLastSuccessfulBalance in Balancer
> ---
>
> Key: HDFS-16082
> URL: https://issues.apache.org/jira/browse/HDFS-16082
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Balancer has introduced 2 volatile int as part of HDFS-13783 namely: 
> exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. 
> However, we are performing non-atomic operations on them. Since non-atomic 
> operations done here mostly depend on their previous values, we should use 
> AtomicInteger for both.
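
The reasoning is that `volatile` guarantees visibility but an increment such as `x++` is still a separate read and write, so two threads can lose an update. A minimal sketch of the AtomicInteger alternative follows; the holder class and its methods are hypothetical, and only the two field names come from the issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for the two counters named in the issue. */
class BalancerCounters {
  private final AtomicInteger exceptionsSinceLastBalance = new AtomicInteger();
  private final AtomicInteger failedTimesSinceLastSuccessfulBalance = new AtomicInteger();

  /** Atomic read-modify-write; returns the updated value. */
  int recordException() {
    return exceptionsSinceLastBalance.incrementAndGet();
  }

  int recordFailure() {
    return failedTimesSinceLastSuccessfulBalance.incrementAndGet();
  }

  void resetAfterSuccess() {
    exceptionsSinceLastBalance.set(0);
    failedTimesSinceLastSuccessfulBalance.set(0);
  }

  int exceptions() {
    return exceptionsSinceLastBalance.get();
  }

  int failures() {
    return failedTimesSinceLastSuccessfulBalance.get();
  }

  public static void main(String[] args) throws InterruptedException {
    BalancerCounters counters = new BalancerCounters();
    Thread[] threads = new Thread[4];
    for (int t = 0; t < threads.length; t++) {
      threads[t] = new Thread(() -> {
        for (int i = 0; i < 10_000; i++) {
          counters.recordException();
        }
      });
      threads[t].start();
    }
    for (Thread thread : threads) {
      thread.join();
    }
    System.out.println(counters.exceptions()); // no increments are lost
  }
}
```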




[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16082:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Avoid non-atomic operations on exceptionsSinceLastBalance and 
> failedTimesSinceLastSuccessfulBalance in Balancer
> ---
>
> Key: HDFS-16082
> URL: https://issues.apache.org/jira/browse/HDFS-16082
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Balancer has introduced 2 volatile int as part of HDFS-13783 namely: 
> exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. 
> However, we are performing non-atomic operations on it. Since non-atomic 
> operations done here mostly depend on their previous values, we should use 
> AtomicInteger for both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16082:
--
Component/s: balancer

> Avoid non-atomic operations on exceptionsSinceLastBalance and 
> failedTimesSinceLastSuccessfulBalance in Balancer
> ---
>
> Key: HDFS-16082
> URL: https://issues.apache.org/jira/browse/HDFS-16082
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Balancer has introduced 2 volatile int as part of HDFS-13783 namely: 
> exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. 
> However, we are performing non-atomic operations on it. Since non-atomic 
> operations done here mostly depend on their previous values, we should use 
> AtomicInteger for both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16080:
--
Affects Version/s: 3.3.2
   3.4.0

> RBF: Invoking method in all locations should break the loop after successful 
> result
> ---
>
> Key: HDFS-16080
> URL: https://issues.apache.org/jira/browse/HDFS-16080
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> rename, delete and mkdir used by the Router client usually call multiple 
> locations if the path is present in multiple sub-clusters. After invoking 
> multiple concurrent proxy calls to multiple clients, we iterate through all 
> results and mark anyResult true if at least one of them was successful. We 
> should break the loop as soon as one of the proxy call results is successful, 
> rather than iterating over the remaining calls.
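
The proposed early exit can be sketched as below. Only the name anyResult comes from the description; the class, the method, and the use of a list of Boolean results are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of stopping at the first successful result. */
class AnyResultDemo {
  /** Returns true as soon as one result is successful; later results are not inspected. */
  static boolean anySuccessful(List<Boolean> results) {
    boolean anyResult = false;
    for (Boolean result : results) {
      if (result != null && result) {
        anyResult = true;
        break; // one success is enough, skip the remaining results
      }
    }
    return anyResult;
  }

  public static void main(String[] args) {
    System.out.println(anySuccessful(Arrays.asList(false, true, false)));
    System.out.println(anySuccessful(Arrays.asList(false, false)));
  }
}
```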




[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16090:
--
Component/s: datanode

> Fine grained locking for datanodeNetworkCounts
> --
>
> Key: HDFS-16090
> URL: https://issues.apache.org/jira/browse/HDFS-16090
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> While incrementing DataNode network error count, we lock entire LoadingCache 
> in order to increment network count of specific host. We should provide fine 
> grained concurrency for this update because locking entire cache is redundant 
> and could impact performance while incrementing network count for multiple 
> hosts.




[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15796:
--
Hadoop Flags: Reviewed

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
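
The top frames of the trace (ArrayList$Itr.next calling checkForComodification) show an ArrayList being structurally modified while an iterator over it is in use. The sketch below reproduces that mechanism in a single thread and shows one common mitigation, iterating over a snapshot. It is an illustration only: the class and method names are hypothetical, the report does not say which code path modified the list, and this is not claimed to be the fix applied in HDFS-15796.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

/** Hypothetical demo of the failure mode behind the stack trace. */
class CmeDemo {
  /** Structurally modifies the list while iterating over it. */
  static boolean iterateWhileModifying() {
    List<String> blocks = new ArrayList<>(Arrays.asList("blk_1", "blk_2", "blk_3"));
    try {
      for (String b : blocks) {
        blocks.add(b + "_copy"); // structural change during iteration
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true; // raised by the iterator's next() on the following step
    }
  }

  /** Iterates over a copy, so changes to the original list are safe. */
  static int iterateOverSnapshot() {
    List<String> blocks = new ArrayList<>(Arrays.asList("blk_1", "blk_2", "blk_3"));
    for (String b : new ArrayList<>(blocks)) {
      blocks.add(b + "_copy");
    }
    return blocks.size();
  }

  public static void main(String[] args) {
    System.out.println("exception thrown: " + iterateWhileModifying());
    System.out.println("size after snapshot iteration: " + iterateOverSnapshot());
  }
}
```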




[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15798:
--
Hadoop Flags: Reviewed

> EC: Reconstruct task failed, and It would be XmitsInProgress of DN has 
> negative number
> --
>
> Key: HDFS-15798
> URL: https://issues.apache.org/jira/browse/HDFS-15798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, 
> HDFS-15798.003.patch
>
>
> When an EC reconstruction task fails, the decrementXmitsInProgress call in 
> processErasureCodingTasks is applied with an abnormal value.
>  As a result, XmitsInProgress on the DN can become negative, which affects how 
> the NN chooses pending tasks based on the ratio between the lengths of the 
> replication and erasure-coded block queues.
> {code:java}
> // 1.ErasureCodingWorker.java
> public void processErasureCodingTasks(
>     Collection<BlockECReconstructionInfo> ecTasks) {
>   for (BlockECReconstructionInfo reconInfo : ecTasks) {
>     int xmitsSubmitted = 0;
>     try {
>       ...
>       // It may throw IllegalArgumentException from task#stripedReader
>       // constructor.
>       final StripedBlockReconstructor task =
>           new StripedBlockReconstructor(this, stripedReconInfo);
>       if (task.hasValidTargets()) {
>         // See HDFS-12044. We increase xmitsInProgress even the task is only
>         // enqueued, so that
>         //   1) NN will not send more tasks than what DN can execute and
>         //   2) DN will not throw away reconstruction tasks, and instead keeps
>         //      an unbounded number of tasks in the executor's task queue.
>         xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1);
>         getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start increment
>         stripedReconstructionPool.submit(task);
>       } else {
>         LOG.warn("No missing internal block. Skip reconstruction for task:{}",
>             reconInfo);
>       }
>     } catch (Throwable e) {
>       // task failed decrement, XmitsInProgress is decremented by the previous value
>       getDatanode().decrementXmitsInProgress(xmitsSubmitted);
>       LOG.warn("Failed to reconstruct striped block {}",
>           reconInfo.getExtendedBlock().getLocalBlock(), e);
>     }
>   }
> }
> // 2.StripedBlockReconstructor.java
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     ...
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     getDatanode().getMetrics().incrECFailedReconstructionTasks();
>   } finally {
>     float xmitWeight = getErasureCodingWorker().getXmitWeight();
>     // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1
>     // because if it set to zero, we cannot to measure the xmits submitted
>     int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1);
>     getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete decrement
>     ...
>   }
> }
> {code}
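The quoted code increments the counter in one place and decrements it in two (the catch block of the submitter and the finally block of the task). A counter of this kind stays non-negative only if every increment is matched by exactly one decrement of the same amount. The sketch below illustrates that invariant with a plain AtomicInteger. It is not the HDFS-15798 patch; the class, the counter field and the runTask helper are hypothetical stand-ins, and only the Math.max(..., 1) weighting is taken from the quoted code.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class XmitsAccountingSketch {
  // Hypothetical stand-in for the DataNode's xmitsInProgress counter.
  public static final AtomicInteger xmitsInProgress = new AtomicInteger();

  // Weighted xmits with a floor of 1, as in the quoted code.
  public static int weightedXmits(int xmits, float xmitWeight) {
    return Math.max((int) (xmits * xmitWeight), 1);
  }

  // Raises the counter before the task runs and lowers it by the same
  // amount exactly once, whether the task succeeds or throws.
  public static void runTask(Runnable task, int xmits, float xmitWeight) {
    int submitted = weightedXmits(xmits, xmitWeight);
    xmitsInProgress.addAndGet(submitted);
    try {
      task.run();
    } catch (RuntimeException e) {
      // Failure is only recorded here; no second decrement.
      System.err.println("task failed: " + e.getMessage());
    } finally {
      xmitsInProgress.addAndGet(-submitted);
    }
  }

  public static void main(String[] args) {
    runTask(() -> { }, 0, 0.5f);
    runTask(() -> { throw new IllegalStateException("boom"); }, 6, 0.5f);
    System.out.println(xmitsInProgress.get()); // prints 0
  }
}
```

Keeping the increment and its single matching decrement in one method makes the pairing easy to audit; when they live in different classes, as in the quoted code, each failure path has to be checked for a missing or doubled decrement.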




[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15798:
--
Component/s: erasure-coding

> EC: Reconstruct task failed, and It would be XmitsInProgress of DN has 
> negative number
> --
>
> Key: HDFS-15798
> URL: https://issues.apache.org/jira/browse/HDFS-15798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, 
> HDFS-15798.003.patch
>
>

[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15798:
--
Affects Version/s: 3.3.1
   3.4.0

> EC: Reconstruct task failed, and It would be XmitsInProgress of DN has 
> negative number
> --
>
> Key: HDFS-15798
> URL: https://issues.apache.org/jira/browse/HDFS-15798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, 
> HDFS-15798.003.patch
>
>

[jira] [Updated] (HDFS-16473) Make HDFS stat tool cross platform

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16473:
--
Hadoop Flags: Reviewed

> Make HDFS stat tool cross platform
> --
>
> Key: HDFS-16473
> URL: https://issues.apache.org/jira/browse/HDFS-16473
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs++, tools
>Affects Versions: 3.4.0
> Environment: Centos 7, Centos 8, Debian 10, Ubuntu Focal
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: libhdfscpp, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The source files for *hdfs_stat* use *getopt* to parse the command line 
> arguments. getopt is available only on Linux and thus isn't cross-platform. 
> We need to replace getopt with *boost::program_options* to make this tool 
> cross-platform.




[jira] [Updated] (HDFS-16474) Make HDFS tail tool cross platform

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16474:
--
Hadoop Flags: Reviewed

> Make HDFS tail tool cross platform
> --
>
> Key: HDFS-16474
> URL: https://issues.apache.org/jira/browse/HDFS-16474
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs++, tools
>Affects Versions: 3.4.0
> Environment: Centos 7, Centos 8, Debian 10, Ubuntu Focal
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: libhdfscpp, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The source files for *hdfs_tail* use *getopt* to parse the command line 
> arguments. getopt is available only on Linux and thus isn't cross-platform. 
> We need to replace getopt with *boost::program_options* to make this tool 
> cross-platform.




[jira] [Updated] (HDFS-16227) testMoverWithStripedFile fails intermittently

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16227:
--
Component/s: test

> testMoverWithStripedFile fails intermittently
> -
>
> Key: HDFS-16227
> URL: https://issues.apache.org/jira/browse/HDFS-16227
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> TestMover#testMoverWithStripedFile fails intermittently with stacktrace:
> {code:java}
> [ERROR] testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover)  Time elapsed: 48.439 s  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:120)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at org.apache.hadoop.hdfs.server.mover.TestMover.testMoverWithStripedFile(TestMover.java:965)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> For example: 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3386/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt




[jira] [Updated] (HDFS-16218) RBF: Use HdfsConfiguration for passing in Router principal

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16218:
--
Affects Version/s: 3.4.0

> RBF: Use HdfsConfiguration for passing in Router principal
> --
>
> Key: HDFS-16218
> URL: https://issues.apache.org/jira/browse/HDFS-16218
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
> Environment: Hadoop 3.3.0 + patches, Kerberos authentication is 
> enabled
>Reporter: Akira Ajisaka
>Assignee: Fengnan Li
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> RouterFedBalance fails to connect to DFSRouter when Kerberos is enabled 
> because "dfs.federation.router.kerberos.principal" in hdfs-site.xml is not 
> loaded.
> {quote}
> 21/09/08 17:21:38 ERROR rbfbalance.RouterFedBalance: Submit balance job failed.
> java.io.IOException: DestHost:destPort 0.0.0.0:8111 , LocalHost:localPort /:0. Failed on local exception: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name
>   at org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getMountTableEntries(RouterAdminProtocolTranslatorPB.java:198)
>   at org.apache.hadoop.hdfs.rbfbalance.MountTableProcedure.getMountEntry(MountTableProcedure.java:140)
>   at org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.getSrcPath(RouterFedBalance.java:326)
>   at org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.access$000(RouterFedBalance.java:68)
>   at org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance$Builder.build(RouterFedBalance.java:168)
>   at org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.submit(RouterFedBalance.java:302)
>   at org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.run(RouterFedBalance.java:216)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.main(RouterFedBalance.java:376)
> {quote}
> When the property is passed explicitly with the "-D" option, the command works.




[jira] [Updated] (HDFS-16219) RBF: Set default map tasks and bandwidth in RouterFederationRename

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16219:
--
Affects Version/s: 3.4.0

> RBF: Set default map tasks and bandwidth in RouterFederationRename
> --
>
> Key: HDFS-16219
> URL: https://issues.apache.org/jira/browse/HDFS-16219
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
> Environment: Hadoop 3.3.0 with patches
>Reporter: Akira Ajisaka
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If dfs.federation.router.federation.rename.map or 
> dfs.federation.router.federation.rename.bandwidth is not set, DFSRouter fails 
> to launch.
> This issue is similar to HDFS-16217.




[jira] [Updated] (HDFS-16224) testBalancerWithObserverWithFailedNode times out

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16224:
--
Affects Version/s: 3.4.0

> testBalancerWithObserverWithFailedNode times out
> 
>
> Key: HDFS-16224
> URL: https://issues.apache.org/jira/browse/HDFS-16224
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> testBalancerWithObserverWithFailedNode fails intermittently.
>  
> This seems to happen because the datanodes cannot shut down until they have 
> finished their retries against the failed observer.
>  
> Jenkins report:
>  
> [ERROR] testBalancerWithObserverWithFailedNode(org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes) Time elapsed: 180.144 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 180000 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1252)
>   at java.lang.Thread.join(Thread.java:1326)
>   at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.join(BPServiceActor.java:632)
>   at org.apache.hadoop.hdfs.server.datanode.BPOfferService.join(BPOfferService.java:360)
>   at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.shutDownAll(BlockPoolManager.java:119)
>   at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2169)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2166)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2156)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2135)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2109)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2102)
>   at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.shutdown(MiniQJMHACluster.java:189)
>   at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithObserver(TestBalancerWithHANameNodes.java:240)
>   at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithObserverWithFailedNode(TestBalancerWithHANameNodes.java:197)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
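
The trace shows the test thread parked in Thread.join while a worker thread is still busy, which is how an unbounded join turns a slow shutdown into a test timeout. As a generic illustration only (this is not the change made for HDFS-16224, and the class and method names are hypothetical), a shutdown path can interrupt the worker and bound the time it waits:

```java
public class BoundedJoinSketch {
  // Interrupts the worker, waits at most timeoutMs for it to exit,
  // and reports whether it actually stopped.
  public static boolean stopAndJoin(Thread worker, long timeoutMs) {
    worker.interrupt();
    try {
      worker.join(timeoutMs);
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status and fall through.
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    // Stand-in for a thread that keeps retrying against a failed peer.
    Thread retrying = new Thread(() -> {
      try {
        while (true) {
          Thread.sleep(50);
        }
      } catch (InterruptedException e) {
        // Exit the retry loop when asked to stop.
      }
    });
    retrying.start();
    System.out.println(stopAndJoin(retrying, 2000));
  }
}
```

This only helps when the worker responds to interruption; a worker blocked in a non-interruptible call would still be alive after the timeout, which the boolean result makes visible to the caller.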




[jira] [Updated] (HDFS-16217) RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by adding appropriate config resources

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16217:
--
Affects Version/s: 3.4.0

> RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by 
> adding appropriate config resources
> 
>
> Key: HDFS-16217
> URL: https://issues.apache.org/jira/browse/HDFS-16217
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
> Environment: Hadoop 3.3.0 with patches
>Reporter: Akira Ajisaka
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When dfs.federation.router.federation.rename.option is set to DISTCP and 
> hdfs.fedbalance.procedure.scheduler.journal.uri is not set, DFSRouter fails 
> to launch.
> {quote}
> 2021-09-08 15:39:11,818 ERROR org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start router
> java.lang.NullPointerException
> at java.base/java.net.URI$Parser.parse(URI.java:3104)
> at java.base/java.net.URI.<init>(URI.java:600)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.initRouterFedRename(RouterRpcServer.java:444)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.<init>(RouterRpcServer.java:419)
> at org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391)
> at org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> {quote}
> hdfs.fedbalance.procedure.scheduler.journal.uri defaults to 
> hdfs://localhost:8020/tmp/procedure; however, this default value is not 
> used in DFSRouter.
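
The NullPointerException in the trace comes from building a java.net.URI out of a null string, which is what a configuration lookup returns when the key is unset and no default is supplied. The sketch below reproduces that with java.util.Properties as a stand-in for the Hadoop configuration object. It is an assumption-laden illustration, not the HDFS-16217 fix (which, per the issue title, adds the appropriate config resources); the class and helper names are hypothetical, and only the key and default value are taken from the issue text.

```java
import java.net.URI;
import java.util.Properties;

public class JournalUriDefaultSketch {
  public static final String KEY =
      "hdfs.fedbalance.procedure.scheduler.journal.uri";
  public static final String DEFAULT_URI =
      "hdfs://localhost:8020/tmp/procedure";

  // Looks the key up with an explicit default, so an unset key never
  // reaches the URI parser as null.
  public static URI journalUri(Properties conf) {
    return URI.create(conf.getProperty(KEY, DEFAULT_URI));
  }

  // Mirrors the failing path: an unset key yields null, and parsing
  // null throws NullPointerException.
  public static boolean failsWithoutDefault(Properties conf) {
    try {
      URI.create(conf.getProperty(KEY));
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(failsWithoutDefault(conf)); // prints true
    System.out.println(journalUri(conf));          // prints hdfs://localhost:8020/tmp/procedure
  }
}
```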




[jira] [Updated] (HDFS-16219) RBF: Set default map tasks and bandwidth in RouterFederationRename

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16219:
--
Hadoop Flags: Reviewed

> RBF: Set default map tasks and bandwidth in RouterFederationRename
> --
>
> Key: HDFS-16219
> URL: https://issues.apache.org/jira/browse/HDFS-16219
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
> Environment: Hadoop 3.3.0 with patches
>Reporter: Akira Ajisaka
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If dfs.federation.router.federation.rename.map or 
> dfs.federation.router.federation.rename.bandwidth is not set, DFSRouter fails 
> to launch.
> This issue is similar to HDFS-16217.




[jira] [Updated] (HDFS-16213) Flaky test TestFsDatasetImpl#testDnRestartWithHardLink

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16213:
--
Component/s: test

> Flaky test TestFsDatasetImpl#testDnRestartWithHardLink
> --
>
> Key: HDFS-16213
> URL: https://issues.apache.org/jira/browse/HDFS-16213
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Failure case: 
> [here|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3359/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
> {code:java}
> [ERROR] testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)  Time elapsed: 7.768 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}




[jira] [Updated] (HDFS-16218) RBF: Use HdfsConfiguration for passing in Router principal

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16218:
--
Hadoop Flags: Reviewed

> RBF: Use HdfsConfiguration for passing in Router principal
> --
>
> Key: HDFS-16218
> URL: https://issues.apache.org/jira/browse/HDFS-16218
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
> Environment: Hadoop 3.3.0 + patches, Kerberos authentication is 
> enabled
>Reporter: Akira Ajisaka
>Assignee: Fengnan Li
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>

[jira] [Updated] (HDFS-16213) Flaky test TestFsDatasetImpl#testDnRestartWithHardLink

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16213:
--
Affects Version/s: 3.4.0

> Flaky test TestFsDatasetImpl#testDnRestartWithHardLink
> --
>
> Key: HDFS-16213
> URL: https://issues.apache.org/jira/browse/HDFS-16213
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Failure case: 
> [here|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3359/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
> {code:java}
> [ERROR] testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)  Time elapsed: 7.768 s  <<< FAILURE!
> [ERROR] testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)  Time elapsed: 7.768 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}




[jira] [Updated] (HDFS-16217) RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by adding appropriate config resources

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16217:
--
Hadoop Flags: Reviewed

> RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by 
> adding appropriate config resources
> 
>
> Key: HDFS-16217
> URL: https://issues.apache.org/jira/browse/HDFS-16217
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
> Environment: Hadoop 3.3.0 with patches
>Reporter: Akira Ajisaka
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When dfs.federation.router.federation.rename.option is set to DISTCP and 
> hdfs.fedbalance.procedure.scheduler.journal.uri is not set, DFSRouter fails 
> to launch.
> {quote}
> 2021-09-08 15:39:11,818 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start 
> router
> java.lang.NullPointerException
> at java.base/java.net.URI$Parser.parse(URI.java:3104)
> at java.base/java.net.URI.<init>(URI.java:600)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.initRouterFedRename(RouterRpcServer.java:444)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.<init>(RouterRpcServer.java:419)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> {quote}
> hdfs.fedbalance.procedure.scheduler.journal.uri defaults to 
> hdfs://localhost:8020/tmp/procedure, but DFSRouter does not use the default 
> value.
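
The failure mode described above can be reproduced without Hadoop. The following is a minimal sketch, not the actual patch: it uses a plain `Map` as a stand-in for the router configuration, and the class and method names (`JournalUriDefault`, `parseUnchecked`, `parseWithDefault`) are hypothetical. It shows that parsing an unset (null) value throws a NullPointerException, as in the stack trace, and that falling back to the documented default avoids it.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class JournalUriDefault {
  // Property name and default value as quoted in the issue description.
  static final String KEY = "hdfs.fedbalance.procedure.scheduler.journal.uri";
  static final String DEFAULT_URI = "hdfs://localhost:8020/tmp/procedure";

  // Mirrors the failing path: an unset property yields null, and parsing null throws.
  static URI parseUnchecked(Map<String, String> conf) {
    return URI.create(conf.get(KEY));
  }

  // Falls back to the default value when the property is not set.
  static URI parseWithDefault(Map<String, String> conf) {
    return URI.create(conf.getOrDefault(KEY, DEFAULT_URI));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>(); // property intentionally not set
    try {
      parseUnchecked(conf);
    } catch (NullPointerException e) {
      System.out.println("unset property: NullPointerException");
    }
    System.out.println("with default: " + parseWithDefault(conf));
  }
}
```

According to the issue title, the real fix loads the appropriate config resources so that the default is picked up; the explicit fallback here only illustrates the effect.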




[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15657:
--
Affects Version/s: 3.3.1
   3.4.0

> RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
> -
>
> Key: HDFS-15657
> URL: https://issues.apache.org/jira/browse/HDFS-15657
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf, test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
> {noformat}
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 
> s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter
> [ERROR] 
> testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter)
>   Time elapsed: 1.04 s  <<< ERROR!
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: 
> Problem binding to [0.0.0.0:] java.net.BindException: Address already in 
> use; For more details see:  http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:174)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method)
>   at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
>

[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15657:
--
Hadoop Flags: Reviewed

> RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
> -
>
> Key: HDFS-15657
> URL: https://issues.apache.org/jira/browse/HDFS-15657
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf, test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
> {noformat}
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 
> s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter
> [ERROR] 
> testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter)
>   Time elapsed: 1.04 s  <<< ERROR!
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: 
> Problem binding to [0.0.0.0:] java.net.BindException: Address already in 
> use; For more details see:  http://wiki.apache.org/hadoop/BindException
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:174)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method)
>   at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
>

[jira] [Updated] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16637:
--
Affects Version/s: 3.3.5
   3.4.0

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The failure seems to have been caused by an output change introduced by 
> HDFS-16581.
> {code:java}
> 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(146)) - Detailed results:
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(147)) - 
> --2022-06-19 15:41:16,184 [Listener at 
> localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(156)) - 
> ---
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(157)) -                     Test ID: [629]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(158)) -            Test Description: 
> [printTopology: verifying that the topology map is what we expect]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(159)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(163)) -               Test Commands: [-fs 
> hdfs://localhost:51486 -printTopology]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(167)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(174)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(178)) -                  Comparator: 
> [RegexpAcrossOutputComparator]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(180)) -          Comparision result:   
> [fail]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(182)) -             Expected output:   
> [^Rack: 
> \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)]
> 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(184)) -               Actual output:   
> [Rack: /rack1
>    127.0.0.1:51487 (localhost) In Service
>    127.0.0.1:51491 (localhost) In ServiceRack: /rack2
>    127.0.0.1:51500 (localhost) In Service
>    127.0.0.1:51496 (localhost) In Service
>    127.0.0.1:51504 (localhost) In ServiceRack: /rack3
>    127.0.0.1:51508 (localhost) In ServiceRack: /rack4
>    127.0.0.1:51512 (localhost) In Service
>    127.0.0.1:51516 (localhost) In Service]
>  {code}
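
The mismatch in the log above can be checked with `java.util.regex` alone. This is a minimal sketch under two assumptions: that the comparator applies the expected regex with `find()` over the whole output, and that a pattern tolerating the trailing "In Service" text (the `RELAXED` constant, which is hypothetical and not taken from the actual fix) is an acceptable expectation. The class name `PrintTopologyRegex` is also illustrative.

```java
import java.util.regex.Pattern;

public class PrintTopologyRegex {
  // Expected-output regex copied from the log above.
  static final String EXPECTED =
      "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)"
      + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)";

  // Hypothetical relaxed regex that accepts the "In Service" state after each node.
  static final String RELAXED =
      "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)\\sIn Service"
      + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)\\sIn Service";

  // The /rack1 portion of the actual output shown in the log.
  static final String ACTUAL =
      "Rack: /rack1\n"
      + "   127.0.0.1:51487 (localhost) In Service\n"
      + "   127.0.0.1:51491 (localhost) In Service";

  // Returns true if the regex matches anywhere in the text.
  static boolean found(String regex, String text) {
    return Pattern.compile(regex).matcher(text).find();
  }

  public static void main(String[] args) {
    System.out.println("expected regex matches: " + found(EXPECTED, ACTUAL));
    System.out.println("relaxed regex matches:  " + found(RELAXED, ACTUAL));
  }
}
```

The original pattern fails because it allows only whitespace between the first `(localhost)` and the second address, while the new output places "In Service" there.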




[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16634:
--
Component/s: metrics

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: metrics
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> On a busy cluster, it can take some time for the "slow node report" of a node 
> deleted from the cluster to be removed from the slow peer JSON report in the 
> Namenode JMX metrics. In the meantime, users should be able to browse more 
> entries in the report by reconfiguring "dfs.datanode.max.nodes.to.report", so 
> that the list size can be adjusted without bouncing the active Namenode just 
> for this purpose.
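
The idea of resizing the report at runtime can be sketched as follows. This is an illustration only, not the Namenode's implementation: the class `SlowPeerReportSketch` and its methods are hypothetical, and the only detail taken from the issue is that the limit corresponds to "dfs.datanode.max.nodes.to.report" and should be changeable without a restart.

```java
import java.util.ArrayList;
import java.util.List;

public class SlowPeerReportSketch {
  // volatile so that a value set by an admin thread is visible to reporting threads.
  private volatile int maxNodesToReport;
  private final List<String> slowNodes = new ArrayList<>();

  SlowPeerReportSketch(int initialMax) {
    this.maxNodesToReport = initialMax;
  }

  synchronized void addSlowNode(String node) {
    slowNodes.add(node);
  }

  // Applies a new limit at runtime; stands in for a reconfiguration request.
  void reconfigureMaxNodesToReport(String newValue) {
    int parsed = Integer.parseInt(newValue);
    if (parsed < 0) {
      throw new IllegalArgumentException("limit must be non-negative: " + newValue);
    }
    maxNodesToReport = parsed;
  }

  // Returns at most maxNodesToReport entries, using the limit current at call time.
  synchronized List<String> report() {
    int n = Math.min(maxNodesToReport, slowNodes.size());
    return new ArrayList<>(slowNodes.subList(0, n));
  }

  public static void main(String[] args) {
    SlowPeerReportSketch sketch = new SlowPeerReportSketch(2);
    for (String node : new String[] {"dn1", "dn2", "dn3", "dn4"}) {
      sketch.addSlowNode(node);
    }
    System.out.println("before: " + sketch.report());
    sketch.reconfigureMaxNodesToReport("4");
    System.out.println("after:  " + sketch.report());
  }
}
```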




[jira] [Updated] (HDFS-16635) Fix javadoc error in Java 11

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16635:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.4.0  (was: 3.4.0, 3.3.5)

> Fix javadoc error in Java 11
> 
>
> Key: HDFS-16635
> URL: https://issues.apache.org/jira/browse/HDFS-16635
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, documentation
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Javadoc build fails on Java 11.
> {noformat}
> [ERROR] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-4410/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/package-info.java:20:
>  error: reference not found
> [ERROR]  * This package provides a mechanism for tracking {@link NameNode} 
> startup
> {noformat}
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt




[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16634:
--
Affects Version/s: 3.4.0

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> On a busy cluster, it can take some time for the "slow node report" of a node 
> deleted from the cluster to be removed from the slow peer JSON report in the 
> Namenode JMX metrics. In the meantime, users should be able to browse more 
> entries in the report by reconfiguring "dfs.datanode.max.nodes.to.report", so 
> that the list size can be adjusted without bouncing the active Namenode just 
> for this purpose.




[jira] [Updated] (HDFS-16635) Fix javadoc error in Java 11

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16635:
--
Affects Version/s: 3.4.0

> Fix javadoc error in Java 11
> 
>
> Key: HDFS-16635
> URL: https://issues.apache.org/jira/browse/HDFS-16635
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, documentation
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Javadoc build fails on Java 11.
> {noformat}
> [ERROR] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-4410/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/package-info.java:20:
>  error: reference not found
> [ERROR]  * This package provides a mechanism for tracking {@link NameNode} 
> startup
> {noformat}
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt




[jira] [Updated] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16637:
--
Component/s: test

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The failure seems to have been caused by an output change introduced by 
> HDFS-16581.
> {code:java}
> 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(146)) - Detailed results:
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(147)) - 
> --2022-06-19 15:41:16,184 [Listener at 
> localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(156)) - 
> ---
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(157)) -                     Test ID: [629]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(158)) -            Test Description: 
> [printTopology: verifying that the topology map is what we expect]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(159)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(163)) -               Test Commands: [-fs 
> hdfs://localhost:51486 -printTopology]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(167)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(174)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(178)) -                  Comparator: 
> [RegexpAcrossOutputComparator]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(180)) -          Comparision result:   
> [fail]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(182)) -             Expected output:   
> [^Rack: 
> \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)]
> 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(184)) -               Actual output:   
> [Rack: /rack1
>    127.0.0.1:51487 (localhost) In Service
>    127.0.0.1:51491 (localhost) In ServiceRack: /rack2
>    127.0.0.1:51500 (localhost) In Service
>    127.0.0.1:51496 (localhost) In Service
>    127.0.0.1:51504 (localhost) In ServiceRack: /rack3
>    127.0.0.1:51508 (localhost) In ServiceRack: /rack4
>    127.0.0.1:51512 (localhost) In Service
>    127.0.0.1:51516 (localhost) In Service]
>  {code}




[jira] [Updated] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16652:
--
Affects Version/s: 3.4.0

> Upgrade jquery datatable version references to v1.10.19
> ---
>
> Key: HDFS-16652
> URL: https://issues.apache.org/jira/browse/HDFS-16652
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-16652.001.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Upgrade jquery datatable version references in hdfs webapp to v1.10.19




[jira] [Updated] (HDFS-16618) sync_file_range error should include more volume and file info

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16618:
--
Affects Version/s: 3.3.5
   3.4.0

> sync_file_range error should include more volume and file info
> --
>
> Key: HDFS-16618
> URL: https://issues.apache.org/jira/browse/HDFS-16618
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We have recently seen multiple sync_file_range errors with "Bad file 
> descriptor". It would be good to include more volume stats as well as file 
> offset/length info in the error log to get more insight.




[jira] [Updated] (HDFS-16618) sync_file_range error should include more volume and file info

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16618:
--
Component/s: datanode

> sync_file_range error should include more volume and file info
> --
>
> Key: HDFS-16618
> URL: https://issues.apache.org/jira/browse/HDFS-16618
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We have recently seen multiple sync_file_range errors with "Bad file 
> descriptor". It would be good to include more volume stats as well as file 
> offset/length info in the error log to get more insight.




[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16634:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.4.0  (was: 3.4.0, 3.3.5)

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: metrics
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> On a busy cluster, it can take some time for the "slow node report" of a node 
> deleted from the cluster to be removed from the slow peer JSON report in the 
> Namenode JMX metrics. In the meantime, users should be able to browse more 
> entries in the report by reconfiguring "dfs.datanode.max.nodes.to.report", so 
> that the list size can be adjusted without bouncing the active Namenode just 
> for this purpose.




[jira] [Updated] (HDFS-16358) HttpFS implementation for getSnapshotDiffReportListing

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16358:
--
Affects Version/s: 3.4.0

> HttpFS implementation for getSnapshotDiffReportListing
> --
>
> Key: HDFS-16358
> URL: https://issues.apache.org/jira/browse/HDFS-16358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HttpFS should support the getSnapshotDiffReportListing API for improved 
> snapshot diff. The WebHDFS implementation is available in HDFS-16091.




[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16350:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.2.3, 3.4.0  (was: 3.4.0, 3.2.3, 3.3.5)

> Datanode start time should be set after RPC server starts successfully
> --
>
> Key: HDFS-16350
> URL: https://issues.apache.org/jira/browse/HDFS-16350
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We set the start time of the Datanode when the class is instantiated, but 
> ideally it should be set only after the RPC server starts and the RPC 
> handlers are initialized to serve client requests.
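
The ordering described above can be sketched as follows. This is a minimal illustration, not the actual HDFS-16350 patch: FakeRpcServer, Node and getStartTimeMs are hypothetical names invented here, and the real DataNode and RPC server classes are not used. The start time stays at 0 until the RPC server has started, so a non-zero value means the node was able to serve requests from that moment.

```java
class StartTimeSketch {
    /** Hypothetical stand-in for an RPC server; not Hadoop's RPC.Server. */
    static class FakeRpcServer {
        private boolean running;

        void start() {
            running = true;
        }

        boolean isRunning() {
            return running;
        }
    }

    /** Hypothetical node that records its start time only after RPC is up. */
    static class Node {
        private final FakeRpcServer rpc = new FakeRpcServer();
        // 0 means "not serving yet"; deliberately NOT set in the constructor.
        private long startTimeMs = 0L;

        void start() {
            rpc.start();
            // Record the start time only once the RPC server is running.
            if (rpc.isRunning()) {
                startTimeMs = System.currentTimeMillis();
            }
        }

        long getStartTimeMs() {
            return startTimeMs;
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        System.out.println("before start: " + node.getStartTimeMs());
        node.start();
        System.out.println("after start:  " + node.getStartTimeMs());
    }
}
```

With this ordering, any uptime derived from the start time excludes the period in which the object existed but could not yet answer client requests.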




[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16350:
--
Component/s: datanode

> Datanode start time should be set after RPC server starts successfully
> --
>
> Key: HDFS-16350
> URL: https://issues.apache.org/jira/browse/HDFS-16350
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We set the start time of the Datanode when the class is instantiated, but 
> ideally it should be set only after the RPC server starts and the RPC 
> handlers are initialized to serve client requests.




[jira] [Updated] (HDFS-16358) HttpFS implementation for getSnapshotDiffReportListing

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16358:
--
Component/s: httpfs

> HttpFS implementation for getSnapshotDiffReportListing
> --
>
> Key: HDFS-16358
> URL: https://issues.apache.org/jira/browse/HDFS-16358
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HttpFS should support the getSnapshotDiffReportListing API for improved 
> snapshot diff. The WebHDFS implementation is available in HDFS-16091.




[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16350:
--
Affects Version/s: 3.3.2
   3.4.0

> Datanode start time should be set after RPC server starts successfully
> --
>
> Key: HDFS-16350
> URL: https://issues.apache.org/jira/browse/HDFS-16350
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We set the start time of the Datanode when the class is instantiated, but 
> ideally it should be set only after the RPC server starts and the RPC 
> handlers are initialized to serve client requests.




[jira] [Updated] (HDFS-16336) De-flake TestRollingUpgrade#testRollback

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16336:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.5, 3.2.3, 3.4.0  (was: 3.4.0, 3.2.3, 3.3.5)

> De-flake TestRollingUpgrade#testRollback
> 
>
> Key: HDFS-16336
> URL: https://issues.apache.org/jira/browse/HDFS-16336
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, test
>Affects Versions: 3.4.0
>Reporter: Kevin Wikant
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This pull request: [https://github.com/apache/hadoop/pull/3675]
> failed the Jenkins pre-commit job due to an unrelated unit test failure: 
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3675/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
> {code:java}
> [ERROR] Failures: 
> [ERROR] 
> org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade)
> [ERROR]   Run 1: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 
> expected null, but 
> was:  createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})>
> [ERROR]   Run 2: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 
> expected null, but 
> was:  createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})>
> [ERROR]   Run 3: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 
> expected null, but 
> was:  createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> {code}
> It seems that "TestRollingUpgrade.testRollback" may be a flaky unit test.




[jira] [Updated] (HDFS-16171) De-flake testDecommissionStatus

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16171:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.2.3, 2.10.2, 3.4.0  (was: 3.4.0, 2.10.2, 3.2.3, 3.3.2)

> De-flake testDecommissionStatus
> ---
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)  Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> but was:<3>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>   at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}




[jira] [Updated] (HDFS-16184) De-flake TestBlockScanner#testSkipRecentAccessFile

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16184:
--
Affects Version/s: 3.3.2
   3.4.0

> De-flake TestBlockScanner#testSkipRecentAccessFile
> --
>
> Key: HDFS-16184
> URL: https://issues.apache.org/jira/browse/HDFS-16184
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Test TestBlockScanner#testSkipRecentAccessFile is flaky:
>  
> {code:java}
> [ERROR] testSkipRecentAccessFile(org.apache.hadoop.hdfs.server.datanode.TestBlockScanner)  Time elapsed: 3.936 s  <<< FAILURE!
> java.lang.AssertionError: Scan nothing for all files are accessed in last period.
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testSkipRecentAccessFile(TestBlockScanner.java:1015)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> {code}
> e.g. 
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3235/37/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
>  




[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16157:
--
Affects Version/s: 3.4.0

> Support configuring DNS record to get list of journal nodes.
> 
>
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes, 
> so we don't have to reconfigure everything each time a journal node hostname 
> changes. For example, in some containerized environments the hostnames of 
> journal nodes can change quite often.
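
The idea above can be sketched with the JDK resolver alone. This is a minimal illustration, not the HDFS-16157 implementation: resolveJournalNodes and the class name are hypothetical, and the port value 8485 in main is only an example argument. The sketch expands a single DNS name into one host:port entry per address record, so the node list follows the DNS record instead of a hand-maintained hostname list.

```java
import java.io.UncheckedIOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class JournalNodeDnsSketch {
    /**
     * Hypothetical helper: returns one "address:port" entry for every
     * address record that the given DNS name resolves to.
     */
    static List<String> resolveJournalNodes(String dnsName, int port) {
        List<String> nodes = new ArrayList<>();
        try {
            // getAllByName returns every address the resolver knows for the name.
            for (InetAddress addr : InetAddress.getAllByName(dnsName)) {
                String host = addr.getHostAddress();
                // Bracket IPv6 literals so the port separator stays unambiguous.
                if (addr instanceof Inet6Address) {
                    host = "[" + host + "]";
                }
                nodes.add(host + ":" + port);
            }
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
        return nodes;
    }

    public static void main(String[] args) {
        // "localhost" is used only so the example runs without a real record.
        System.out.println(resolveJournalNodes("localhost", 8485));
    }
}
```

Resolving at the point of use rather than once at configuration time is what lets the list pick up new addresses when the record changes; how often to re-resolve is left open here.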




[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)

2024-01-26 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16144:
--
Affects Version/s: 3.3.2
   3.4.0

> Revert HDFS-15372 (Files in snapshots no longer see attribute provider 
> permissions)
> ---
>
> Key: HDFS-16144
> URL: https://issues.apache.org/jira/browse/HDFS-16144
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, 
> HDFS-16144.003.patch, HDFS-16144.004.patch
>
>
> In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. 
> When a user accesses a file in a snapshot and an attribute provider is 
> configured, the provider would see the original file path (i.e. no .snapshot 
> folder) in Hadoop 2, but it would see the snapshot path in Hadoop 3.
> HDFS-15372 changed this back, but I noted at the time that it may make sense 
> for the provider to see the actual snapshot path instead.
> Recently we discovered HDFS-16132, where the HDFS-15372 change does not work 
> correctly. At this stage I believe it is better to revert HDFS-15372, as the 
> fix for this issue is probably not trivial, and allow providers to see the 
> actual path the user accessed.



