[jira] [Commented] (HDFS-17354) Delay invoke clearStaleNamespacesInRouterStateIdContext during router start up
[ https://issues.apache.org/jira/browse/HDFS-17354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811425#comment-17811425 ] ASF GitHub Bot commented on HDFS-17354: --- simbadzina commented on PR #6498: URL: https://github.com/apache/hadoop/pull/6498#issuecomment-1912849626 Changes generally look okay to me. Is this just an optimization to avoid clearing a map which is empty, or can there be an error if we clear before the router is in the RUNNING state? Can you please add a test case? > Delay invoke clearStaleNamespacesInRouterStateIdContext during router start > up > --- > > Key: HDFS-17354 > URL: https://issues.apache.org/jira/browse/HDFS-17354 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: lei w >Priority: Major > Labels: pull-request-available > > We should start the clear-expired-namespaces thread in the RouterRpcServer RUNNING > phase, because StateStoreService is initialized in the initialization phase. > Currently, the router throws an IOException during startup. > {panel:title=Exception} > 2024-01-09 16:27:06,939 WARN > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer: Could not > fetch current list of namespaces. 
> java.io.IOException: State Store does not have an interface for > MembershipStore > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getStoreInterface(MembershipNamenodeResolver.java:121) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getMembershipStore(MembershipNamenodeResolver.java:102) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces(MembershipNamenodeResolver.java:388) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.clearStaleNamespacesInRouterStateIdContext(RouterRpcServer.java:434) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {panel} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
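The fix discussed above can be sketched outside Hadoop as a simple guard: the periodic task body becomes a no-op until the service is marked RUNNING, so it never touches a state store that is still initializing. All names below (DeferredCleanup, markRunning, cleanupIfRunning) are hypothetical illustrations, not the actual RouterRpcServer code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a periodic cleanup that is deferred until the service is RUNNING.
public class DeferredCleanup {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private int cleanups = 0;

    // Called when the service transitions to the RUNNING phase.
    public void markRunning() {
        running.set(true);
    }

    // Periodic task body: skip the work while the service is still initializing.
    // Returns true only when a cleanup was actually performed.
    public boolean cleanupIfRunning() {
        if (!running.get()) {
            return false; // state store not ready yet; do nothing, throw nothing
        }
        cleanups++; // in the real fix, clear stale namespaces here
        return true;
    }

    public static void main(String[] args) {
        DeferredCleanup d = new DeferredCleanup();
        System.out.println("before RUNNING: " + d.cleanupIfRunning());
        d.markRunning();
        System.out.println("after RUNNING:  " + d.cleanupIfRunning());
    }
}
```

The same effect is achieved in the patch by scheduling the thread only once the RUNNING phase is reached, rather than guarding inside the task.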
[jira] [Updated] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md
[ https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17325: -- Affects Version/s: 3.4.0 > Doc: Fix the documentation of fs expunge command in FileSystemShell.md > -- > > Key: HDFS-17325 > URL: https://issues.apache.org/jira/browse/HDFS-17325 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > Fix doc in FileSystemShell.md. > hadoop fs -expunge --immediate should be hadoop fs -expunge -immediate > > Usage: hadoop fs [generic options] -expunge [-immediate] [-fs ] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17309: -- Component/s: rbf > RBF: Fix Router Safemode check condition error > > > Key: HDFS-17309 > URL: https://issues.apache.org/jira/browse/HDFS-17309 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > With HDFS-17116, the Router safemode check condition uses monotonicNow(). > From the code in RouterSafemodeService.periodicInvoke(): > long now = monotonicNow(); > long cacheUpdateTime = stateStore.getCacheUpdateTime(); > boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval; > > The function monotonicNow() is implemented with System.nanoTime(). > From the System.nanoTime() javadoc: > This method can only be used to measure elapsed time and is not related to > any other notion of system or wall-clock time. The value returned represents > nanoseconds since some fixed but arbitrary origin time (perhaps in the > future, so values may be negative). > > The following situation may exist: > if refreshCaches does not succeed in the beginning, cacheUpdateTime will be 0, > and now - cacheUpdateTime is measured from an arbitrary origin, so isCacheStale may be > true or false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
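The pitfall above can be reproduced without Hadoop. Below is a minimal sketch (hypothetical names; not the actual RouterSafemodeService code) contrasting the current check, which subtracts a never-initialized cacheUpdateTime of 0 from a monotonic clock with an arbitrary origin, with a variant that treats the 0 sentinel explicitly:

```java
// Sketch of the staleness check discussed above. MonotonicStaleCheck,
// isCacheStaleBuggy and isCacheStaleSafe are hypothetical names.
public class MonotonicStaleCheck {
    static final long STALE_INTERVAL_MS = 60_000;

    // Mirrors the current logic: if cacheUpdateTime was never set (0),
    // the subtraction compares against the arbitrary nanoTime() origin,
    // so the result is meaningless.
    public static boolean isCacheStaleBuggy(long nowMs, long cacheUpdateTimeMs) {
        return (nowMs - cacheUpdateTimeMs) > STALE_INTERVAL_MS;
    }

    // Safer variant: treat "never refreshed" explicitly instead of subtracting.
    public static boolean isCacheStaleSafe(long nowMs, long cacheUpdateTimeMs) {
        if (cacheUpdateTimeMs == 0) {
            return true; // cache was never refreshed; report it as stale
        }
        return (nowMs - cacheUpdateTimeMs) > STALE_INTERVAL_MS;
    }

    public static void main(String[] args) {
        // Monotonic "now" in millis; the origin is arbitrary and may be negative.
        long nowMs = System.nanoTime() / 1_000_000;
        System.out.println("buggy check with cacheUpdateTime=0: "
            + isCacheStaleBuggy(nowMs, 0));
        System.out.println("safe  check with cacheUpdateTime=0: "
            + isCacheStaleSafe(nowMs, 0));
    }
}
```

With a negative nanoTime() origin, the buggy check reports a never-refreshed cache as fresh; the safe variant always reports it as stale.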
[jira] [Updated] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17306: -- Component/s: rbf router > RBF: Router should not return nameservices that do not enable observer nodes > in RpcResponseHeaderProto > --- > > Key: HDFS-17306 > URL: https://issues.apache.org/jira/browse/HDFS-17306 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf, router >Affects Versions: 3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Suppose a cluster has 3 nameservices: ns1, ns2, ns3, where ns1 has observer > nodes, and the client communicates with the NameNodes via DFSRouter. > If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will > receive all nameservices in RpcResponseHeaderProto. > We should reduce the RPC response size if nameservices don't enable > observer nodes. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17309: -- Affects Version/s: 3.4.0 > RBF: Fix Router Safemode check condition error > > > Key: HDFS-17309 > URL: https://issues.apache.org/jira/browse/HDFS-17309 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > With HDFS-17116, the Router safemode check condition uses monotonicNow(). > From the code in RouterSafemodeService.periodicInvoke(): > long now = monotonicNow(); > long cacheUpdateTime = stateStore.getCacheUpdateTime(); > boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval; > > The function monotonicNow() is implemented with System.nanoTime(). > From the System.nanoTime() javadoc: > This method can only be used to measure elapsed time and is not related to > any other notion of system or wall-clock time. The value returned represents > nanoseconds since some fixed but arbitrary origin time (perhaps in the > future, so values may be negative). > > The following situation may exist: > if refreshCaches does not succeed in the beginning, cacheUpdateTime will be 0, > and now - cacheUpdateTime is measured from an arbitrary origin, so isCacheStale may be > true or false. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan
[ https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17310: -- Component/s: datanode > DiskBalancer: Enhance the log message for submitPlan > > > Key: HDFS-17310 > URL: https://issues.apache.org/jira/browse/HDFS-17310 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > To make troubleshooting more convenient, enhance the log message for > submitPlan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan
[ https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17310: -- Affects Version/s: 3.4.0 > DiskBalancer: Enhance the log message for submitPlan > > > Key: HDFS-17310 > URL: https://issues.apache.org/jira/browse/HDFS-17310 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > To make troubleshooting more convenient, enhance the log message for > submitPlan. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17306: -- Affects Version/s: 3.4.0 > RBF: Router should not return nameservices that do not enable observer nodes > in RpcResponseHeaderProto > --- > > Key: HDFS-17306 > URL: https://issues.apache.org/jira/browse/HDFS-17306 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Suppose a cluster has 3 nameservices: ns1, ns2, ns3, where ns1 has observer > nodes, and the client communicates with the NameNodes via DFSRouter. > If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will > receive all nameservices in RpcResponseHeaderProto. > We should reduce the RPC response size if nameservices don't enable > observer nodes. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17312) packetsReceived metric should ignore heartbeat packet
[ https://issues.apache.org/jira/browse/HDFS-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17312: -- Hadoop Flags: Reviewed > packetsReceived metric should ignore heartbeat packet > - > > Key: HDFS-17312 > URL: https://issues.apache.org/jira/browse/HDFS-17312 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Metric packetsReceived should ignore heartbeat packet and only used to count > data packets and last packet in block. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md
[ https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17325: -- Component/s: documentation fs > Doc: Fix the documentation of fs expunge command in FileSystemShell.md > -- > > Key: HDFS-17325 > URL: https://issues.apache.org/jira/browse/HDFS-17325 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, fs >Affects Versions: 3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > Fix doc in FileSystemShell.md. > hadoop fs -expunge --immediate should be hadoop fs -expunge -immediate > > Usage: hadoop fs [generic options] -expunge [-immediate] [-fs ] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16716) Improve appendToFile command: support appending on file with new block
[ https://issues.apache.org/jira/browse/HDFS-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16716: -- Component/s: fs > Improve appendToFile command: support appending on file with new block > -- > > Key: HDFS-16716 > URL: https://issues.apache.org/jira/browse/HDFS-16716 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Affects Versions: 3.4.0, 3.3.6 >Reporter: guojunhao >Assignee: M1eyu2018 >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6 > > Time Spent: 1h > Remaining Estimate: 0h > > The HDFS client DistributedFileSystem#append supports appending to a file with > optional create flags. > However, the appendToFile command only supports the default create flag APPEND, so > append on an EC file without the NEW_BLOCK create flag is not supported. > Thus, it's necessary to improve the appendToFile command by adding an option n for > it. Option n means that the NEW_BLOCK create flag is used while appending to the file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16652: -- Component/s: ui > Upgrade jquery datatable version references to v1.10.19 > --- > > Key: HDFS-16652 > URL: https://issues.apache.org/jira/browse/HDFS-16652 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Affects Versions: 3.4.0 >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-16652.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade jquery datatable version references in hdfs webapp to v1.10.19 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads
[ https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16422: -- Hadoop Flags: Reviewed Target Version/s: 3.3.3, 3.4.0 (was: 3.4.0, 3.3.3) > Fix thread safety of EC decoding during concurrent preads > - > > Key: HDFS-16422 > URL: https://issues.apache.org/jira/browse/HDFS-16422 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient, ec, erasure-coding >Affects Versions: 3.3.0, 3.3.1 >Reporter: daimin >Assignee: daimin >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.3 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Reading data on an erasure-coded file with missing replicas (internal blocks of the > block group) will cause online reconstruction: read the dataUnits part of the data > and decode it into the target missing data. Each DFSStripedInputStream > object has a RawErasureDecoder object, and when we do preads concurrently, > RawErasureDecoder.decode will be invoked concurrently too. > RawErasureDecoder.decode is not thread safe; as a result, we occasionally get wrong > data from pread. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16384) Upgrade Netty to 4.1.72.Final
[ https://issues.apache.org/jira/browse/HDFS-16384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16384: -- Fix Version/s: (was: 3.4.0) > Upgrade Netty to 4.1.72.Final > - > > Key: HDFS-16384 > URL: https://issues.apache.org/jira/browse/HDFS-16384 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.3.1 >Reporter: Tamas Penzes >Assignee: Tamas Penzes >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > > New fixes for netty; nothing else changed, just the netty version bumped and two > more exclusions in hdfs-client because of the new netty. > No new tests added, as none are needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec
[ https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16252: -- Hadoop Flags: Reviewed > Correct docs for dfs.http.client.retry.policy.spec > --- > > Key: HDFS-16252 > URL: https://issues.apache.org/jira/browse/HDFS-16252 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.4.0, 3.3.2 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch > > > The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as > it has the wait time and retries switched around in the description. Also, the > doc for dfs.client.retry.policy.spec is not present and should be the same as > for dfs.http.client.retry.policy.spec. > The code shows the timeout is first and then the number of retries: > {code} > String POLICY_SPEC_KEY = PREFIX + "policy.spec"; > String POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,... > // In RetryPolicies.java, we can see it gets the timeout as the first in > the pair >/** > * Parse the given string as a MultipleLinearRandomRetry object. > * The format of the string is "t_1, n_1, t_2, n_2, ...", > * where t_i and n_i are the i-th pair of sleep time and number of > retries. > * Note that the white spaces in the string are ignored. > * > * @return the parsed object, or null if the parsing fails. 
> */ > public static MultipleLinearRandomRetry parseCommaSeparatedString(String > s) { > final String[] elements = s.split(","); > if (elements.length == 0) { > LOG.warn("Illegal value: there is no element in \"" + s + "\"."); > return null; > } > if (elements.length % 2 != 0) { > LOG.warn("Illegal value: the number of elements in \"" + s + "\" is " > + elements.length + " but an even number of elements is > expected."); > return null; > } > final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs > = new ArrayList<RetryPolicies.MultipleLinearRandomRetry.Pair>(); > > for(int i = 0; i < elements.length; ) { > //parse the i-th sleep-time > final int sleep = parsePositiveInt(elements, i++, s); > if (sleep == -1) { > return null; //parse fails > } > //parse the i-th number-of-retries > final int retries = parsePositiveInt(elements, i++, s); > if (retries == -1) { > return null; //parse fails > } > pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, > sleep)); > } > return new RetryPolicies.MultipleLinearRandomRetry(pairs); > } > {code} > This change simply updates the docs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
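The ordering described above (sleep time first, then retry count, per pair) can be illustrated with a standalone parser. This is a simplified sketch, not the actual RetryPolicies code; RetrySpecParser and its int[] pairs are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of parsing the "t_1,n_1,t_2,n_2,..." retry policy spec, where each
// pair is (sleep time, number of retries) with the sleep time first.
public class RetrySpecParser {
    // Returns a list of {sleepMs, retries} pairs, or null for bad input.
    public static List<int[]> parse(String s) {
        String[] elements = s.split(",");
        if (elements.length % 2 != 0) {
            return null; // an even number of elements is expected
        }
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < elements.length; ) {
            int sleep = Integer.parseInt(elements[i++].trim());   // i-th sleep time
            int retries = Integer.parseInt(elements[i++].trim()); // i-th retry count
            pairs.add(new int[] {sleep, retries});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Default spec "1,6,6,10": sleep 1ms for 6 retries, then 6ms for 10 retries.
        for (int[] p : parse("1,6,6,10")) {
            System.out.println("sleep=" + p[0] + "ms retries=" + p[1]);
        }
    }
}
```

The documentation bug was simply that the two roles were described in the opposite order from what this parsing implements.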
[jira] [Updated] (HDFS-16227) testMoverWithStripedFile fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16227: -- Hadoop Flags: Reviewed > testMoverWithStripedFile fails intermittently > - > > Key: HDFS-16227 > URL: https://issues.apache.org/jira/browse/HDFS-16227 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > TestMover#testMoverWithStripedFile fails intermittently with stacktrace: > {code:java} > [ERROR] > testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover) Time > elapsed: 48.439 s <<< FAILURE![ERROR] > testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover) Time > elapsed: 48.439 s <<< FAILURE!java.lang.AssertionError: expected: > but was: at org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:120) at > org.junit.Assert.assertEquals(Assert.java:146) at > org.apache.hadoop.hdfs.server.mover.TestMover.testMoverWithStripedFile(TestMover.java:965) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > {code} > e.g > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3386/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16227) testMoverWithStripedFile fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16227: -- Affects Version/s: 3.4.0 > testMoverWithStripedFile fails intermittently > - > > Key: HDFS-16227 > URL: https://issues.apache.org/jira/browse/HDFS-16227 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > TestMover#testMoverWithStripedFile fails intermittently with stacktrace: > {code:java} > [ERROR] > testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover) Time > elapsed: 48.439 s <<< FAILURE![ERROR] > testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover) Time > elapsed: 48.439 s <<< FAILURE!java.lang.AssertionError: expected: > but was: at org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:120) at > org.junit.Assert.assertEquals(Assert.java:146) at > org.apache.hadoop.hdfs.server.mover.TestMover.testMoverWithStripedFile(TestMover.java:965) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > {code} > e.g > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3386/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result
[ https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16080: -- Component/s: rbf > RBF: Invoking method in all locations should break the loop after successful > result > --- > > Key: HDFS-16080 > URL: https://issues.apache.org/jira/browse/HDFS-16080 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > rename, delete and mkdir used by Router client usually calls multiple > locations if the path is present in multiple sub-clusters. After invoking > multiple concurrent proxy calls to multiple clients, we iterate through all > results and mark anyResult true if at least one of them was successful. We > should break the loop if one of the proxy call result was successful rather > than iterating over remaining calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
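The change described above amounts to an early exit: once one location reports success, the remaining results need not be inspected. A minimal sketch (AnyResultScan is a hypothetical name, not the actual router code) under that assumption:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: scan per-location results from concurrent proxy calls and stop
// at the first successful one instead of iterating over all of them.
public class AnyResultScan {
    public static boolean anySuccessful(Map<String, Boolean> resultsPerLocation) {
        for (Boolean ok : resultsPerLocation.values()) {
            if (Boolean.TRUE.equals(ok)) {
                return true; // break early; later entries cannot change the answer
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("ns0", false);
        results.put("ns1", true);  // first success: scan stops here
        results.put("ns2", false);
        System.out.println("anyResult = " + anySuccessful(results));
    }
}
```

The proxy calls themselves still run concurrently; only the post-hoc scan over their results is shortened.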
[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16075: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Use empty array constants present in StorageType and DatanodeInfo to avoid > creating redundant objects > - > > Key: HDFS-16075 > URL: https://issues.apache.org/jira/browse/HDFS-16075 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > StorageType and DatanodeInfo already provides empty array constants. We > should use them where possible in order to avoid creating unnecessary new > empty array objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
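The idea above can be shown with a standalone example. The names here (EmptyArrayReuse, locationsShared) are hypothetical; the point, as with the constants StorageType and DatanodeInfo already provide, is that zero-length arrays are immutable and one instance can safely be shared:

```java
// Sketch: returning a shared empty-array constant instead of allocating a
// fresh zero-length array on every call.
public class EmptyArrayReuse {
    public static final String[] EMPTY_ARRAY = new String[0];

    // Wasteful: a new zero-length array object per call.
    public static String[] locationsAllocating() {
        return new String[0];
    }

    // Preferred: zero-length arrays carry no mutable state, so sharing is safe.
    public static String[] locationsShared() {
        return EMPTY_ARRAY;
    }

    public static void main(String[] args) {
        System.out.println("shared identical: "
            + (locationsShared() == locationsShared()));
        System.out.println("allocating identical: "
            + (locationsAllocating() == locationsAllocating()));
    }
}
```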
[jira] [Updated] (HDFS-16050) Some dynamometer tests fail
[ https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16050: -- Affects Version/s: 3.3.2 3.4.0 > Some dynamometer tests fail > --- > > Key: HDFS-16050 > URL: https://issues.apache.org/jira/browse/HDFS-16050 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > The following tests failed: > {quote}hadoop.tools.dynamometer.TestDynamometerInfra > hadoop.tools.dynamometer.blockgenerator.TestBlockGen > hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt] > {quote}[ERROR] > testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator) > Time elapsed: 1.353 s <<< ERROR! > java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer > at > org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618) > at > org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632) > at > org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:576) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16050) Some dynamometer tests fail
[ https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16050: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Some dynamometer tests fail > --- > > Key: HDFS-16050 > URL: https://issues.apache.org/jira/browse/HDFS-16050 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > The following tests failed: > {quote}hadoop.tools.dynamometer.TestDynamometerInfra > hadoop.tools.dynamometer.blockgenerator.TestBlockGen > hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt] > {quote}[ERROR] > testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator) > Time elapsed: 1.353 s <<< ERROR! > java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer > at > org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618) > at > org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632) > at > org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:576) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout
[ https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16046: -- Affects Version/s: 3.4.0 > TestBalanceProcedureScheduler and TestDistCpProcedure timeout > - > > Key: HDFS-16046 > URL: https://issues.apache.org/jira/browse/HDFS-16046 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, > screenshot-2.png > > Time Spent: 40m > Remaining Estimate: 0h > > The following two tests timed out frequently in the qbt job. > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 6 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331) > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 3 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > 
org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout
[ https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16046: -- Hadoop Flags: Reviewed > TestBalanceProcedureScheduler and TestDistCpProcedure timeout > - > > Key: HDFS-16046 > URL: https://issues.apache.org/jira/browse/HDFS-16046 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, > screenshot-2.png > > Time Spent: 40m > Remaining Estimate: 0h > > The following two tests timed out frequently in the qbt job. > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 6 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331) > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 3 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > 
org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16075: -- Component/s: hdfs > Use empty array constants present in StorageType and DatanodeInfo to avoid > creating redundant objects > - > > Key: HDFS-16075 > URL: https://issues.apache.org/jira/browse/HDFS-16075 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > StorageType and DatanodeInfo already provide empty array constants. We > should use them where possible in order to avoid creating unnecessary new > empty array objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
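The empty-array-constant pattern referenced above can be sketched as follows. This is an illustrative example only; the class and field names here are hypothetical, not the actual Hadoop `StorageType`/`DatanodeInfo` fields. The point is that a shared zero-length array can be returned safely because it is effectively immutable, avoiding one allocation per call.

```java
// Illustrative sketch of the empty-array-constant pattern (hypothetical
// names, not the actual Hadoop fields): return a shared zero-length array
// instead of allocating "new String[0]" on every call.
public class EmptyArrayDemo {
    // One shared instance; zero-length arrays cannot be mutated meaningfully,
    // so sharing a single constant is safe.
    public static final String[] EMPTY_ARRAY = {};

    public static String[] none() {
        return EMPTY_ARRAY; // avoids: return new String[0];
    }

    public static void main(String[] args) {
        // Every call returns the same object, so no garbage is created.
        System.out.println(none().length);    // 0
        System.out.println(none() == none()); // true: same shared instance
    }
}
```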
[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16007: -- Component/s: hdfs > Deserialization of ReplicaState should avoid throwing > ArrayIndexOutOfBoundsException > > > Key: HDFS-16007 > URL: https://issues.apache.org/jira/browse/HDFS-16007 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: junwen yang >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The ReplicaState enum uses its ordinal for serialization and > deserialization, which is sensitive to the declaration order and can cause > issues similar to HDFS-15624. > To avoid this, either add comments warning later developers not to change this > enum, or add index checking in the read and getState functions to avoid index > out of bound errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
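The index-checking suggestion in this report can be sketched as below. This is a minimal illustration, not the actual HDFS `ReplicaState` code: the enum constants and exception type here are hypothetical stand-ins. Indexing `values()` directly with an untrusted ordinal throws `ArrayIndexOutOfBoundsException` on corrupt input; an explicit bounds check turns that into a deliberate, descriptive error.

```java
// Illustrative sketch (not the actual HDFS ReplicaState code): deserialize
// an enum by ordinal with an explicit bounds check instead of indexing the
// values() array blindly.
public class ReplicaStateDemo {
    public enum State { FINALIZED, RBW, RWR, RUR, TEMPORARY }

    // Cache values() once: each call to values() allocates a fresh array.
    private static final State[] CACHED = State.values();

    public static State fromOrdinal(int ordinal) {
        if (ordinal < 0 || ordinal >= CACHED.length) {
            // Fail with a clear message rather than ArrayIndexOutOfBoundsException.
            throw new IllegalArgumentException(
                "Invalid replica state ordinal: " + ordinal);
        }
        return CACHED[ordinal];
    }

    public static void main(String[] args) {
        System.out.println(fromOrdinal(1)); // RBW
    }
}
```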
[jira] [Updated] (HDFS-16014) Fix an issue in checking native pmdk lib by 'hadoop checknative' command
[ https://issues.apache.org/jira/browse/HDFS-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16014: -- Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.4.0 (was: 3.4.0, 3.2.4) > Fix an issue in checking native pmdk lib by 'hadoop checknative' command > > > Key: HDFS-16014 > URL: https://issues.apache.org/jira/browse/HDFS-16014 > Project: Hadoop HDFS > Issue Type: Bug > Components: native >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-16014-01.patch, HDFS-16014-02.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In HDFS-14818, we proposed a patch to support checking native pmdk lib. The > expected target is to display hint to user regarding pmdk lib loaded state. > Recently, it was found that pmdk lib was not successfully loaded actually but > the `hadoop checknative` command still tells user that it was. This issue can > be reproduced by moving libpmem.so* from specified installed path to other > place, or directly deleting these libs, after the project is built. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16007: -- Affects Version/s: 3.3.1 3.4.0 > Deserialization of ReplicaState should avoid throwing > ArrayIndexOutOfBoundsException > > > Key: HDFS-16007 > URL: https://issues.apache.org/jira/browse/HDFS-16007 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.1, 3.4.0 >Reporter: junwen yang >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The ReplicaState enum uses its ordinal for serialization and > deserialization, which is sensitive to the declaration order and can cause > issues similar to HDFS-15624. > To avoid this, either add comments warning later developers not to change this > enum, or add index checking in the read and getState functions to avoid index > out of bound errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
[ https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16001: -- Affects Version/s: 3.3.1 3.4.0 > TestOfflineEditsViewer.testStored() fails reading negative value of > FSEditLogOpCodes > > > Key: HDFS-16001 > URL: https://issues.apache.org/jira/browse/HDFS-16001 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Konstantin Shvachko >Assignee: Akira Ajisaka >Priority: Blocker > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception > {noformat} > java.io.IOException: Op -54 has size -1314247195, but the minimum op size is > 17 > {noformat} > Seems like there is a corrupt record in {{editsStored}} file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16075: -- Affects Version/s: 3.3.2 3.4.0 > Use empty array constants present in StorageType and DatanodeInfo to avoid > creating redundant objects > - > > Key: HDFS-16075 > URL: https://issues.apache.org/jira/browse/HDFS-16075 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > StorageType and DatanodeInfo already provide empty array constants. We > should use them where possible in order to avoid creating unnecessary new > empty array objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15790: -- Affects Version/s: 3.3.1 3.4.0 > Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist > -- > > Key: HDFS-15790 > URL: https://issues.apache.org/jira/browse/HDFS-15790 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.1, 3.4.0 >Reporter: David Mollitor >Assignee: Vinayakumar B >Priority: Critical > Labels: pull-request-available, release-blocker > Fix For: 3.3.1, 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive > project. This was not an awesome thing to do between minor versions in > regards to backwards compatibility for downstream projects. > Additionally, these two frameworks are not drop-in replacements, they have > some differences. Also, Protobuf 2 is not deprecated or anything so let us > have both protocols available at the same time. In Hadoop 4.x Protobuf 2 > support can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15790: -- Component/s: ipc > Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist > -- > > Key: HDFS-15790 > URL: https://issues.apache.org/jira/browse/HDFS-15790 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc >Affects Versions: 3.3.1, 3.4.0 >Reporter: David Mollitor >Assignee: Vinayakumar B >Priority: Critical > Labels: pull-request-available, release-blocker > Fix For: 3.3.1, 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive > project. This was not an awesome thing to do between minor versions in > regards to backwards compatibility for downstream projects. > Additionally, these two frameworks are not drop-in replacements, they have > some differences. Also, Protobuf 2 is not deprecated or anything so let us > have both protocols available at the same time. In Hadoop 4.x Protobuf 2 > support can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15790: -- Hadoop Flags: Reviewed > Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist > -- > > Key: HDFS-15790 > URL: https://issues.apache.org/jira/browse/HDFS-15790 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc >Affects Versions: 3.3.1, 3.4.0 >Reporter: David Mollitor >Assignee: Vinayakumar B >Priority: Critical > Labels: pull-request-available, release-blocker > Fix For: 3.3.1, 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive > project. This was not an awesome thing to do between minor versions in > regards to backwards compatibility for downstream projects. > Additionally, these two frameworks are not drop-in replacements, they have > some differences. Also, Protobuf 2 is not deprecated or anything so let us > have both protocols available at the same time. In Hadoop 4.x Protobuf 2 > support can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15788: -- Hadoop Flags: Reviewed > Correct the statement for pmem cache to reflect cache persistence support > - > > Key: HDFS-15788 > URL: https://issues.apache.org/jira/browse/HDFS-15788 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Correct the statement for pmem cache to reflect cache persistence support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize
[ https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15725: -- Hadoop Flags: Reviewed > Lease Recovery never completes for a committed block which the DNs never > finalize > - > > Key: HDFS-15725 > URL: https://issues.apache.org/jira/browse/HDFS-15725 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, > HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, > HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch > > > In a very rare condition, the HDFS client process can get killed right at the > time it is completing a block / file. > The client sends the "complete" call to the namenode, moving the block into a > committed state, but it dies before it can send the final packet to the > Datanodes telling them to finalize the block. > This means the blocks are stuck on the datanodes in RBW state and nothing > will ever tell them to move out of that state. > The namenode / lease manager will retry forever to close the file, but it > will always complain it is waiting for blocks to reach minimal replication. > I have a simple test and patch to fix this, but I think it warrants some > discussion on whether this is the correct thing to do, or if I need to put > the fix behind a config switch. > My idea is that if lease recovery occurs, and the block is still waiting on > "minimal replication", just put the file back to UNDER_CONSTRUCTION so that > on the next lease recovery attempt, BLOCK RECOVERY will happen, close the > file and move the replicas to FINALIZED. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15749) Make size of editPendingQ can be configurable
[ https://issues.apache.org/jira/browse/HDFS-15749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15749: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.0, 3.4.0 (was: 3.3.0, 3.4.0, 3.2.3) > Make size of editPendingQ can be configurable > - > > Key: HDFS-15749 > URL: https://issues.apache.org/jira/browse/HDFS-15749 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Baolong Mao >Assignee: Baolong Mao >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16595: -- Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.4.0 (was: 3.4.0, 3.3.5) > Slow peer metrics - add median, mad and upper latency limits > > > Key: HDFS-16595 > URL: https://issues.apache.org/jira/browse/HDFS-16595 > Project: Hadoop HDFS > Issue Type: New Feature > Components: metrics >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 4h > Remaining Estimate: 0h > > Slow datanode metrics include slow node and it's reporting node details. With > HDFS-16582, we added the aggregate latency that is perceived by the reporting > nodes. > In order to get more insights into how the outlier slownode's latencies > differ from the rest of the nodes, we should also expose median, median > absolute deviation and the calculated upper latency limit details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16595: -- Affects Version/s: 3.3.5 3.4.0 > Slow peer metrics - add median, mad and upper latency limits > > > Key: HDFS-16595 > URL: https://issues.apache.org/jira/browse/HDFS-16595 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 4h > Remaining Estimate: 0h > > Slow datanode metrics include slow node and it's reporting node details. With > HDFS-16582, we added the aggregate latency that is perceived by the reporting > nodes. > In order to get more insights into how the outlier slownode's latencies > differ from the rest of the nodes, we should also expose median, median > absolute deviation and the calculated upper latency limit details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16595) Slow peer metrics - add median, mad and upper latency limits
[ https://issues.apache.org/jira/browse/HDFS-16595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16595: -- Component/s: metrics > Slow peer metrics - add median, mad and upper latency limits > > > Key: HDFS-16595 > URL: https://issues.apache.org/jira/browse/HDFS-16595 > Project: Hadoop HDFS > Issue Type: New Feature > Components: metrics >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 4h > Remaining Estimate: 0h > > Slow datanode metrics include slow node and it's reporting node details. With > HDFS-16582, we added the aggregate latency that is perceived by the reporting > nodes. > In order to get more insights into how the outlier slownode's latencies > differ from the rest of the nodes, we should also expose median, median > absolute deviation and the calculated upper latency limit details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer
[ https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16330: -- Component/s: datanode > Fix incorrect placeholder for Exception logs in DiskBalancer > > > Key: HDFS-16330 > URL: https://issues.apache.org/jira/browse/HDFS-16330 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer
[ https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16330: -- Affects Version/s: 3.3.2 3.4.0 > Fix incorrect placeholder for Exception logs in DiskBalancer > > > Key: HDFS-16330 > URL: https://issues.apache.org/jira/browse/HDFS-16330 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16330) Fix incorrect placeholder for Exception logs in DiskBalancer
[ https://issues.apache.org/jira/browse/HDFS-16330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16330: -- Hadoop Flags: Reviewed > Fix incorrect placeholder for Exception logs in DiskBalancer > > > Key: HDFS-16330 > URL: https://issues.apache.org/jira/browse/HDFS-16330 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17285) RBF: Add a safe mode check period configuration
[ https://issues.apache.org/jira/browse/HDFS-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17285: -- Component/s: rbf > RBF: Add a safe mode check period configuration > --- > > Key: HDFS-17285 > URL: https://issues.apache.org/jira/browse/HDFS-17285 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0 >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > When the dfsrouter starts, it enters safe mode, and it takes about 1 minute to leave. > The log is below: > 14:35:23,717 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Leave > startup safe mode after 3 ms > 14:35:23,717 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: Enter > safe mode after 18 ms without reaching the State Store > 14:35:23,717 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: > Entering safe mode > 14:35:24,996 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: > Delaying safemode exit for 28721 milliseconds... > 14:36:25,037 INFO > org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService: > Leaving safe mode after 61319 milliseconds > It depends on these configs. > DFS_ROUTER_SAFEMODE_EXTENSION 30s > DFS_ROUTER_SAFEMODE_EXPIRATION 3min > DFS_ROUTER_CACHE_TIME_TO_LIVE_MS 1min (it is the check period for safe mode) > Because the dfsrouter rejects write requests while in safe mode, the check period should be > shorter once refreshCaches is done. And we should remove > DFS_ROUTER_CACHE_TIME_TO_LIVE_MS from RouterSafemodeService. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
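The proposal above can be sketched in miniature. This is a hypothetical illustration, not the actual `RouterSafemodeService` code: the idea is that the safe-mode re-check gets its own (shorter) period instead of reusing the cache TTL, so the router can leave safe mode soon after its caches refresh rather than waiting out a full cache-TTL cycle.

```java
// Hypothetical sketch of a safe-mode service with a dedicated check period
// (names are illustrative, not the real Hadoop RBF classes or config keys).
public class SafemodeCheckDemo {
    private final long checkPeriodMs; // proposed dedicated knob, decoupled from cache TTL
    private boolean cachesRefreshed = false;
    private boolean safeMode = true;

    public SafemodeCheckDemo(long checkPeriodMs) {
        this.checkPeriodMs = checkPeriodMs;
    }

    // Called by the cache refresh path once the State Store is reachable.
    public void markCachesRefreshed() {
        cachesRefreshed = true;
    }

    // One periodic tick, run every checkPeriodMs: leave safe mode as soon
    // as the caches are fresh. Returns whether we are still in safe mode.
    public boolean tick() {
        if (safeMode && cachesRefreshed) {
            safeMode = false;
        }
        return safeMode;
    }

    public long getCheckPeriodMs() {
        return checkPeriodMs;
    }
}
```

With a short period (for example, a few seconds instead of the 1-minute cache TTL), the first `tick()` after `markCachesRefreshed()` exits safe mode promptly.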
[jira] [Updated] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17290: -- Component/s: metrics > HDFS: add client rpc backoff metrics due to disconnection from lowest > priority queue > > > Key: HDFS-17290 > URL: https://issues.apache.org/jira/browse/HDFS-17290 > Project: Hadoop HDFS > Issue Type: Bug > Components: metrics >Affects Versions: 2.10.0, 3.4.0 >Reporter: Lei Yang >Assignee: Lei Yang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Clients back off when RPCs cannot be enqueued. However, backoff can happen in different > scenarios. Currently there is no way to > differentiate whether a backoff happened due to lowest priority + disconnection or due to > queue overflow from higher priority queues while the connection between client and > namenode remains open. Currently the IPC server just emits a single metric for > all the backoffs. > Example: > # Clients are directly enqueued into the lowest priority queue and back off when > the lowest queue is full. Clients are expected to disconnect from the namenode. > # Clients are enqueued into a non-lowest priority queue, overflow all the > way down to the lowest priority queue, and back off. In this case, the connection > between client and namenode remains open. > We would like to add a metric for #1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers
[ https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16323: -- Component/s: datanode > DatanodeHttpServer doesn't require handler state map while retrieving filter > handlers > - > > Key: HDFS-16323 > URL: https://issues.apache.org/jira/browse/HDFS-16323 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > DatanodeHttpServer#getFilterHandlers uses the handler state map only to query whether > the given datanode httpserver filter handler class exists in the map and, if > not, initializes the Channel handler by invoking a specific parameterized > constructor of the class. However, this handler state map is never used to > upsert any data. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
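The lookup-then-construct pattern described in this report can be sketched as follows. This is a hypothetical illustration, not the actual `DatanodeHttpServer` code: the class names and interfaces are invented stand-ins. It shows a map that is consulted read-only for an existence check before a handler is built via a parameterized constructor via reflection; since nothing is ever written into the map, it could be dropped from the retrieval path.

```java
// Hypothetical sketch (not the actual DatanodeHttpServer code): a state map
// consulted read-only, followed by reflective construction of a handler
// through a parameterized constructor.
import java.lang.reflect.Constructor;
import java.util.Map;

public class FilterHandlerDemo {
    public interface Handler { String name(); }

    public static class LoggingHandler implements Handler {
        private final String conf;
        public LoggingHandler(String conf) { this.conf = conf; }
        public String name() { return "logging:" + conf; }
    }

    // The map is only queried, never updated ("upserted") — which is why
    // the report argues the state map is not required here at all.
    public static Handler getHandler(Map<String, Class<? extends Handler>> state,
                                     String key, String conf) {
        Class<? extends Handler> clazz = state.getOrDefault(key, LoggingHandler.class);
        try {
            // Invoke the (String) constructor of the resolved handler class.
            Constructor<? extends Handler> ctor = clazz.getConstructor(String.class);
            return ctor.newInstance(conf);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable constructor on " + clazz, e);
        }
    }

    public static void main(String[] args) {
        Handler h = getHandler(Map.of(), "filter", "cfg");
        System.out.println(h.name()); // logging:cfg
    }
}
```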
[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers
[ https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16323: -- Hadoop Flags: Reviewed > DatanodeHttpServer doesn't require handler state map while retrieving filter > handlers > - > > Key: HDFS-16323 > URL: https://issues.apache.org/jira/browse/HDFS-16323 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > DatanodeHttpServer#getFilterHandlers uses the handler state map only to query whether > the given datanode httpserver filter handler class exists in the map and, if > not, initializes the Channel handler by invoking a specific parameterized > constructor of the class. However, this handler state map is never used to > upsert any data. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers
[ https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16323: -- Affects Version/s: 3.3.2 3.4.0 > DatanodeHttpServer doesn't require handler state map while retrieving filter > handlers > - > > Key: HDFS-16323 > URL: https://issues.apache.org/jira/browse/HDFS-16323 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > DatanodeHttpServer#getFilterHandlers uses the handler state map only to query whether > the given datanode httpserver filter handler class exists in the map and, if > not, initializes the Channel handler by invoking a specific parameterized > constructor of the class. However, this handler state map is never used to > upsert any data.
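The point HDFS-16323 makes, that a map which is queried but never written is dead weight, comes down to the lookup-then-instantiate pattern below. This is a hypothetical, simplified sketch (names such as `FilterHandlerSketch` and `AuthFilterHandler` are invented, not the DataNode's actual classes): since the handler is always built by reflectively invoking a parameterized constructor, no class-to-state map needs to be consulted at all.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class FilterHandlerSketch {
  public interface ChannelHandler {}

  // Hypothetical filter handler; real handlers receive their config similarly.
  public static class AuthFilterHandler implements ChannelHandler {
    final Map<String, String> conf;
    public AuthFilterHandler(Map<String, String> conf) { this.conf = conf; }
  }

  // Builds the handler through its (Map) constructor; note that no
  // class-to-state map is ever read or written along the way.
  public static ChannelHandler newHandler(Class<? extends ChannelHandler> cls,
                                          Map<String, String> conf) throws Exception {
    Constructor<? extends ChannelHandler> ctor = cls.getConstructor(Map.class);
    return ctor.newInstance(conf);
  }

  public static void main(String[] args) throws Exception {
    ChannelHandler h = newHandler(AuthFilterHandler.class, new HashMap<>());
    System.out.println(h.getClass().getSimpleName()); // prints "AuthFilterHandler"
  }
}
```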
[jira] [Updated] (HDFS-16255) RBF: Fix dead link to fedbalance document
[ https://issues.apache.org/jira/browse/HDFS-16255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16255: -- Hadoop Flags: Reviewed > RBF: Fix dead link to fedbalance document > - > > Key: HDFS-16255 > URL: https://issues.apache.org/jira/browse/HDFS-16255 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Ashutosh Gupta >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > There is a dead link in HDFSRouterFederation.md > (https://github.com/apache/hadoop/blob/e90c41af34ada9d7b61e4d5a8b88c2f62c7fea25/hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md?plain=1#L517) > {{../../../hadoop-federation-balance/HDFSFederationBalance.md}} should be > {{../../hadoop-federation-balance/HDFSFederationBalance.md}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec
[ https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16252: -- Affects Version/s: 3.3.2 3.4.0 > Correct docs for dfs.http.client.retry.policy.spec > --- > > Key: HDFS-16252 > URL: https://issues.apache.org/jira/browse/HDFS-16252 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.4.0, 3.3.2 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch > > > The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as > it has the wait time and retries switched around in the description. Also, the > doc for dfs.client.retry.policy.spec is not present and should be the same as > for dfs.http.client.retry.policy.spec. > The code shows the timeout is first and then the number of retries: > {code} > String POLICY_SPEC_KEY = PREFIX + "policy.spec"; > String POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,... > // In RetryPolicies.java, we can see it gets the timeout as the first in the pair > /** > * Parse the given string as a MultipleLinearRandomRetry object. > * The format of the string is "t_1, n_1, t_2, n_2, ...", > * where t_i and n_i are the i-th pair of sleep time and number of retries. > * Note that the white spaces in the string are ignored. > * > * @return the parsed object, or null if the parsing fails.
> */ > public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) { > final String[] elements = s.split(","); > if (elements.length == 0) { > LOG.warn("Illegal value: there is no element in \"" + s + "\"."); > return null; > } > if (elements.length % 2 != 0) { > LOG.warn("Illegal value: the number of elements in \"" + s + "\" is " > + elements.length + " but an even number of elements is expected."); > return null; > } > final List pairs = new ArrayList(); > > for (int i = 0; i < elements.length; ) { > // parse the i-th sleep-time > final int sleep = parsePositiveInt(elements, i++, s); > if (sleep == -1) { > return null; // parse fails > } > // parse the i-th number-of-retries > final int retries = parsePositiveInt(elements, i++, s); > if (retries == -1) { > return null; // parse fails > } > pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep)); > } > return new RetryPolicies.MultipleLinearRandomRetry(pairs); > } > {code} > This change simply updates the docs.
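The quoted parser can be condensed into a small runnable sketch (simplified, not the real `RetryPolicies` code, which logs warnings and uses its own `Pair` type) that makes the documented point concrete: in each pair of the "t_1, n_1, t_2, n_2, ..." spec, the sleep time comes first and the retry count second.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;

public class RetrySpecParser {
  // Returns (sleep, retries) pairs, or null if the spec is malformed.
  static List<SimpleEntry<Integer, Integer>> parse(String spec) {
    String[] elems = spec.replaceAll("\\s", "").split(",");
    if (elems.length == 0 || elems.length % 2 != 0) {
      return null; // must be a non-empty, even-length list
    }
    List<SimpleEntry<Integer, Integer>> pairs = new ArrayList<>();
    for (int i = 0; i < elems.length; ) {
      int sleep, retries;
      try {
        sleep = Integer.parseInt(elems[i++]);   // i-th sleep time (FIRST)
        retries = Integer.parseInt(elems[i++]); // i-th retry count (SECOND)
      } catch (NumberFormatException e) {
        return null; // non-numeric element: parse fails
      }
      if (sleep < 0 || retries < 0) {
        return null;
      }
      pairs.add(new SimpleEntry<>(sleep, retries));
    }
    return pairs;
  }

  public static void main(String[] args) {
    // Default "1,6,6,10": wait 1s up to 6 times, then 6s up to 10 times.
    System.out.println(parse("1,6,6,10")); // prints "[1=6, 6=10]"
  }
}
```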
[jira] [Updated] (HDFS-16256) Minor fixes in HDFS Fedbalance document
[ https://issues.apache.org/jira/browse/HDFS-16256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16256: -- Hadoop Flags: Reviewed > Minor fixes in HDFS Fedbalance document > --- > > Key: HDFS-16256 > URL: https://issues.apache.org/jira/browse/HDFS-16256 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Ashutosh Gupta >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > 1. "Command submit has 4 options:" is not true. Now it has actually 6 > options. It should be updated to something like "Command submit has the > following options". > 2. > {code} > ### Configuration Options > > {code} > In the above code, the "" is not needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec
[ https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16252: -- Component/s: documentation > Correct docs for dfs.http.client.retry.policy.spec > --- > > Key: HDFS-16252 > URL: https://issues.apache.org/jira/browse/HDFS-16252 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch > > > The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as > it has the wait time and retries switched around in the description. Also, the > doc for dfs.client.retry.policy.spec is not present and should be the same as > for dfs.http.client.retry.policy.spec. > The code shows the timeout is first and then the number of retries: > {code} > String POLICY_SPEC_KEY = PREFIX + "policy.spec"; > String POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,... > // In RetryPolicies.java, we can see it gets the timeout as the first in the pair > /** > * Parse the given string as a MultipleLinearRandomRetry object. > * The format of the string is "t_1, n_1, t_2, n_2, ...", > * where t_i and n_i are the i-th pair of sleep time and number of retries. > * Note that the white spaces in the string are ignored. > * > * @return the parsed object, or null if the parsing fails.
> */ > public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) { > final String[] elements = s.split(","); > if (elements.length == 0) { > LOG.warn("Illegal value: there is no element in \"" + s + "\"."); > return null; > } > if (elements.length % 2 != 0) { > LOG.warn("Illegal value: the number of elements in \"" + s + "\" is " > + elements.length + " but an even number of elements is expected."); > return null; > } > final List pairs = new ArrayList(); > > for (int i = 0; i < elements.length; ) { > // parse the i-th sleep-time > final int sleep = parsePositiveInt(elements, i++, s); > if (sleep == -1) { > return null; // parse fails > } > // parse the i-th number-of-retries > final int retries = parsePositiveInt(elements, i++, s); > if (retries == -1) { > return null; // parse fails > } > pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, sleep)); > } > return new RetryPolicies.MultipleLinearRandomRetry(pairs); > } > {code} > This change simply updates the docs.
[jira] [Updated] (HDFS-16255) RBF: Fix dead link to fedbalance document
[ https://issues.apache.org/jira/browse/HDFS-16255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16255: -- Affects Version/s: 3.4.0 > RBF: Fix dead link to fedbalance document > - > > Key: HDFS-16255 > URL: https://issues.apache.org/jira/browse/HDFS-16255 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Ashutosh Gupta >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > There is a dead link in HDFSRouterFederation.md > (https://github.com/apache/hadoop/blob/e90c41af34ada9d7b61e4d5a8b88c2f62c7fea25/hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md?plain=1#L517) > {{../../../hadoop-federation-balance/HDFSFederationBalance.md}} should be > {{../../hadoop-federation-balance/HDFSFederationBalance.md}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16256) Minor fixes in HDFS Fedbalance document
[ https://issues.apache.org/jira/browse/HDFS-16256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16256: -- Affects Version/s: 3.4.0 > Minor fixes in HDFS Fedbalance document > --- > > Key: HDFS-16256 > URL: https://issues.apache.org/jira/browse/HDFS-16256 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Ashutosh Gupta >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > 1. "Command submit has 4 options:" is not true. Now it has actually 6 > options. It should be updated to something like "Command submit has the > following options". > 2. > {code} > ### Configuration Options > > {code} > In the above code, the "" is not needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.
[ https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16127: -- Component/s: hdfs > Improper pipeline close recovery causes a permanent write failure or data > loss. > --- > > Key: HDFS-16127 > URL: https://issues.apache.org/jira/browse/HDFS-16127 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Major > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-16127.patch > > > When a block is being closed, the data streamer in the client waits for the > final ACK to be delivered. If an exception is received during this wait, the > close is retried. This assumption has become invalid by HDFS-15813, resulting > in permanent write failures in some close error cases involving slow nodes. > There are also less frequent cases of data loss. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16092: -- Component/s: hdfs > Avoid creating LayoutFlags redundant objects > > > Key: HDFS-16092 > URL: https://issues.apache.org/jira/browse/HDFS-16092 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use LayoutFlags to represent features that EditLog/FSImage can support. > The utility helps write int (0) to given OutputStream and if EditLog/FSImage > supports Layout flags, they read the value from InputStream to confirm > whether there are unsupported feature flags (non zero int). However, we also > create and return new object of LayoutFlags, which is not used anywhere > because it's just a utility to read/write to/from given stream. We should > remove such redundant objects from getting created while reading from > InputStream using LayoutFlags#read utility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16092: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.2.3, 3.4.0 (was: 3.4.0, 3.2.3, 3.3.2) > Avoid creating LayoutFlags redundant objects > > > Key: HDFS-16092 > URL: https://issues.apache.org/jira/browse/HDFS-16092 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use LayoutFlags to represent features that EditLog/FSImage can support. > The utility helps write int (0) to given OutputStream and if EditLog/FSImage > supports Layout flags, they read the value from InputStream to confirm > whether there are unsupported feature flags (non zero int). However, we also > create and return new object of LayoutFlags, which is not used anywhere > because it's just a utility to read/write to/from given stream. We should > remove such redundant objects from getting created while reading from > InputStream using LayoutFlags#read utility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16092: -- Affects Version/s: 3.3.2 3.4.0 > Avoid creating LayoutFlags redundant objects > > > Key: HDFS-16092 > URL: https://issues.apache.org/jira/browse/HDFS-16092 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use LayoutFlags to represent features that EditLog/FSImage can support. > The utility helps write int (0) to given OutputStream and if EditLog/FSImage > supports Layout flags, they read the value from InputStream to confirm > whether there are unsupported feature flags (non zero int). However, we also > create and return new object of LayoutFlags, which is not used anywhere > because it's just a utility to read/write to/from given stream. We should > remove such redundant objects from getting created while reading from > InputStream using LayoutFlags#read utility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
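A minimal sketch of the read-side behavior HDFS-16092 describes, with invented names (`LayoutFlagsSketch` is not the Hadoop class): the utility only needs to read the int and fail on unsupported flags, so returning a freshly constructed object serves no purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LayoutFlagsSketch {
  // Reads the flags int and fails on unsupported (non-zero) flags.
  // Returns nothing: no redundant object is created.
  static void readLayoutFlags(DataInputStream in) throws IOException {
    int flags = in.readInt();
    if (flags != 0) {
      throw new IOException("Found feature flags which we can't handle: " + flags);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] zero = {0, 0, 0, 0}; // int 0 == no unsupported feature flags
    readLayoutFlags(new DataInputStream(new ByteArrayInputStream(zero)));
    System.out.println("no unsupported flags");
  }
}
```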
[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts
[ https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16090: -- Affects Version/s: 3.3.2 3.4.0 > Fine grained locking for datanodeNetworkCounts > -- > > Key: HDFS-16090 > URL: https://issues.apache.org/jira/browse/HDFS-16090 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > While incrementing DataNode network error count, we lock entire LoadingCache > in order to increment network count of specific host. We should provide fine > grained concurrency for this update because locking entire cache is redundant > and could impact performance while incrementing network count for multiple > hosts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer
[ https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16082: -- Affects Version/s: 3.3.2 3.4.0 > Avoid non-atomic operations on exceptionsSinceLastBalance and > failedTimesSinceLastSuccessfulBalance in Balancer > --- > > Key: HDFS-16082 > URL: https://issues.apache.org/jira/browse/HDFS-16082 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Balancer has introduced 2 volatile int as part of HDFS-13783 namely: > exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. > However, we are performing non-atomic operations on it. Since non-atomic > operations done here mostly depend on their previous values, we should use > AtomicInteger for both. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer
[ https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16082: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Avoid non-atomic operations on exceptionsSinceLastBalance and > failedTimesSinceLastSuccessfulBalance in Balancer > --- > > Key: HDFS-16082 > URL: https://issues.apache.org/jira/browse/HDFS-16082 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Balancer has introduced 2 volatile int as part of HDFS-13783 namely: > exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. > However, we are performing non-atomic operations on it. Since non-atomic > operations done here mostly depend on their previous values, we should use > AtomicInteger for both. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer
[ https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16082: -- Component/s: balancer > Avoid non-atomic operations on exceptionsSinceLastBalance and > failedTimesSinceLastSuccessfulBalance in Balancer > --- > > Key: HDFS-16082 > URL: https://issues.apache.org/jira/browse/HDFS-16082 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Balancer has introduced 2 volatile int as part of HDFS-13783 namely: > exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. > However, we are performing non-atomic operations on it. Since non-atomic > operations done here mostly depend on their previous values, we should use > AtomicInteger for both. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
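The change HDFS-16082 proposes is easy to demonstrate in isolation. In this sketch (the field name is borrowed from the issue, the rest is invented), `incrementAndGet()` makes the read-modify-write atomic, which `++` on a `volatile int` does not: two threads can read the same old value and lose an update.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FailureCounters {
  private final AtomicInteger exceptionsSinceLastBalance = new AtomicInteger();

  // Atomic: no update can be lost, unlike "volatileField++".
  int recordException() {
    return exceptionsSinceLastBalance.incrementAndGet();
  }

  int current() {
    return exceptionsSinceLastBalance.get();
  }

  public static void main(String[] args) throws InterruptedException {
    FailureCounters c = new FailureCounters();
    Thread[] ts = new Thread[4];
    for (int i = 0; i < ts.length; i++) {
      ts[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) {
          c.recordException();
        }
      });
      ts[i].start();
    }
    for (Thread t : ts) {
      t.join();
    }
    // With a plain volatile int and "++" this could print less than 4000.
    System.out.println(c.current());
  }
}
```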
[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result
[ https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16080: -- Affects Version/s: 3.3.2 3.4.0 > RBF: Invoking method in all locations should break the loop after successful > result > --- > > Key: HDFS-16080 > URL: https://issues.apache.org/jira/browse/HDFS-16080 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > rename, delete and mkdir used by Router client usually calls multiple > locations if the path is present in multiple sub-clusters. After invoking > multiple concurrent proxy calls to multiple clients, we iterate through all > results and mark anyResult true if at least one of them was successful. We > should break the loop if one of the proxy call result was successful rather > than iterating over remaining calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
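The optimization in HDFS-16080 is just an early exit from the result scan. A minimal illustration with invented types (the Router's real code works over its own remote-result structures):

```java
import java.util.Map;

public class AnyResult {
  // True as soon as one location reports success; remaining entries
  // are not inspected.
  static boolean anySucceeded(Map<String, Boolean> resultsByLocation) {
    boolean anyResult = false;
    for (Boolean ok : resultsByLocation.values()) {
      if (ok != null && ok) {
        anyResult = true;
        break; // no need to look at the remaining locations
      }
    }
    return anyResult;
  }

  public static void main(String[] args) {
    System.out.println(anySucceeded(Map.of("ns0", false, "ns1", true)));
  }
}
```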
[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts
[ https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16090: -- Component/s: datanode > Fine grained locking for datanodeNetworkCounts > -- > > Key: HDFS-16090 > URL: https://issues.apache.org/jira/browse/HDFS-16090 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > While incrementing DataNode network error count, we lock entire LoadingCache > in order to increment network count of specific host. We should provide fine > grained concurrency for this update because locking entire cache is redundant > and could impact performance while incrementing network count for multiple > hosts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
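One common way to get the fine-grained behavior HDFS-16090 asks for, shown here as an illustrative sketch rather than the DataNode's actual implementation, is a per-host `LongAdder` created on demand: `computeIfAbsent` on a `ConcurrentHashMap` only contends on the bin for that one key, so incrementing one host's count never locks the whole structure.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class NetworkErrorCounts {
  private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

  // Only the bin holding this host's entry is ever synchronized on.
  void incrError(String host) {
    counts.computeIfAbsent(host, h -> new LongAdder()).increment();
  }

  long get(String host) {
    LongAdder a = counts.get(host);
    return a == null ? 0 : a.sum();
  }

  public static void main(String[] args) {
    NetworkErrorCounts c = new NetworkErrorCounts();
    c.incrError("dn1");
    c.incrError("dn1");
    c.incrError("dn2");
    System.out.println(c.get("dn1") + " " + c.get("dn2")); // prints "2 1"
  }
}
```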
[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15796: -- Hadoop Flags: Reviewed > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Assignee: Daniel Ma >Priority: Critical > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-15796-0001.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
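The stack trace above is the classic fail-fast iterator error: an `ArrayList` is structurally modified while something else iterates it. The sketch below (generic Java, unrelated to the BlockManager's actual data structures) reproduces the failure and shows the iterator-based removal that avoids it.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class CmeDemo {
  // Safe removal: mutate through the iterator, never through the list.
  static void removeSafely(List<String> blocks, String target) {
    for (Iterator<String> it = blocks.iterator(); it.hasNext(); ) {
      if (it.next().equals(target)) {
        it.remove(); // keeps the iterator's bookkeeping consistent
      }
    }
  }

  public static void main(String[] args) {
    List<String> blocks = new ArrayList<>(List.of("b1", "b2", "b3"));
    boolean threw = false;
    try {
      for (String b : blocks) {
        if (b.equals("b1")) {
          blocks.remove(b); // structural change mid-iteration: fails fast
        }
      }
    } catch (ConcurrentModificationException e) {
      threw = true;
    }
    removeSafely(blocks, "b3");
    System.out.println(threw + " " + blocks); // prints "true [b2]"
  }
}
```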
[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15798: -- Hadoop Flags: Reviewed > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.3.1, 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > When an EC reconstruction task fails, the decrementXmitsInProgress call in > processErasureCodingTasks uses an incorrect value; > as a result, XmitsInProgress on the DN can become negative, which affects how the NN chooses > pending tasks based on the ratio between the lengths of the replication and > erasure-coded block queues. > {code:java} > // 1. ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int) (task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2. StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); > ... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete decrement > ... > } > }{code}
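The fix the quoted code arrives at is an invariant worth stating on its own: whatever amount is added to the xmits gauge when a task is submitted must be subtracted with the same amount when the task fails or completes, otherwise the gauge drifts negative. A stripped-down sketch (an invented class, not the DataNode's real accounting):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class XmitsGauge {
  private final AtomicInteger xmitsInProgress = new AtomicInteger();

  // Returns the amount actually added; the caller must pass this exact
  // value back to finish(), keeping increment and decrement symmetric.
  int submit(int xmits) {
    int xmitsSubmitted = Math.max(xmits, 1); // never count a task as 0
    xmitsInProgress.addAndGet(xmitsSubmitted);
    return xmitsSubmitted;
  }

  void finish(int xmitsSubmitted) {
    xmitsInProgress.addAndGet(-xmitsSubmitted);
  }

  int current() {
    return xmitsInProgress.get();
  }

  public static void main(String[] args) {
    XmitsGauge g = new XmitsGauge();
    int submitted = g.submit(3);
    g.finish(submitted); // symmetric: gauge returns to 0, never negative
    System.out.println(g.current()); // prints "0"
  }
}
```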
[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15798: -- Component/s: erasure-coding > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > When an EC reconstruction task fails, the decrementXmitsInProgress call in > processErasureCodingTasks uses an incorrect value; > as a result, XmitsInProgress on the DN can become negative, which affects how the NN chooses > pending tasks based on the ratio between the lengths of the replication and > erasure-coded block queues. > {code:java} > // 1. ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int) (task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2. StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); > ... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete decrement > ... > } > }{code}
[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15798: -- Affects Version/s: 3.3.1 3.4.0 > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.3.1, 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > When an EC reconstruction task fails, the decrementXmitsInProgress call in > processErasureCodingTasks subtracts an incorrect value. > As a result, the XmitsInProgress gauge of the DN can go negative, which distorts how > the NN chooses pending tasks based on the ratio between the lengths of the > replication and erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even if the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); > ... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, xmitsSubmitted should be set to 1, > // because if it were zero we could not measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16473) Make HDFS stat tool cross platform
[ https://issues.apache.org/jira/browse/HDFS-16473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16473: -- Hadoop Flags: Reviewed > Make HDFS stat tool cross platform > -- > > Key: HDFS-16473 > URL: https://issues.apache.org/jira/browse/HDFS-16473 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs++, tools >Affects Versions: 3.4.0 > Environment: Centos 7, Centos 8, Debian 10, Ubuntu Focal >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Major > Labels: libhdfscpp, pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The source files for *hdfs_stat* use *getopt* for parsing the command line > arguments. getopt is available only on Linux and thus isn't cross platform. > We need to replace getopt with *boost::program_options* to make this tool > cross platform. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16474) Make HDFS tail tool cross platform
[ https://issues.apache.org/jira/browse/HDFS-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16474: -- Hadoop Flags: Reviewed > Make HDFS tail tool cross platform > -- > > Key: HDFS-16474 > URL: https://issues.apache.org/jira/browse/HDFS-16474 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs++, tools >Affects Versions: 3.4.0 > Environment: Centos 7, Centos 8, Debian 10, Ubuntu Focal >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Major > Labels: libhdfscpp, pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The source files for *hdfs_tail* use *getopt* for parsing the command line > arguments. getopt is available only on Linux and thus isn't cross platform. > We need to replace getopt with *boost::program_options* to make this tool > cross platform. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16227) testMoverWithStripedFile fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-16227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16227: -- Component/s: test > testMoverWithStripedFile fails intermittently > - > > Key: HDFS-16227 > URL: https://issues.apache.org/jira/browse/HDFS-16227 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > TestMover#testMoverWithStripedFile fails intermittently with stacktrace: > {code:java} > [ERROR] > testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover) Time > elapsed: 48.439 s <<< FAILURE![ERROR] > testMoverWithStripedFile(org.apache.hadoop.hdfs.server.mover.TestMover) Time > elapsed: 48.439 s <<< FAILURE!java.lang.AssertionError: expected: > but was: at org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:120) at > org.junit.Assert.assertEquals(Assert.java:146) at > org.apache.hadoop.hdfs.server.mover.TestMover.testMoverWithStripedFile(TestMover.java:965) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > {code} > e.g > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3386/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16218) RBF: Use HdfsConfiguration for passing in Router principal
[ https://issues.apache.org/jira/browse/HDFS-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16218: -- Affects Version/s: 3.4.0 > RBF: Use HdfsConfiguration for passing in Router principal > -- > > Key: HDFS-16218 > URL: https://issues.apache.org/jira/browse/HDFS-16218 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 > Environment: Hadoop 3.3.0 + patches, Kerberos authentication is > enabled >Reporter: Akira Ajisaka >Assignee: Fengnan Li >Priority: Major > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > RouterFedBalance fails to connect to DFSRouter when Kerberos is enabled > because "dfs.federation.router.kerberos.principal" in hdfs-site.xml is not > loaded. > {quote} > 21/09/08 17:21:38 ERROR rbfbalance.RouterFedBalance: Submit balance job > failed. > java.io.IOException: DestHost:destPort 0.0.0.0:8111 , LocalHost:localPort > /:0. Failed on local exception: java.io.IOException: Couldn't set > up IO streams: java.lang.IllegalArgumentException: Failed to specify server's > Kerberos principal name > at > org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getMountTableEntries(RouterAdminProtocolTranslatorPB.java:198) > at > org.apache.hadoop.hdfs.rbfbalance.MountTableProcedure.getMountEntry(MountTableProcedure.java:140) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.getSrcPath(RouterFedBalance.java:326) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.access$000(RouterFedBalance.java:68) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance$Builder.build(RouterFedBalance.java:168) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.submit(RouterFedBalance.java:302) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.run(RouterFedBalance.java:216) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > 
at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.main(RouterFedBalance.java:376) > {quote} > When the property was passed explicitly with the "-D" option, the command worked. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
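Since `dfs.federation.router.kerberos.principal` lives in hdfs-site.xml, the fix is to construct the client configuration with HdfsConfiguration (which registers hdfs-site.xml as a resource) rather than a plain Configuration. For reference, a sketch of the relevant hdfs-site.xml entry; the principal value below is a placeholder, not taken from this report:

```xml
<!-- hdfs-site.xml: only loaded when HdfsConfiguration is used
     (or when the property is passed with -D on the command line). -->
<property>
  <name>dfs.federation.router.kerberos.principal</name>
  <!-- placeholder value; substitute your router's Kerberos principal -->
  <value>router/_HOST@EXAMPLE.COM</value>
</property>
```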
[jira] [Updated] (HDFS-16219) RBF: Set default map tasks and bandwidth in RouterFederationRename
[ https://issues.apache.org/jira/browse/HDFS-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16219: -- Affects Version/s: 3.4.0 > RBF: Set default map tasks and bandwidth in RouterFederationRename > -- > > Key: HDFS-16219 > URL: https://issues.apache.org/jira/browse/HDFS-16219 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 > Environment: Hadoop 3.3.0 with patches >Reporter: Akira Ajisaka >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If dfs.federation.router.federation.rename.map or > dfs.federation.router.federation.rename.bandwidth is not set, DFSRouter fails > to launch. > This issue is similar to HDFS-16217. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
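Until defaults are provided in code, the startup failure can be avoided by setting both properties explicitly in the router configuration. A sketch with illustrative values (the numbers are placeholders, not recommendations from this report):

```xml
<property>
  <name>dfs.federation.router.federation.rename.map</name>
  <value>10</value> <!-- placeholder: number of DistCp map tasks -->
</property>
<property>
  <name>dfs.federation.router.federation.rename.bandwidth</name>
  <value>10</value> <!-- placeholder: per-map bandwidth limit -->
</property>
```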
[jira] [Updated] (HDFS-16224) testBalancerWithObserverWithFailedNode times out
[ https://issues.apache.org/jira/browse/HDFS-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16224: -- Affects Version/s: 3.4.0 > testBalancerWithObserverWithFailedNode times out > > > Key: HDFS-16224 > URL: https://issues.apache.org/jira/browse/HDFS-16224 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.4.0 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > testBalancerWithObserverWithFailedNode fails intermittently. > > This seems to be because datanode shutdown blocks until the datanodes finish > retrying against the failed observer. > > Jenkins report: > > [ERROR] > testBalancerWithObserverWithFailedNode(org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes) > Time elapsed: 180.144 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 18 > milliseconds at java.lang.Object.wait(Native Method) at > java.lang.Thread.join(Thread.java:1252) at > java.lang.Thread.join(Thread.java:1326) at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.join(BPServiceActor.java:632) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.join(BPOfferService.java:360) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.shutDownAll(BlockPoolManager.java:119) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2169) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2166) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2156) > at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2135) > at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2109) > at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:2102) > at >
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.shutdown(MiniQJMHACluster.java:189) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithObserver(TestBalancerWithHANameNodes.java:240) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes.testBalancerWithObserverWithFailedNode(TestBalancerWithHANameNodes.java:197) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16217) RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by adding appropriate config resources
[ https://issues.apache.org/jira/browse/HDFS-16217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16217: -- Affects Version/s: 3.4.0 > RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by > adding appropriate config resources > > > Key: HDFS-16217 > URL: https://issues.apache.org/jira/browse/HDFS-16217 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 > Environment: Hadoop 3.3.0 with patches >Reporter: Akira Ajisaka >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > When dfs.federation.router.federation.rename.option is set to DISTCP and > hdfs.fedbalance.procedure.scheduler.journal.uri is not set, DFSRouter fails > to launch. > {quote} > 2021-09-08 15:39:11,818 ERROR > org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start > router > java.lang.NullPointerException > at java.base/java.net.URI$Parser.parse(URI.java:3104) > at java.base/java.net.URI.(URI.java:600) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.initRouterFedRename(RouterRpcServer.java:444) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.(RouterRpcServer.java:419) > at > org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391) > at > org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69) > {quote} > hdfs.fedbalance.procedure.scheduler.journal.uri is > hdfs://localhost:8020/tmp/procedure by default, however, the default value is > not used in DFSRouter. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
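As noted above, the documented default for the journal URI is hdfs://localhost:8020/tmp/procedure but it is not applied by DFSRouter. Until the config resources are wired in, setting the property explicitly works around the NullPointerException; a sketch using the default value quoted in the description:

```xml
<property>
  <name>hdfs.fedbalance.procedure.scheduler.journal.uri</name>
  <!-- the documented default; adjust the NameNode address for your cluster -->
  <value>hdfs://localhost:8020/tmp/procedure</value>
</property>
```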
[jira] [Updated] (HDFS-16219) RBF: Set default map tasks and bandwidth in RouterFederationRename
[ https://issues.apache.org/jira/browse/HDFS-16219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16219: -- Hadoop Flags: Reviewed > RBF: Set default map tasks and bandwidth in RouterFederationRename > -- > > Key: HDFS-16219 > URL: https://issues.apache.org/jira/browse/HDFS-16219 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 > Environment: Hadoop 3.3.0 with patches >Reporter: Akira Ajisaka >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If dfs.federation.router.federation.rename.map or > dfs.federation.router.federation.rename.bandwidth is not set, DFSRouter fails > to launch. > This issue is similar to HDFS-16217. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16213) Flaky test TestFsDatasetImpl#testDnRestartWithHardLink
[ https://issues.apache.org/jira/browse/HDFS-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16213: -- Component/s: test > Flaky test TestFsDatasetImpl#testDnRestartWithHardLink > -- > > Key: HDFS-16213 > URL: https://issues.apache.org/jira/browse/HDFS-16213 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 9.5h > Remaining Estimate: 0h > > Failure case: > [here|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3359/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > {code:java} > [ERROR] > testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl) > Time elapsed: 7.768 s <<< FAILURE![ERROR] > testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl) > Time elapsed: 7.768 s <<< FAILURE!java.lang.AssertionError at > org.junit.Assert.fail(Assert.java:87) at > org.junit.Assert.assertTrue(Assert.java:42) at > org.junit.Assert.assertTrue(Assert.java:53) at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16218) RBF: Use HdfsConfiguration for passing in Router principal
[ https://issues.apache.org/jira/browse/HDFS-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16218: -- Hadoop Flags: Reviewed > RBF: Use HdfsConfiguration for passing in Router principal > -- > > Key: HDFS-16218 > URL: https://issues.apache.org/jira/browse/HDFS-16218 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 > Environment: Hadoop 3.3.0 + patches, Kerberos authentication is > enabled >Reporter: Akira Ajisaka >Assignee: Fengnan Li >Priority: Major > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > RouterFedBalance fails to connect to DFSRouter when Kerberos is enabled > because "dfs.federation.router.kerberos.principal" in hdfs-site.xml is not > loaded. > {quote} > 21/09/08 17:21:38 ERROR rbfbalance.RouterFedBalance: Submit balance job > failed. > java.io.IOException: DestHost:destPort 0.0.0.0:8111 , LocalHost:localPort > /:0. Failed on local exception: java.io.IOException: Couldn't set > up IO streams: java.lang.IllegalArgumentException: Failed to specify server's > Kerberos principal name > at > org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getMountTableEntries(RouterAdminProtocolTranslatorPB.java:198) > at > org.apache.hadoop.hdfs.rbfbalance.MountTableProcedure.getMountEntry(MountTableProcedure.java:140) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.getSrcPath(RouterFedBalance.java:326) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.access$000(RouterFedBalance.java:68) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance$Builder.build(RouterFedBalance.java:168) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.submit(RouterFedBalance.java:302) > at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.run(RouterFedBalance.java:216) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > 
at > org.apache.hadoop.hdfs.rbfbalance.RouterFedBalance.main(RouterFedBalance.java:376) > {quote} > When adding the property specifically by "-D" option, the command worked. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16213) Flaky test TestFsDatasetImpl#testDnRestartWithHardLink
[ https://issues.apache.org/jira/browse/HDFS-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16213: -- Affects Version/s: 3.4.0 > Flaky test TestFsDatasetImpl#testDnRestartWithHardLink > -- > > Key: HDFS-16213 > URL: https://issues.apache.org/jira/browse/HDFS-16213 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 9.5h > Remaining Estimate: 0h > > Failure case: > [here|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3359/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > {code:java} > [ERROR] > testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl) > Time elapsed: 7.768 s <<< FAILURE![ERROR] > testDnRestartWithHardLink(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl) > Time elapsed: 7.768 s <<< FAILURE!java.lang.AssertionError at > org.junit.Assert.fail(Assert.java:87) at > org.junit.Assert.assertTrue(Assert.java:42) at > org.junit.Assert.assertTrue(Assert.java:53) at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testDnRestartWithHardLink(TestFsDatasetImpl.java:1344) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16217) RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by adding appropriate config resources
[ https://issues.apache.org/jira/browse/HDFS-16217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16217: -- Hadoop Flags: Reviewed > RBF: Set default value of hdfs.fedbalance.procedure.scheduler.journal.uri by > adding appropriate config resources > > > Key: HDFS-16217 > URL: https://issues.apache.org/jira/browse/HDFS-16217 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 > Environment: Hadoop 3.3.0 with patches >Reporter: Akira Ajisaka >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > When dfs.federation.router.federation.rename.option is set to DISTCP and > hdfs.fedbalance.procedure.scheduler.journal.uri is not set, DFSRouter fails > to launch. > {quote} > 2021-09-08 15:39:11,818 ERROR > org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start > router > java.lang.NullPointerException > at java.base/java.net.URI$Parser.parse(URI.java:3104) > at java.base/java.net.URI.(URI.java:600) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.initRouterFedRename(RouterRpcServer.java:444) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.(RouterRpcServer.java:419) > at > org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391) > at > org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69) > {quote} > hdfs.fedbalance.procedure.scheduler.journal.uri is > hdfs://localhost:8020/tmp/procedure by default, however, the default value is > not used in DFSRouter. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15657: -- Affects Version/s: 3.3.1 3.4.0 > RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException > - > > Key: HDFS-15657 > URL: https://issues.apache.org/jira/browse/HDFS-15657 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > > Time Spent: 2h > Remaining Estimate: 0h > > https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > {noformat} > [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter > [ERROR] > testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter) > Time elapsed: 1.04 s <<< ERROR! 
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: > Problem binding to [0.0.0.0:] java.net.BindException: Address already in > use; For more details see: http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:174) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at >
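Flaky bind failures like the one above typically come from tests reusing a fixed port. A common remedy (not necessarily the exact fix applied in HDFS-15657) is to bind to port 0 so the OS assigns a free ephemeral port; a minimal sketch:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch: binding to port 0 asks the kernel for any free port, which avoids
// "Address already in use" collisions between concurrently running tests.
public class EphemeralPortSketch {
    static int bindEphemeral() {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress("0.0.0.0", 0)); // port 0 = OS chooses
            return socket.getLocalPort(); // the concrete port the OS assigned
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(EphemeralPortSketch.bindEphemeral());
    }
}
```

In a test, the assigned port would then be fed into the component's configuration instead of a hard-coded value.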
[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15657: -- Hadoop Flags: Reviewed > RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException > - > > Key: HDFS-15657 > URL: https://issues.apache.org/jira/browse/HDFS-15657 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > > Time Spent: 2h > Remaining Estimate: 0h > > https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > {noformat} > [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter > [ERROR] > testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter) > Time elapsed: 1.04 s <<< ERROR! 
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: > Problem binding to [0.0.0.0:] java.net.BindException: Address already in > use; For more details see: http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:174) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at >
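The BindException above is the classic symptom of a test binding to a fixed port that another process (or a previous test) still holds. A common de-flaking approach (a sketch of the general technique, not the actual HDFS-15657 patch) is to bind to port 0 so the OS assigns a free ephemeral port:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Binding to port 0 asks the OS for any free port, so two test runs
    // (or two services in one JVM) cannot collide on a hard-coded port.
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = findFreePort();
        // Ephemeral ports are allocated well above the privileged range.
        System.out.println(port > 1024);
    }
}
```

Note the remaining small race: the port is released before the test reuses it, so a concurrent process could still grab it; in practice retrying on BindException closes that gap.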
[jira] [Updated] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16637: -- Affects Version/s: 3.3.5 3.4.0 > TestHDFSCLI#testAll consistently failing > > > Key: HDFS-16637 > URL: https://issues.apache.org/jira/browse/HDFS-16637 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The failure seems to have been caused by output change introduced by > HDFS-16581. > {code:java} > 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(146)) - Detailed results: > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(147)) - > --2022-06-19 15:41:16,184 [Listener at > localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(156)) - > --- > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(157)) - Test ID: [629] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(158)) - Test Description: > [printTopology: verifying that the topology map is what we expect] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(159)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(163)) - Test Commands: [-fs > hdfs://localhost:51486 -printTopology] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(167)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(174)) - > 2022-06-19 
15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(178)) - Comparator: > [RegexpAcrossOutputComparator] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(180)) - Comparision result: > [fail] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(182)) - Expected output: > [^Rack: > \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)] > 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(184)) - Actual output: > [Rack: /rack1 > 127.0.0.1:51487 (localhost) In Service > 127.0.0.1:51491 (localhost) In ServiceRack: /rack2 > 127.0.0.1:51500 (localhost) In Service > 127.0.0.1:51496 (localhost) In Service > 127.0.0.1:51504 (localhost) In ServiceRack: /rack3 > 127.0.0.1:51508 (localhost) In ServiceRack: /rack4 > 127.0.0.1:51512 (localhost) In Service > 127.0.0.1:51516 (localhost) In Service] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
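The mismatch above comes from HDFS-16581 appending a state column ("In Service") to each `-printTopology` entry, so the old expected regex, which bridges entries with whitespace only, no longer matches. A minimal sketch of the failure mode and one possible regex adjustment (illustrative, not the committed fix):

```java
import java.util.regex.Pattern;

public class TopologyRegexDemo {
    public static void main(String[] args) {
        // Simplified slice of the new -printTopology output quoted above.
        String actual = "Rack: /rack1\n"
                + "   127.0.0.1:51487 (localhost) In Service\n"
                + "   127.0.0.1:51491 (localhost) In Service\n";

        // Old expectation: two "ip:port (host)" entries separated only by whitespace.
        Pattern old = Pattern.compile(
            "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)\\s*"
            + "127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)");
        // " In Service" between entries is non-whitespace, so the \s* bridge fails.
        System.out.println(old.matcher(actual).find());

        // Illustrative fix: tolerate a trailing state column after each entry.
        Pattern updated = Pattern.compile(
            "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)[^\\n]*\\s*"
            + "127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)");
        System.out.println(updated.matcher(actual).find());
    }
}
```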
[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16634: -- Component/s: metrics > Dynamically adjust slow peer report size on JMX metrics > --- > > Key: HDFS-16634 > URL: https://issues.apache.org/jira/browse/HDFS-16634 > Project: Hadoop HDFS > Issue Type: Task > Components: metrics >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > On a busy cluster, it sometimes takes a bit of time for the "slow node report" > of a node deleted from the cluster to be removed from the slow peer JSON report > in NameNode JMX metrics. In the meantime, the user should be able to browse > more entries in the report by reconfiguring > "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted > without having to bounce the active NameNode just for this purpose. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16635) Fix javadoc error in Java 11
[ https://issues.apache.org/jira/browse/HDFS-16635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16635: -- Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.4.0 (was: 3.4.0, 3.3.5) > Fix javadoc error in Java 11 > > > Key: HDFS-16635 > URL: https://issues.apache.org/jira/browse/HDFS-16635 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, documentation >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Ashutosh Gupta >Priority: Major > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Javadoc build in Java 11 fails. > {noformat} > [ERROR] > /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-4410/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/package-info.java:20: > error: reference not found > [ERROR] * This package provides a mechanism for tracking {@link NameNode} > startup > {noformat} > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16634: -- Affects Version/s: 3.4.0 > Dynamically adjust slow peer report size on JMX metrics > --- > > Key: HDFS-16634 > URL: https://issues.apache.org/jira/browse/HDFS-16634 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > On a busy cluster, it sometimes takes a bit of time for the "slow node report" > of a node deleted from the cluster to be removed from the slow peer JSON report > in NameNode JMX metrics. In the meantime, the user should be able to browse > more entries in the report by reconfiguring > "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted > without having to bounce the active NameNode just for this purpose. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16635) Fix javadoc error in Java 11
[ https://issues.apache.org/jira/browse/HDFS-16635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16635: -- Affects Version/s: 3.4.0 > Fix javadoc error in Java 11 > > > Key: HDFS-16635 > URL: https://issues.apache.org/jira/browse/HDFS-16635 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, documentation >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Ashutosh Gupta >Priority: Major > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Javadoc build in Java 11 fails. > {noformat} > [ERROR] > /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-4410/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/package-info.java:20: > error: reference not found > [ERROR] * This package provides a mechanism for tracking {@link NameNode} > startup > {noformat} > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16637: -- Component/s: test > TestHDFSCLI#testAll consistently failing > > > Key: HDFS-16637 > URL: https://issues.apache.org/jira/browse/HDFS-16637 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > The failure seems to have been caused by output change introduced by > HDFS-16581. > {code:java} > 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(146)) - Detailed results: > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(147)) - > --2022-06-19 15:41:16,184 [Listener at > localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(156)) - > --- > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(157)) - Test ID: [629] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(158)) - Test Description: > [printTopology: verifying that the topology map is what we expect] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(159)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(163)) - Test Commands: [-fs > hdfs://localhost:51486 -printTopology] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(167)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(174)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO 
cli.CLITestHelper > (CLITestHelper.java:displayResults(178)) - Comparator: > [RegexpAcrossOutputComparator] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(180)) - Comparision result: > [fail] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(182)) - Expected output: > [^Rack: > \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)] > 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(184)) - Actual output: > [Rack: /rack1 > 127.0.0.1:51487 (localhost) In Service > 127.0.0.1:51491 (localhost) In ServiceRack: /rack2 > 127.0.0.1:51500 (localhost) In Service > 127.0.0.1:51496 (localhost) In Service > 127.0.0.1:51504 (localhost) In ServiceRack: /rack3 > 127.0.0.1:51508 (localhost) In ServiceRack: /rack4 > 127.0.0.1:51512 (localhost) In Service > 127.0.0.1:51516 (localhost) In Service] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16652) Upgrade jquery datatable version references to v1.10.19
[ https://issues.apache.org/jira/browse/HDFS-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16652: -- Affects Version/s: 3.4.0 > Upgrade jquery datatable version references to v1.10.19 > --- > > Key: HDFS-16652 > URL: https://issues.apache.org/jira/browse/HDFS-16652 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-16652.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > Upgrade jquery datatable version references in hdfs webapp to v1.10.19 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16618: -- Affects Version/s: 3.3.5 3.4.0 > sync_file_range error should include more volume and file info > -- > > Key: HDFS-16618 > URL: https://issues.apache.org/jira/browse/HDFS-16618 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Having seen multiple sync_file_range errors recently with Bad file > descriptor, it would be good to include more volume stats as well as file > offset/length info with the error log to get some more insights. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
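To make concrete the kind of context HDFS-16618 asks for, here is a hedged sketch of what an enriched sync_file_range failure message could carry (the helper name and format are hypothetical, not the logging actually added by the patch):

```java
public class SyncFileRangeError {
    // Hypothetical helper: bundles volume, file, and offset/length info into
    // the error message, so a "Bad file descriptor" failure can be traced to
    // a specific disk and block file instead of being anonymous.
    static String describeFailure(String volume, String file,
                                  long offset, long length, String errno) {
        return String.format(
            "sync_file_range failed on volume %s, file %s, offset %d, length %d: %s",
            volume, file, offset, length, errno);
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(
            "/data/1", "/data/1/current/blk_1001", 0L, 4096L, "Bad file descriptor"));
    }
}
```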
[jira] [Updated] (HDFS-16618) sync_file_range error should include more volume and file info
[ https://issues.apache.org/jira/browse/HDFS-16618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16618: -- Component/s: datanode > sync_file_range error should include more volume and file info > -- > > Key: HDFS-16618 > URL: https://issues.apache.org/jira/browse/HDFS-16618 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Having seen multiple sync_file_range errors recently with Bad file > descriptor, it would be good to include more volume stats as well as file > offset/length info with the error log to get some more insights. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16634: -- Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.4.0 (was: 3.4.0, 3.3.5) > Dynamically adjust slow peer report size on JMX metrics > --- > > Key: HDFS-16634 > URL: https://issues.apache.org/jira/browse/HDFS-16634 > Project: Hadoop HDFS > Issue Type: Task > Components: metrics >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > On a busy cluster, it sometimes takes a bit of time for the "slow node report" > of a node deleted from the cluster to be removed from the slow peer JSON report > in NameNode JMX metrics. In the meantime, the user should be able to browse > more entries in the report by reconfiguring > "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted > without having to bounce the active NameNode just for this purpose. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
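The reconfiguration idea behind HDFS-16634 can be sketched as a runtime-adjustable cap on the report size. This is a minimal illustration of the pattern (class and field names are made up, not Hadoop's), assuming the report is simply truncated to the current cap:

```java
import java.util.List;

public class SlowPeerReport {
    // A volatile cap mirrors a reconfigurable dfs.datanode.max.nodes.to.report:
    // changing it takes effect on the next report, with no NameNode restart.
    private volatile int maxNodesToReport = 5;

    void reconfigureMaxNodes(int newMax) {
        maxNodesToReport = newMax;
    }

    List<String> report(List<String> slowNodes) {
        int n = Math.min(maxNodesToReport, slowNodes.size());
        return slowNodes.subList(0, n);
    }

    public static void main(String[] args) {
        SlowPeerReport r = new SlowPeerReport();
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4", "dn5", "dn6", "dn7");
        System.out.println(r.report(nodes).size());  // capped at the default of 5
        r.reconfigureMaxNodes(7);
        System.out.println(r.report(nodes).size());  // cap raised at runtime
    }
}
```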
[jira] [Updated] (HDFS-16358) HttpFS implementation for getSnapshotDiffReportListing
[ https://issues.apache.org/jira/browse/HDFS-16358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16358: -- Affects Version/s: 3.4.0 > HttpFS implementation for getSnapshotDiffReportListing > -- > > Key: HDFS-16358 > URL: https://issues.apache.org/jira/browse/HDFS-16358 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HttpFS should support getSnapshotDiffReportListing API for improved snapshot > diff. WebHdfs implementation available on HDFS-16091. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully
[ https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16350: -- Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.3, 3.4.0 (was: 3.4.0, 3.2.3, 3.3.5) > Datanode start time should be set after RPC server starts successfully > -- > > Key: HDFS-16350 > URL: https://issues.apache.org/jira/browse/HDFS-16350 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > We set the Datanode start time when the class is instantiated, but it should > ideally be set only after the RPC server starts and RPC handlers are > initialized to serve client requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
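The ordering fix described above can be sketched in a few lines: record the start time only after the RPC server has started, so a bind failure leaves the timestamp unset. This is a self-contained illustration with a stand-in server class, not the actual Datanode patch:

```java
public class StartTimeDemo {
    // FakeRpcServer stands in for Hadoop's RPC server; it is not a Hadoop class.
    static class FakeRpcServer {
        boolean started;
        void start() { started = true; }
    }

    private long startTime;  // 0 until the service is actually serving

    void serviceStart(FakeRpcServer rpc) {
        rpc.start();                             // may throw; startTime stays unset
        startTime = System.currentTimeMillis();  // set only on success
    }

    public static void main(String[] args) {
        StartTimeDemo dn = new StartTimeDemo();
        System.out.println(dn.startTime == 0);   // not started yet
        dn.serviceStart(new FakeRpcServer());
        System.out.println(dn.startTime > 0);    // set after RPC start succeeded
    }
}
```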
[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully
[ https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16350: -- Component/s: datanode > Datanode start time should be set after RPC server starts successfully > -- > > Key: HDFS-16350 > URL: https://issues.apache.org/jira/browse/HDFS-16350 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > We set the Datanode start time when the class is instantiated, but it should > ideally be set only after the RPC server starts and RPC handlers are > initialized to serve client requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16358) HttpFS implementation for getSnapshotDiffReportListing
[ https://issues.apache.org/jira/browse/HDFS-16358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16358: -- Component/s: httpfs > HttpFS implementation for getSnapshotDiffReportListing > -- > > Key: HDFS-16358 > URL: https://issues.apache.org/jira/browse/HDFS-16358 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > HttpFS should support getSnapshotDiffReportListing API for improved snapshot > diff. WebHdfs implementation available on HDFS-16091. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully
[ https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16350: -- Affects Version/s: 3.3.2 3.4.0 > Datanode start time should be set after RPC server starts successfully > -- > > Key: HDFS-16350 > URL: https://issues.apache.org/jira/browse/HDFS-16350 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > We set the Datanode start time when the class is instantiated, but it should > ideally be set only after the RPC server starts and RPC handlers are > initialized to serve client requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16336) De-flake TestRollingUpgrade#testRollback
[ https://issues.apache.org/jira/browse/HDFS-16336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16336: -- Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.3, 3.4.0 (was: 3.4.0, 3.2.3, 3.3.5) > De-flake TestRollingUpgrade#testRollback > > > Key: HDFS-16336 > URL: https://issues.apache.org/jira/browse/HDFS-16336 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.4.0 >Reporter: Kevin Wikant >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This pull request: [https://github.com/apache/hadoop/pull/3675] > Failed Jenkins pre-commit job due to an unrelated unit test failure: > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3675/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > {code:java} > [ERROR] Failures: > [ERROR] > org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade) > [ERROR] Run 1: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 > expected null, but > was: createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> > [ERROR] Run 2: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 > expected null, but > was: createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> > [ERROR] Run 3: TestRollingUpgrade.testRollback:328->checkMxBeanIsNull:299 > expected null, but > was: createdRollbackImages=true, finalizeTime=0, startTime=1637204448659})> {code} > Seems that perhaps "TestRollingUpgrade.testRollback" is a flaky unit test -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16171) De-flake testDecommissionStatus
[ https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16171: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.2.3, 2.10.2, 3.4.0 (was: 3.4.0, 2.10.2, 3.2.3, 3.3.2) > De-flake testDecommissionStatus > --- > > Key: HDFS-16171 > URL: https://issues.apache.org/jira/browse/HDFS-16171 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > testDecommissionStatus keeps failing intermittently. > {code:java} > [ERROR] > testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor) > Time elapsed: 3.299 s <<< FAILURE! > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> > but was:<3> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
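Intermittent failures like the under-replicated block count above usually come from asserting once against a value the NameNode updates asynchronously. The standard de-flaking move, in the spirit of Hadoop's `GenericTestUtils.waitFor`, is to poll until the condition holds or a timeout expires. A self-contained sketch of that helper (not the actual HDFS-16171 patch):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WaitForCondition {
    // Poll a condition at a fixed interval until it holds or the deadline passes.
    static boolean waitFor(java.util.function.BooleanSupplier check,
                           long intervalMs, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return check.getAsBoolean();  // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger underReplicated = new AtomicInteger(3);
        // Simulate the count converging to the expected value in the background.
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            underReplicated.set(4);
        }).start();
        System.out.println(waitFor(() -> underReplicated.get() == 4, 10, 2000));
    }
}
```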
[jira] [Updated] (HDFS-16184) De-flake TestBlockScanner#testSkipRecentAccessFile
[ https://issues.apache.org/jira/browse/HDFS-16184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16184: -- Affects Version/s: 3.3.2 3.4.0 > De-flake TestBlockScanner#testSkipRecentAccessFile > -- > > Key: HDFS-16184 > URL: https://issues.apache.org/jira/browse/HDFS-16184 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Test TestBlockScanner#testSkipRecentAccessFile is flaky: > > {code:java} > [ERROR] > testSkipRecentAccessFile(org.apache.hadoop.hdfs.server.datanode.TestBlockScanner) > Time elapsed: 3.936 s <<< FAILURE![ERROR] > testSkipRecentAccessFile(org.apache.hadoop.hdfs.server.datanode.TestBlockScanner) > Time elapsed: 3.936 s <<< FAILURE!java.lang.AssertionError: Scan nothing > for all files are accessed in last period. at > org.junit.Assert.fail(Assert.java:89) at > org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testSkipRecentAccessFile(TestBlockScanner.java:1015) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > e.g > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3235/37/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.
[ https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16157: -- Affects Version/s: 3.4.0 > Support configuring DNS record to get list of journal nodes. > > > Key: HDFS-16157 > URL: https://issues.apache.org/jira/browse/HDFS-16157 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node >Affects Versions: 3.4.0 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We can use a DNS round-robin record to configure the list of journal nodes, so > we don't have to reconfigure everything when a journal node's hostname changes. > For example, in some containerized environments the hostnames of journal nodes > can change pretty often. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
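The core of the DNS idea is that one name resolving to several A records can replace an explicit host list. A minimal sketch of that resolution step (illustrative only, not the HDFS-16157 implementation; "localhost" is used solely so the example resolves anywhere):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class JournalNodeDnsDemo {
    // Expand a single DNS round-robin name into the current set of addresses.
    // Hostname churn in a containerized environment then needs no config change:
    // the quorum is re-derived from DNS at lookup time.
    static InetAddress[] resolveQuorum(String dnsName) throws UnknownHostException {
        return InetAddress.getAllByName(dnsName);
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress[] nodes = resolveQuorum("localhost");
        System.out.println(nodes.length >= 1);
    }
}
```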
[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)
[ https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16144: -- Affects Version/s: 3.3.2 3.4.0 > Revert HDFS-15372 (Files in snapshots no longer see attribute provider > permissions) > --- > > Key: HDFS-16144 > URL: https://issues.apache.org/jira/browse/HDFS-16144 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0, 3.3.2 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, > HDFS-16144.003.patch, HDFS-16144.004.patch > > > In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. > When a user accesses a file in a snapshot, if an attribute provider is > configured it would see the original file path (i.e. no .snapshot folder) in > Hadoop 2, but it would see the snapshot path in Hadoop 3. > HDFS-15372 changed this back, but I noted at the time it may make sense for > the provider to see the actual snapshot path instead. > Recently we discovered HDFS-16132, where the HDFS-15372 change does not work > correctly. At this stage I believe it is better to revert HDFS-15372, as the > fix for this issue is probably not trivial, and to allow providers to see the > actual path the user accessed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org