[jira] [Created] (HDFS-16121) Iterative snapshot diff report can generate duplicate records for creates and deletes

2021-07-07 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDFS-16121:
--

 Summary: Iterative snapshot diff report can generate duplicate 
records for creates and deletes
 Key: HDFS-16121
 URL: https://issues.apache.org/jira/browse/HDFS-16121
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Srinivasu Majeti
Assignee: Shashikant Banerjee


Currently, the iterative snapshot diff report first traverses the created list 
for a directory diff and then the deleted list. If the deleted list is smaller 
than the created list, the offset calculation in the respective list is wrong, 
so the next diff report generation call starts iterating over entries already 
processed in the created list, leading to duplicate entries in the report.

The fix is to correct the offset calculation during the traversal of the 
deleted list.
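The duplication can be illustrated with a small, self-contained sketch of the resume logic (the names and the batching scheme are simplified illustrations, not the actual SnapshotDiffReport code): the deleted-list offset must be rebased past the created list rather than reset to zero.

```java
import java.util.*;

public class DiffOffsetSketch {
    // Hypothetical simplification of the iterative diff traversal: entries
    // are read created-list first, then deleted-list, resuming from a saved
    // offset between report calls.
    static List<String> nextBatch(List<String> created, List<String> deleted,
                                  int startIndex, int batchSize) {
        List<String> out = new ArrayList<>();
        int pos = startIndex;
        // Consume from the created list while the offset falls inside it.
        while (pos < created.size() && out.size() < batchSize) {
            out.add("C:" + created.get(pos++));
        }
        // The deleted-list position must be rebased past the created list,
        // not reset to zero -- resetting is what duplicates entries.
        int delPos = pos - created.size();
        while (delPos < deleted.size() && out.size() < batchSize) {
            out.add("D:" + deleted.get(delPos++));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> created = Arrays.asList("a", "b", "c");
        List<String> deleted = Arrays.asList("x", "y");
        System.out.println(nextBatch(created, deleted, 0, 3)); // [C:a, C:b, C:c]
        // Resuming at offset 3 must not re-emit created entries.
        System.out.println(nextBatch(created, deleted, 3, 3)); // [D:x, D:y]
    }
}
```

A second call that resumed with a deleted-list offset of 0 instead of `pos - created.size()` would re-emit created entries, which is the duplicate-record symptom described above.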



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=620347&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620347
 ]

ASF GitHub Bot logged work on HDFS-16087:
-

Author: ASF GitHub Bot
Created on: 08/Jul/21 05:38
Start Date: 08/Jul/21 05:38
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3141:
URL: https://github.com/apache/hadoop/pull/3141#issuecomment-876143175


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m  5s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m  5s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  20m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 12s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   3m 47s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  14m 51s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 10s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 14s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   3m 42s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/6/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 14 new + 1 unchanged - 0 fixed = 15 total (was 1) 
 |
   | +1 :green_heart: |  mvnsite  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   6m 51s |  |  hadoop-federation-balance in 
the patch passed.  |
   | +1 :green_heart: |  unit  |  22m  2s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  0s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 200m  0s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3141/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3141 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux fae163c0d76b 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2415748274dd38a0e321c627d1c99d269cbef44c |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 

[jira] [Updated] (HDFS-16120) NNThroughputBenchmark does not filter generic hadoop args

2021-07-07 Thread xuefeng zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuefeng zhao updated HDFS-16120:

Affects Version/s: 3.3.1

> NNThroughputBenchmark   does not filter  generic hadoop args
> 
>
> Key: HDFS-16120
> URL: https://issues.apache.org/jira/browse/HDFS-16120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Affects Versions: 2.7.5, 3.3.1
>Reporter: xuefeng zhao
>Priority: Minor
>
> I notice that 'Convert NNThroughputBenchmark to a Tool to allow generic 
> options' (https://issues.apache.org/jira/browse/HDFS-5068) allows generic 
> options in NNThroughputBenchmark.
> But when I run NNThroughputBenchmark with generic options, it does not work; 
> it always prints "Usage: NNThroughputBenchmark".
> Checking the source code of NNThroughputBenchmark, I find that it does not 
> support generic options now.
> main->runBenchmark->ToolRunner.run->run
> {code:java}
> GenericOptionsParser parser = new GenericOptionsParser(conf, args);
> //set the configuration back, so that Tool can configure itself
> tool.setConf(conf);
> //get the args w/o generic hadoop args
> String[] toolArgs = parser.getRemainingArgs();
> return tool.run(toolArgs);
> {code}
> Although the comment says `get the args w/o generic hadoop args`, the parsed 
> result still contains the generic hadoop args.






[jira] [Commented] (HDFS-16120) NNThroughputBenchmark does not filter generic hadoop args

2021-07-07 Thread xuefeng zhao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377051#comment-17377051
 ] 

xuefeng zhao commented on HDFS-16120:
-

When debugging into 
*org.apache.commons.cli.Parser#parse(org.apache.commons.cli.Options, 
java.lang.String[], java.util.Properties, boolean)*, 
I found that this method does not change the CommandLine's options field.

> NNThroughputBenchmark   does not filter  generic hadoop args
> 
>
> Key: HDFS-16120
> URL: https://issues.apache.org/jira/browse/HDFS-16120
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks
>Affects Versions: 2.7.5
>Reporter: xuefeng zhao
>Priority: Minor
>
> I notice that 'Convert NNThroughputBenchmark to a Tool to allow generic 
> options' (https://issues.apache.org/jira/browse/HDFS-5068) allows generic 
> options in NNThroughputBenchmark.
> But when I run NNThroughputBenchmark with generic options, it does not work; 
> it always prints "Usage: NNThroughputBenchmark".
> Checking the source code of NNThroughputBenchmark, I find that it does not 
> support generic options now.
> main->runBenchmark->ToolRunner.run->run
> {code:java}
> GenericOptionsParser parser = new GenericOptionsParser(conf, args);
> //set the configuration back, so that Tool can configure itself
> tool.setConf(conf);
> //get the args w/o generic hadoop args
> String[] toolArgs = parser.getRemainingArgs();
> return tool.run(toolArgs);
> {code}
> Although the comment says `get the args w/o generic hadoop args`, the parsed 
> result still contains the generic hadoop args.






[jira] [Created] (HDFS-16120) NNThroughputBenchmark does not filter generic hadoop args

2021-07-07 Thread xuefeng zhao (Jira)
xuefeng zhao created HDFS-16120:
---

 Summary: NNThroughputBenchmark   does not filter  generic hadoop 
args
 Key: HDFS-16120
 URL: https://issues.apache.org/jira/browse/HDFS-16120
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 2.7.5
Reporter: xuefeng zhao


I notice that 'Convert NNThroughputBenchmark to a Tool to allow generic 
options' (https://issues.apache.org/jira/browse/HDFS-5068) allows generic options 
in NNThroughputBenchmark.
But when I run NNThroughputBenchmark with generic options, it does not work; it 
always prints "Usage: NNThroughputBenchmark".

Checking the source code of NNThroughputBenchmark, I find that it does not 
support generic options now.
main->runBenchmark->ToolRunner.run->run


{code:java}
GenericOptionsParser parser = new GenericOptionsParser(conf, args);
//set the configuration back, so that Tool can configure itself
tool.setConf(conf);

//get the args w/o generic hadoop args
String[] toolArgs = parser.getRemainingArgs();
return tool.run(toolArgs);
{code}
Although the comment says `get the args w/o generic hadoop args`, the parsed 
result still contains the generic hadoop args.
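The stripping that ToolRunner is expected to do can be shown with a toy stand-in (this is not the real GenericOptionsParser, whose option set and parsing rules are richer; the class and method names here are illustrative):

```java
import java.util.*;

public class GenericArgsSketch {
    // Toy stand-in for GenericOptionsParser.getRemainingArgs(): strip a few
    // generic Hadoop options (-D key=value, -conf <file>, -fs <uri>) and
    // return only the tool-specific arguments.
    static String[] remainingArgs(String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-D": case "-conf": case "-fs":
                    i++;  // skip the option's value as well
                    break;
                default:
                    rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] in = {"-D", "fs.defaultFS=hdfs://nn:8020", "-op", "create"};
        // Only the benchmark-specific args should survive.
        System.out.println(Arrays.toString(remainingArgs(in))); // [-op, create]
    }
}
```

The reported bug is that NNThroughputBenchmark's own argument check still sees (or rejects) the generic options, so it prints its usage string instead of running.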







[jira] [Work logged] (HDFS-16118) Improve the number of handlers that initialize NameNodeRpcServer#clientRpcServer

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16118?focusedWorklogId=620306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620306
 ]

ASF GitHub Bot logged work on HDFS-16118:
-

Author: ASF GitHub Bot
Created on: 08/Jul/21 02:05
Start Date: 08/Jul/21 02:05
Worklog Time Spent: 10m 
  Work Description: jianghuazhu commented on pull request #3186:
URL: https://github.com/apache/hadoop/pull/3186#issuecomment-876061665


   Some unit tests failed, but they have nothing to do with this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 620306)
Time Spent: 0.5h  (was: 20m)

> Improve the number of handlers that initialize 
> NameNodeRpcServer#clientRpcServer
> 
>
> Key: HDFS-16118
> URL: https://issues.apache.org/jira/browse/HDFS-16118
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When initializing NameNodeRpcServer, if the value of 
> dfs.namenode.lifeline.handler.count is set to less than 0 (such as -1; of 
> course, this is rare), the number of lifeline RPC handlers is determined 
> based on dfs.namenode.handler.count * lifelineHandlerRatio.
> The code can be found:
> int lifelineHandlerCount = conf.getInt(
>     DFS_NAMENODE_LIFELINE_HANDLER_COUNT_KEY, 0);
> if (lifelineHandlerCount <= 0) {
>   float lifelineHandlerRatio = conf.getFloat(
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_KEY,
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT);
>   lifelineHandlerCount = Math.max(
>       (int)(handlerCount * lifelineHandlerRatio), 1);
> }
> When this happens, the lifelineHandlerCount should be subtracted from the 
> handlerCount, but in fact it is not.
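The quoted fallback can be condensed into a self-contained sketch (the constants are inlined as parameters and the method name is illustrative, not the NameNodeRpcServer API):

```java
public class LifelineHandlerSketch {
    // Mirrors the fallback quoted above: a configured value <= 0 (e.g. -1)
    // behaves like "unset", so the count falls back to a ratio of the client
    // RPC handler count, with a floor of 1.
    static int lifelineHandlerCount(int configured, int handlerCount,
                                    float ratio) {
        if (configured > 0) {
            return configured;
        }
        return Math.max((int) (handlerCount * ratio), 1);
    }

    public static void main(String[] args) {
        // configured = -1 falls back to the ratio-based value.
        System.out.println(lifelineHandlerCount(-1, 100, 0.1f)); // 10
        System.out.println(lifelineHandlerCount(-1, 10, 0.1f));  // 1
        // An explicitly configured positive value wins.
        System.out.println(lifelineHandlerCount(5, 100, 0.1f));  // 5
    }
}
```

The issue's point is that in this fallback path, the lifeline handlers are not deducted from the client RPC handler count, so the two pools can overcommit.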






[jira] [Assigned] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma reassigned HDFS-16115:


Assignee: Daniel Ma

> Asynchronously handle BPServiceActor command mechanism may result in 
> BPServiceActor never fails even CommandProcessingThread is closed with fatal 
> error.
> 
>
> Key: HDFS-16115
> URL: https://issues.apache.org/jira/browse/HDFS-16115
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16115.patch
>
>
> It is an improvement issue. Actually the issue has two sub-issues:
> 1. BPServiceActor handles commands from the NameNode asynchronously 
> (CommandProcessingThread handles the commands), so if any exception or error 
> happens in CommandProcessingThread, the thread fails and stops, which 
> BPServiceActor cannot become aware of; it keeps putting commands from the 
> NameNode into the queue to be handled by CommandProcessingThread, even 
> though CommandProcessingThread is already dead.
> 2. The second sub-issue builds on the first: if CommandProcessingThread died 
> owing to a non-fatal error like "unable to create new native thread", caused 
> by too many threads in the OS, this kind of problem should be given more 
> tolerance instead of simply shutting the thread down with no automatic 
> recovery, because such non-fatal errors can probably be recovered from soon:
> {code:java}
> // code placeholder
> 2021-07-02 16:26:02,315 | WARN  | Command processor | Exception happened when 
> process queue BPServiceActor.java:1393
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2315)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2237)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:752)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:698)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1417)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1463)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1382)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1365)
> {code}
> Currently, the DataNode's BPServiceActor cannot return to normal even after 
> the non-fatal error is eliminated.
> Therefore, in this patch, two things will be done:
> 1. Add a retry mechanism to the BPServiceActor and CommandProcessingThread 
> threads; the retry count is 5 by default and configurable.
> 2. Add a periodic monitor thread in BPOfferService: if a BPServiceActor 
> thread is dead owing to repeated non-fatal errors, it should not simply be 
> removed from the BPServiceActor list stored in BPOfferService; instead, the 
> monitor thread will periodically try to restart these dead BPServiceActor 
> threads. The interval is also configurable.
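A minimal sketch of the retry idea, assuming a configurable retry budget per command (the names, interface, and behaviour here are illustrative, not the actual patch):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
    interface Command { void run() throws Exception; }

    // Tolerate up to maxRetries non-fatal failures per command instead of
    // letting the first error kill the processing thread permanently.
    static boolean processWithRetry(Command c, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                c.run();
                return true;   // processed successfully
            } catch (Exception e) {
                // Non-fatal error: a real implementation would also log,
                // back off, and distinguish fatal from transient errors here.
            }
        }
        return false;          // give up; a monitor may restart the actor later
    }

    public static void main(String[] args) {
        AtomicInteger failures = new AtomicInteger(2);
        Command flaky = () -> {
            if (failures.getAndDecrement() > 0) {
                throw new Exception("transient: unable to create new native thread");
            }
        };
        // Fails twice, then succeeds within the retry budget.
        System.out.println(processWithRetry(flaky, 5)); // true
    }
}
```

A transient "unable to create new native thread" condition like the one in the stack trace above would then be survived once the OS thread pressure eases, instead of permanently stopping command processing.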






[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=620303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620303
 ]

ASF GitHub Bot logged work on HDFS-16087:
-

Author: ASF GitHub Bot
Created on: 08/Jul/21 01:45
Start Date: 08/Jul/21 01:45
Worklog Time Spent: 10m 
  Work Description: lipp commented on a change in pull request #3141:
URL: https://github.com/apache/hadoop/pull/3141#discussion_r665811829



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/rbfbalance/TestRouterDistCpProcedure.java
##
@@ -0,0 +1,120 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.rbfbalance;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.ha.HAServiceProtocol;
+import org.apache.hadoop.hdfs.DFSClient;
+import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster;
+import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder;
+import org.apache.hadoop.hdfs.server.federation.StateStoreDFSCluster;
+import 
org.apache.hadoop.hdfs.server.federation.resolver.ActiveNamenodeResolver;
+import org.apache.hadoop.hdfs.server.federation.resolver.MountTableManager;
+import org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys;
+import org.apache.hadoop.hdfs.server.federation.router.Router;
+import org.apache.hadoop.hdfs.server.federation.store.StateStoreService;
+import org.apache.hadoop.hdfs.server.federation.store.impl.MountTableStoreImpl;
+import 
org.apache.hadoop.hdfs.server.federation.store.protocol.AddMountTableEntryRequest;
+import 
org.apache.hadoop.hdfs.server.federation.store.protocol.AddMountTableEntryResponse;
+import org.apache.hadoop.hdfs.server.federation.store.records.MountTable;
+import org.apache.hadoop.ipc.RemoteException;
+import org.apache.hadoop.tools.fedbalance.DistCpProcedure.Stage;
+import org.apache.hadoop.tools.fedbalance.FedBalanceContext;
+import org.apache.hadoop.tools.fedbalance.TestDistCpProcedure;
+import org.apache.hadoop.util.Time;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+
+import java.net.InetSocketAddress;
+import java.net.URI;
+import java.util.Collections;
+
+import static 
org.apache.hadoop.hdfs.server.federation.FederationTestUtils.createNamenodeReport;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+import static org.junit.Assert.assertTrue;
+
+
+public class TestRouterDistCpProcedure extends TestDistCpProcedure {
+  private static StateStoreDFSCluster cluster;
+  private static MiniRouterDFSCluster.RouterContext routerContext;
+  private static Configuration routerConf;
+  private static StateStoreService stateStore;
+
+  @BeforeClass
+  public static void globalSetUp() throws Exception {
+    cluster = new StateStoreDFSCluster(false, 1);
+    // Build and start a router with State Store + admin + RPC
+    Configuration conf = new RouterConfigBuilder()
+        .stateStore()
+        .admin()
+        .rpc()
+        .build();
+    cluster.addRouterOverrides(conf);
+    cluster.startRouters();
+    routerContext = cluster.getRandomRouter();
+    Router router = routerContext.getRouter();
+    stateStore = router.getStateStore();
+
+    // Add one name service for testing
+    ActiveNamenodeResolver membership = router.getNamenodeResolver();
+    membership.registerNamenode(createNamenodeReport("ns0", "nn1",
+        HAServiceProtocol.HAServiceState.ACTIVE));
+    stateStore.refreshCaches(true);
+
+    routerConf = new Configuration();
+    InetSocketAddress routerSocket = router.getAdminServerAddress();
+    routerConf.setSocketAddr(RBFConfigKeys.DFS_ROUTER_ADMIN_ADDRESS_KEY,
+        routerSocket);
+  }
+
+  @Override
+  public void testDisableWrite() throws Exception {
+    // Firstly add mount entry: /test-write->{ns0,/test-write}.
+    String mount = "/test-write";
+    MountTable newEntry = MountTable
+

[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=620302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620302
 ]

ASF GitHub Bot logged work on HDFS-16087:
-

Author: ASF GitHub Bot
Created on: 08/Jul/21 01:35
Start Date: 08/Jul/21 01:35
Worklog Time Spent: 10m 
  Work Description: lipp commented on pull request #3141:
URL: https://github.com/apache/hadoop/pull/3141#issuecomment-876049463


   Thanks @wojiaodoubao for your review and suggestions. I will fix them soon.




Issue Time Tracking
---

Worklog Id: (was: 620302)
Time Spent: 2h 10m  (was: 2h)

> RBF balance process is stuck at DisableWrite stage
> --
>
> Key: HDFS-16087
> URL: https://issues.apache.org/jira/browse/HDFS-16087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Eric Yin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The balance process will be stuck at DisableWrite stage when running the 
> rbfbalance command.






[jira] [Commented] (HDFS-16117) Add file count info in audit log to record the file count for delete and getListing RPC request to assist user trouble shooting when RPC cost is increasing

2021-07-07 Thread Daniel Ma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376938#comment-17376938
 ] 

Daniel Ma commented on HDFS-16117:
--

Hello [~sodonnell],

Could you please help review this patch?

> Add file count info in audit log to record the file count for delete and 
> getListing RPC request to assist user trouble shooting when RPC cost is 
> increasing 
> 
>
> Key: HDFS-16117
> URL: https://issues.apache.org/jira/browse/HDFS-16117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16117.patch
>
>
> Currently, there is no file count in the audit log for delete and getListing 
> RPC requests; therefore, with RPC call volume increasing, it is not easy to 
> figure out whether a time-consuming RPC request is related to too many files 
> being operated on in the request.
>  
> Therefore, it is necessary to add file count info to the audit log to assist 
> maintenance.
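A minimal sketch of what such an audit line could look like (the real NameNode audit format has more fields such as ip, dst, perm, and proto, and the fileCount field name is an assumption of this sketch, not the patch):

```java
public class AuditLogSketch {
    // Hypothetical tab-separated audit-log line with an added fileCount
    // field, so operators can correlate slow delete/getListing RPCs with
    // the number of files they touched.
    static String auditLine(String user, String cmd, String src,
                            long fileCount) {
        return String.format("ugi=%s\tcmd=%s\tsrc=%s\tfileCount=%d",
                user, cmd, src, fileCount);
    }

    public static void main(String[] args) {
        // A delete of a subtree touching 1024 files becomes visible
        // directly in the log line.
        System.out.println(auditLine("hdfs", "delete", "/data/tmp", 1024));
    }
}
```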






[jira] [Work logged] (HDFS-16088) Standby NameNode process getLiveDatanodeStorageReport request to reduce Active load

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16088?focusedWorklogId=620301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620301
 ]

ASF GitHub Bot logged work on HDFS-16088:
-

Author: ASF GitHub Bot
Created on: 08/Jul/21 01:30
Start Date: 08/Jul/21 01:30
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3140:
URL: https://github.com/apache/hadoop/pull/3140#issuecomment-876047593


   Hi @tasanuma @jojochuang @aajisaka @ayushtkn , could you please review the 
code? Thanks a lot.




Issue Time Tracking
---

Worklog Id: (was: 620301)
Time Spent: 4h  (was: 3h 50m)

> Standby NameNode process getLiveDatanodeStorageReport request to reduce 
> Active load
> ---
>
> Key: HDFS-16088
> URL: https://issues.apache.org/jira/browse/HDFS-16088
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: standyby-ipcserver.jpg
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> As with HDFS-13183, NameNodeConnector#getLiveDatanodeStorageReport() can 
> also send its request to the SNN to reduce the ANN load.
> There are two points that need to be mentioned:
>  1. FSNamesystem#getDatanodeStorageReport() is OperationCategory.UNCHECKED, 
> so we can access SNN directly.
>  2. We can share the same UT(testBalancerRequestSBNWithHA) with 
> NameNodeConnector#getBlocks().






[jira] [Assigned] (HDFS-16117) Add file count info in audit log to record the file count for delete and getListing RPC request to assist user trouble shooting when RPC cost is increasing

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma reassigned HDFS-16117:


Assignee: Daniel Ma

> Add file count info in audit log to record the file count for delete and 
> getListing RPC request to assist user trouble shooting when RPC cost is 
> increasing 
> 
>
> Key: HDFS-16117
> URL: https://issues.apache.org/jira/browse/HDFS-16117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16117.patch
>
>
> Currently, there is no file count in the audit log for delete and getListing 
> RPC requests; therefore, with RPC call volume increasing, it is not easy to 
> figure out whether a time-consuming RPC request is related to too many files 
> being operated on in the request.
>  
> Therefore, it is necessary to add file count info to the audit log to assist 
> maintenance.






[jira] [Assigned] (HDFS-16093) DataNodes under decommission will still be returned to the client via getLocatedBlocks, so the client may request decommissioning datanodes to read which will cause badl

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma reassigned HDFS-16093:


Assignee: Daniel Ma

> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may request decommissioning datanodes to read 
> which will cause bad competition on disk IO.
> --
>
> Key: HDFS-16093
> URL: https://issues.apache.org/jira/browse/HDFS-16093
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
>
> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may request reads from decommissioning 
> DataNodes, which causes bad competition for disk I/O.
> Therefore, DataNodes under decommission should be removed from the return 
> list of the getLocatedBlocks API.
> !image-2021-06-29-10-50-44-739.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16118) Improve the number of handlers that initialize NameNodeRpcServer#clientRpcServer

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16118?focusedWorklogId=620213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620213
 ]

ASF GitHub Bot logged work on HDFS-16118:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 20:55
Start Date: 07/Jul/21 20:55
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3186:
URL: https://github.com/apache/hadoop/pull/3186#issuecomment-875926006


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  16m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 42s |  |  Maven dependency ordering for branch  |
   | -1 :x: |  mvninstall  |   6m  0s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3186/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |  21m  2s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   3m 45s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 16s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 47s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  9s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   3m 40s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 10s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   6m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 56s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  3s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  | 425m 25s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3186/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 12s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 629m 48s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus 
|
   |   | hadoop.hdfs.TestViewDistributedFileSystemContract |
   |   | hadoop.hdfs.TestSnapshotCommands |
   |   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
   |   | hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS |
   |   | 
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
   |   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   |   | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand |
   |   | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3186/1/artifact/out/Dockerfile
 |
   | GITHUB PR | 

[jira] [Commented] (HDFS-16100) HA: Improve performance of Standby node transition to Active

2021-07-07 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376824#comment-17376824
 ] 

Íñigo Goiri commented on HDFS-16100:


This is a pretty sensitive part of the code.
We need to make sure we have tests covering this.

In terms of readability, I think we may want to use the complement of the if:
{code}
if (reportedState != ReplicaState.RBW ||
storedBlock.getGenerationStamp() <= block.getGenerationStamp()) {
{code}
In addition, we may want to have comments explaining the reasoning.

>  HA: Improve performance of Standby node transition to Active
> -
>
> Key: HDFS-16100
> URL: https://issues.apache.org/jira/browse/HDFS-16100
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: wudeyu
>Assignee: wudeyu
>Priority: Major
> Attachments: HDFS-16100.001.patch, HDFS-16100.patch
>
>
> pendingDNMessages on the Standby NameNode is used to hold postponed block 
> reports. Block reports in pendingDNMessages are processed as follows:
>  # If the GS of a replica is in the future, the Standby Node processes it 
> when the corresponding edit log (e.g. add_block) is loaded.
>  # If a replica is corrupted, the Standby Node processes it while it 
> transitions to Active.
>  # If a DataNode is removed, its block reports are removed from 
> pendingDNMessages.
> Obviously, as the number of corrupted replicas grows, the transition takes 
> more time. In our situation, there were 60 million block reports in 
> pendingDNMessages before the transition. Processing the block reports cost 
> almost 7 minutes, and the NameNode was killed by ZKFC. Most of the block 
> reports had replica state RBW with a stale GS (less than the stored block on 
> the Standby Node).
> In my opinion, the Standby Node could ignore block reports whose replica 
> state is RBW with a stale GS, because the Active NameNode/DataNode will 
> remove the replica later.
>  
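The proposal in the last paragraph of the description amounts to a small predicate on each queued report. The sketch below is illustrative only: `Stored`, `Reported`, and `shouldQueue` are hypothetical stand-ins, not the real pendingDNMessages code.

```java
public class QueuedReportFilter {
    enum ReplicaState { FINALIZED, RBW, RWR }

    // Minimal stand-ins for the stored block and the reported replica.
    record Stored(long genStamp) {}
    record Reported(ReplicaState state, long genStamp) {}

    // Skip queuing an RBW report whose generation stamp is behind the
    // stored block's: the active NameNode / DataNode will remove that
    // stale replica anyway, so postponing it only slows the transition.
    static boolean shouldQueue(Stored stored, Reported reported) {
        if (reported.state() == ReplicaState.RBW
                && reported.genStamp() < stored.genStamp()) {
            return false;   // stale RBW replica: ignore instead of postponing
        }
        return true;
    }
}
```

As the reviewer notes, this is a sensitive code path, so the real change would need tests and comments explaining exactly why the stale-GS RBW case is safe to drop.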






[jira] [Commented] (HDFS-13092) Reduce verbosity for ThrottledAsyncChecker.java:schedule

2021-07-07 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376702#comment-17376702
 ] 

Ahmed Hussein commented on HDFS-13092:
--

Hi [~msingh], thanks for providing the patch.
We observed the same behavior on our internal cluster running branch-2.10.
The frequency at which those checks are triggered is a little concerning. Do 
you have any idea what would trigger the check on a single volume that often?


> Reduce verbosity for ThrottledAsyncChecker.java:schedule
> 
>
> Key: HDFS-13092
> URL: https://issues.apache.org/jira/browse/HDFS-13092
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Minor
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-13092.001.patch
>
>
> ThrottledAsyncChecker.java:schedule prints a log message every time a disk 
> check is requested. However, if the previous check ran within the last 
> "minMsBetweenChecks" milliseconds, the task is not actually scheduled. This 
> jira reduces the log verbosity by printing the message only when the task 
> will be scheduled.
> {code}
> 2018-01-29 00:51:44,467 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/2/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,470 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/2/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,477 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/4/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,480 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/4/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,486 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/11/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,501 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/13/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,507 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/11/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,533 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/2/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,536 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/12/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,543 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/10/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,544 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/2/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,548 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/3/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,549 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/5/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,550 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/6/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,551 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/2/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,552 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/10/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,552 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/8/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,552 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/12/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,554 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/9/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,555 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/8/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,555 INFO  checker.ThrottledAsyncChecker 
> (ThrottledAsyncChecker.java:schedule(107)) - Scheduling a check for 
> /grid/14/hadoop/hdfs/data/current
> 2018-01-29 00:51:44,560 
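The intended behavior — log only when a check is actually scheduled, and stay silent when the throttle suppresses it — can be sketched as below. `ThrottledCheckerSketch` is illustrative; the real `ThrottledAsyncChecker` tracks completed checks and futures, which this omits.

```java
import java.util.HashMap;
import java.util.Map;

public class ThrottledCheckerSketch {
    private final long minMsBetweenChecks;
    private final Map<String, Long> lastCheck = new HashMap<>();

    ThrottledCheckerSketch(long minMsBetweenChecks) {
        this.minMsBetweenChecks = minMsBetweenChecks;
    }

    // Returns true (and would log) only when a check is actually scheduled.
    // `now` is injected instead of calling the clock, to keep this testable.
    boolean schedule(String volume, long now) {
        Long last = lastCheck.get(volume);
        if (last != null && now - last < minMsBetweenChecks) {
            return false;          // throttled: no check, and no log line
        }
        lastCheck.put(volume, now);
        // LOG.info("Scheduling a check for {}", volume);  // log moved here
        return true;
    }
}
```

With the log statement placed after the throttle test, the repeated "Scheduling a check for /grid/N/..." lines in the quoted log would collapse to one line per volume per interval.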

[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376667#comment-17376667
 ] 

Stephen O'Donnell commented on HDFS-15796:
--

Thanks [~Daniel Ma] - the latest patch looks better. I have submitted it to see 
how it does in the CI run. I also added you to the contributors group so you 
can assign issues to yourself in the future.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
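The quoted stack trace is the classic fail-fast iterator failure: a list is structurally modified while another path is iterating it. A minimal standalone reproduction of the failure mode and the usual single-threaded fix (this is not the BlockManager code itself; `CmeDemo` and its methods are illustrative):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class CmeDemo {
    // Removing elements from the list during a for-each loop triggers the
    // same ConcurrentModificationException seen in the RedundancyMonitor log.
    static boolean removeDuringForEach(List<Integer> list) {
        try {
            for (Integer i : list) {
                if (i % 2 == 0) {
                    list.remove(i);   // structural modification mid-iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;              // the observed failure mode
        }
    }

    // Using the iterator's own remove() keeps the modCount consistent.
    static List<Integer> removeViaIterator(List<Integer> list) {
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }
}
```

In the BlockManager case the modification comes from another thread, so the actual fix also has to address locking or iterate over a copy, not just switch to `Iterator.remove()`.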






[jira] [Assigned] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reassigned HDFS-15796:


Assignee: Daniel Ma

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15796:
-
Target Version/s: 3.4.0  (was: 3.3.1)
  Status: Patch Available  (was: Open)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Work logged] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval xxx is invalid

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16119?focusedWorklogId=620039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-620039
 ]

ASF GitHub Bot logged work on HDFS-16119:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 15:33
Start Date: 07/Jul/21 15:33
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3185:
URL: https://github.com/apache/hadoop/pull/3185#issuecomment-875705835


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  12m  2s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 37s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m  9s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 57s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3185/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 489 unchanged 
- 0 fixed = 490 total (was 489)  |
   | +1 :green_heart: |  mvnsite  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m  8s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 54s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 230m 48s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 326m  0s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3185/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3185 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux d8cfccf16cd6 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 310a266770a55f8d86bbb9f310077360f59b682a |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 

[jira] [Comment Edited] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376586#comment-17376586
 ] 

Daniel Ma edited comment on HDFS-15796 at 7/7/21, 1:52 PM:
---

[~sodonnell]

Yes, it is more elegant.(y)


was (Author: daniel ma):
[~sodonnell]

Yes, it is more elegant.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-15796:
-
Attachment: HDFS-15796-0001.patch

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-15796:
-
Attachment: (was: 0002-HDFS-15796.patch)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-15796:
-
Attachment: (was: 0003-HDFS-15796.patch)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-15796:
-
Attachment: (was: 0001-HDFS-15796.patch)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376586#comment-17376586
 ] 

Daniel Ma commented on HDFS-15796:
--

[~sodonnell]

Yes, it is more elegant.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch, 0002-HDFS-15796.patch, 
> 0003-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-15796:
-
Attachment: 0003-HDFS-15796.patch

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch, 0002-HDFS-15796.patch, 
> 0003-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
>  
>  






[jira] [Commented] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.

2021-07-07 Thread Daniel Ma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376571#comment-17376571
 ] 

Daniel Ma commented on HDFS-16115:
--

hello [~hexiaoqiao].

Thanks for the review.

1. For non-fatal errors, I define two kinds at present.

These two errors are caused by too many threads in the OS and by too many 
open files in the OS, both of which can usually recover soon.

Even if the OS limit cannot be recovered proactively, users expect that HDFS 
can recover automatically after manual intervention.

 
{code:java}
// code placeholder
enum NON_FATAL_TYPES {
  THREAD_EXCEED("unable to create new native thread"),
  FILE_EXCEED("Too many open files");

  private final String errorMsg;

  NON_FATAL_TYPES(String errorMsg){
this.errorMsg = errorMsg;
  }

  public String getErrorMsg() {
return errorMsg;
  }
}
{code}
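A hedged sketch of how such message-based classification might be applied (the `NonFatalClassifier` class and its method names are illustrative, not code from the attached patch): treat an error as non-fatal only when its message matches one of the known transient OS-limit messages.

```java
public class NonFatalClassifier {
    // Hypothetical helper: the two known transient OS-limit messages from
    // the enum above.
    static final String[] NON_FATAL_MESSAGES = {
        "unable to create new native thread",  // OS thread limit hit
        "Too many open files"                  // file-descriptor limit hit
    };

    // Returns true only for errors whose message matches a known
    // transient condition; everything else stays fatal.
    public static boolean isNonFatal(Throwable t) {
        String msg = t.getMessage();
        if (msg == null) {
            return false;
        }
        for (String known : NON_FATAL_MESSAGES) {
            if (msg.contains(known)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNonFatal(
            new OutOfMemoryError("unable to create new native thread"))); // true
        System.out.println(isNonFatal(new NullPointerException()));       // false
    }
}
```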
2. The main defect of HDFS-15651 is that the BPServiceActor thread never 
returns to normal unless the DataNode is restarted, which real users report 
is unacceptable in a production environment. That is why I developed this 
feature.

 

> Asynchronously handle BPServiceActor command mechanism may result in 
> BPServiceActor never fails even CommandProcessingThread is closed with fatal 
> error.
> 
>
> Key: HDFS-16115
> URL: https://issues.apache.org/jira/browse/HDFS-16115
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Priority: Critical
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16115.patch
>
>
> It is an improvement issue that actually has two sub-issues:
> 1. The BPServiceActor thread handles commands from the NameNode 
> asynchronously (CommandProcessingThread handles the commands). If an 
> exception or error causes CommandProcessingThread to fail and stop, 
> BPServiceActor is not aware of it and keeps putting commands from the 
> NameNode into the queue to be handled by CommandProcessingThread, even 
> though CommandProcessingThread is already dead.
> 2. The second sub-issue builds on the first: if CommandProcessingThread 
> died from a non-fatal error like "can not create native thread", which is 
> caused by too many threads in the OS, this kind of problem should be given 
> much more tolerance instead of simply shutting the thread down with no 
> automatic recovery, because such non-fatal errors can often recover by 
> themselves soon.
> {code:java}
> // code placeholder
> 2021-07-02 16:26:02,315 | WARN  | Command processor | Exception happened when 
> process queue BPServiceActor.java:1393
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
> at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:180)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:229)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2315)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2237)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:752)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:698)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1417)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1463)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1382)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1365)
> {code}
> Currently, the DataNode BPServiceActor cannot return to normal even when 
> the non-fatal error is eliminated.
> Therefore, this patch does two things:
> 1. Add a retry mechanism to the BPServiceActor and CommandProcessingThread 
> threads; the retry limit is 5 by default and configurable.
> 2. Add a periodic monitor thread in BPOfferService: if a BPServiceActor 
> thread is dead owing to too many non-fatal errors, it should not be simply 
> removed from the BPServiceActor list stored in BPOfferService; instead, the 
> monitor thread will periodically try to restart these dead BPServiceActor 
> threads. The interval is also configurable.

[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376568#comment-17376568
 ] 

Stephen O'Donnell commented on HDFS-15796:
--

For the latest patch, why not just use `return new ArrayList<>(found.targets);` 
rather than Collections.copy and another local variable? That would create and 
return a new list with the contents of the original in a single line, rather 
than three lines as here.

Also, you don't seem to have initialized targets, and I don't think 
Collections.copy will work with a null target:

{code}
   List getTargets(BlockInfo block) {
 synchronized (pendingReconstructions) {
+  List targets = null;
   PendingBlockInfo found = pendingReconstructions.get(block);
   if (found != null) {
-return found.targets;
+Collections.copy(targets, found.targets);
+return targets;
   }
{code}
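The difference can be demonstrated in isolation. A minimal sketch (using plain strings instead of the real `DatanodeStorageInfo` targets, which is an assumption for brevity): `Collections.copy` requires a pre-sized destination and throws when the destination is smaller than the source, while the `ArrayList` copy constructor builds an independent snapshot in a single line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CopySketch {
    // Mirrors the suggested fix: return an independent snapshot of the
    // shared targets list instead of the list itself.
    public static List<String> snapshot(List<String> targets) {
        return new ArrayList<>(targets);
    }

    public static void main(String[] args) {
        List<String> targets = new ArrayList<>(Arrays.asList("dn1", "dn2"));
        List<String> copy = snapshot(targets);
        targets.add("dn3");               // later mutation by another thread
        System.out.println(copy.size());  // snapshot is unaffected: 2

        // Collections.copy into an empty destination fails: the destination
        // must already be at least as long as the source.
        try {
            Collections.copy(new ArrayList<>(), targets);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("copy failed: destination too small");
        }
    }
}
```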

Can you also name the patch files like HDFS-15796.001.patch? I am not sure 
whether the system will handle the naming format you have used correctly.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch, 0002-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
>  
>  






[jira] [Commented] (HDFS-16115) Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.

2021-07-07 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376556#comment-17376556
 ] 

Xiaoqiao He commented on HDFS-16115:


I agree with you that HDFS-15651 is a rough solution for this case, but it 
is totally safe.
About your proposal, I am concerned that:
A. How do we define 'a non-fatal error'? Which types of exception/error 
should we retry, or should we retry on every exception/error?
B. When we meet 'a non-fatal error' that should be retried, should the 
action fall back to the command queue? If yes, how do we keep every command 
action idempotent; if not, could any command be missed?
NOTE: sorry, I have not reviewed the patch carefully. Please correct me if I 
missed something. Thanks.

> Asynchronously handle BPServiceActor command mechanism may result in 
> BPServiceActor never fails even CommandProcessingThread is closed with fatal 
> error.
> 
>
> Key: HDFS-16115
> URL: https://issues.apache.org/jira/browse/HDFS-16115
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Priority: Critical
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16115.patch
>
>
> It is an improvement issue that actually has two sub-issues:
> 1. The BPServiceActor thread handles commands from the NameNode 
> asynchronously (CommandProcessingThread handles the commands). If an 
> exception or error causes CommandProcessingThread to fail and stop, 
> BPServiceActor is not aware of it and keeps putting commands from the 
> NameNode into the queue to be handled by CommandProcessingThread, even 
> though CommandProcessingThread is already dead.
> 2. The second sub-issue builds on the first: if CommandProcessingThread 
> died from a non-fatal error like "can not create native thread", which is 
> caused by too many threads in the OS, this kind of problem should be given 
> much more tolerance instead of simply shutting the thread down with no 
> automatic recovery, because such non-fatal errors can often recover by 
> themselves soon.
> Currently, the DataNode BPServiceActor cannot return to normal even when 
> the non-fatal error is eliminated.
> Therefore, this patch does two things:
> 1. Add a retry mechanism to the BPServiceActor and CommandProcessingThread 
> threads; the retry limit is 5 by default and configurable.
> 2. Add a periodic monitor thread in BPOfferService: if a BPServiceActor 
> thread is dead owing to too many non-fatal errors, it should not be simply 
> removed from the BPServiceActor list stored in BPOfferService; instead, the 
> monitor thread will periodically try to restart these dead BPServiceActor 
> threads. The interval is also configurable.
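The bounded-retry behavior proposed for the command-processing thread could be sketched as follows (a minimal illustration; the method names, the loop shape, and the default of 5 are assumptions based on the description, not code from the attached patch):

```java
public class BoundedRetry {
    // Retry a processing step a bounded number of times before giving up.
    // In the DataNode, the catch branch would additionally check whether the
    // error is non-fatal and sleep before retrying; here we only count.
    public static boolean runWithRetries(Runnable step, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                step.run();
                return true;          // processed successfully
            } catch (RuntimeException | Error e) {
                // transient failure: fall through and try again
            }
        }
        return false;                 // retry budget exceeded
    }

    public static void main(String[] args) {
        int[] failures = {2};         // fail twice, then succeed
        boolean ok = runWithRetries(() -> {
            if (failures[0]-- > 0) {
                throw new RuntimeException("transient");
            }
        }, 5);
        System.out.println(ok);       // true: succeeded on the third attempt
    }
}
```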




[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=619941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619941
 ]

ASF GitHub Bot logged work on HDFS-16087:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 12:19
Start Date: 07/Jul/21 12:19
Worklog Time Spent: 10m 
  Work Description: wojiaodoubao commented on pull request #3141:
URL: https://github.com/apache/hadoop/pull/3141#issuecomment-87847


   There is also some check-style complaint from yetus, fix them please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 619941)
Time Spent: 2h  (was: 1h 50m)

> RBF balance process is stuck at DisableWrite stage
> --
>
> Key: HDFS-16087
> URL: https://issues.apache.org/jira/browse/HDFS-16087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Eric Yin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The balance process will be stuck at DisableWrite stage when running the 
> rbfbalance command.






[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=619936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619936
 ]

ASF GitHub Bot logged work on HDFS-16087:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 12:15
Start Date: 07/Jul/21 12:15
Worklog Time Spent: 10m 
  Work Description: wojiaodoubao commented on a change in pull request 
#3141:
URL: https://github.com/apache/hadoop/pull/3141#discussion_r661299048



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/rbfbalance/RouterDistCpProcedure.java
##
@@ -44,6 +44,7 @@ protected void disableWrite(FedBalanceContext context) throws 
IOException {
 Configuration conf = context.getConf();
 String mount = context.getMount();
 MountTableProcedure.disableWrite(mount, conf);
+updateStage(Stage.FINAL_DISTCP);

Review comment:
   Thanks @lipp for your nice report ! The change is correct. The 
DISABLE_WRITE is a stage of the DistCpProcedure. In DistCpProcedure it disables 
write by cancel the permission. The RouterDistCpProcedure extends 
DistCpProcedure and disables write by set mount point read only. But 
RouterDistCpProcedure forgets to update the stage.






Issue Time Tracking
---

Worklog Id: (was: 619936)
Time Spent: 1h 50m  (was: 1h 40m)

> RBF balance process is stuck at DisableWrite stage
> --
>
> Key: HDFS-16087
> URL: https://issues.apache.org/jira/browse/HDFS-16087
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Eric Yin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The balance process will be stuck at DisableWrite stage when running the 
> rbfbalance command.






[jira] [Work logged] (HDFS-16087) RBF balance process is stuck at DisableWrite stage

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16087?focusedWorklogId=619935=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619935
 ]

ASF GitHub Bot logged work on HDFS-16087:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 12:14
Start Date: 07/Jul/21 12:14
Worklog Time Spent: 10m 
  Work Description: wojiaodoubao commented on a change in pull request 
#3141:
URL: https://github.com/apache/hadoop/pull/3141#discussion_r665311980



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/rbfbalance/TestRouterDistCpProcedure.java
##
@@ -0,0 +1,120 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.rbfbalance;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.ha.HAServiceProtocol;
+import org.apache.hadoop.hdfs.DFSClient;
+import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster;
+import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder;
+import org.apache.hadoop.hdfs.server.federation.StateStoreDFSCluster;
+import 
org.apache.hadoop.hdfs.server.federation.resolver.ActiveNamenodeResolver;
+import org.apache.hadoop.hdfs.server.federation.resolver.MountTableManager;
+import org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys;
+import org.apache.hadoop.hdfs.server.federation.router.Router;
+import org.apache.hadoop.hdfs.server.federation.store.StateStoreService;
+import org.apache.hadoop.hdfs.server.federation.store.impl.MountTableStoreImpl;
+import 
org.apache.hadoop.hdfs.server.federation.store.protocol.AddMountTableEntryRequest;
+import 
org.apache.hadoop.hdfs.server.federation.store.protocol.AddMountTableEntryResponse;
+import org.apache.hadoop.hdfs.server.federation.store.records.MountTable;
+import org.apache.hadoop.ipc.RemoteException;
+import org.apache.hadoop.tools.fedbalance.DistCpProcedure.Stage;
+import org.apache.hadoop.tools.fedbalance.FedBalanceContext;
+import org.apache.hadoop.tools.fedbalance.TestDistCpProcedure;
+import org.apache.hadoop.util.Time;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+
+import java.net.InetSocketAddress;
+import java.net.URI;
+import java.util.Collections;
+
+import static 
org.apache.hadoop.hdfs.server.federation.FederationTestUtils.createNamenodeReport;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+import static org.junit.Assert.assertTrue;
+
+
+public class TestRouterDistCpProcedure extends TestDistCpProcedure {
+private static StateStoreDFSCluster cluster;
+private static MiniRouterDFSCluster.RouterContext routerContext;
+private static Configuration routerConf;
+private static StateStoreService stateStore;
+
+@BeforeClass
+public static void globalSetUp() throws Exception {
+cluster = new StateStoreDFSCluster(false, 1);
+// Build and start a router with State Store + admin + RPC
+Configuration conf = new RouterConfigBuilder()
+.stateStore()
+.admin()
+.rpc()
+.build();
+cluster.addRouterOverrides(conf);
+cluster.startRouters();
+routerContext = cluster.getRandomRouter();
+Router router = routerContext.getRouter();
+stateStore = router.getStateStore();
+
+// Add one name services for testing
+ActiveNamenodeResolver membership = router.getNamenodeResolver();
+membership.registerNamenode(createNamenodeReport("ns0", "nn1",
+HAServiceProtocol.HAServiceState.ACTIVE));
+stateStore.refreshCaches(true);
+
+routerConf = new Configuration();
+InetSocketAddress routerSocket = router.getAdminServerAddress();
+routerConf.setSocketAddr(RBFConfigKeys.DFS_ROUTER_ADMIN_ADDRESS_KEY,
+routerSocket);
+}
+
+@Override
+public void testDisableWrite() throws Exception {
+// Firstly add mount entry: /test-write->{ns0,/test-write}.
+String mount = "/test-write";
+MountTable newEntry = MountTable
+

[jira] [Comment Edited] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376429#comment-17376429
 ] 

Daniel Ma edited comment on HDFS-15796 at 7/7/21, 11:57 AM:


[~sodonnell]

Yes, I also think that is a better way:
{quote}
A better approach may be to return a new ArrayList from getTargets, eg:
{quote}
The patch is updated. Please help review.


was (Author: daniel ma):
[~sodonnell]

yes, I totally agree with the solution you raised.

I will work on it, and update the patch

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch, 0002-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
>  
>  






[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-07-07 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-16083:
---
Attachment: HDFS-16083.005.1.patch

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> HDFS-16083.005.patch, activeRollEdits.png
>
>
> When the Observer NameNode is enabled in the cluster, the Active NameNode 
> receives rollEditLog RPC requests from both the Standby NameNode and the 
> Observer NameNode within a short time. The Observer NameNode's rollEditLog 
> request is a repeated operation, so should we forbid the Observer NameNode 
> from triggering an active NameNode log roll? We configured 
> 'dfs.ha.log-roll.period' to 300 (5 minutes), and the active NameNode 
> receives rollEditLog RPCs as shown in activeRollEdits.png






[jira] [Updated] (HDFS-16083) Forbid Observer NameNode trigger active namenode log roll

2021-07-07 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-16083:
---
Attachment: (was: HDFS-16083.005.1.patch)

> Forbid Observer NameNode trigger  active namenode log roll
> --
>
> Key: HDFS-16083
> URL: https://issues.apache.org/jira/browse/HDFS-16083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
> Attachments: HDFS-16083.001.patch, HDFS-16083.002.patch, 
> HDFS-16083.003.patch, HDFS-16083.004.patch, HDFS-16083.005.1.patch, 
> HDFS-16083.005.patch, activeRollEdits.png
>
>
> When the Observer NameNode is enabled in the cluster, the Active NameNode 
> receives rollEditLog RPC requests from both the Standby NameNode and the 
> Observer NameNode within a short time. The Observer NameNode's rollEditLog 
> request is a repeated operation, so should we forbid the Observer NameNode 
> from triggering an active NameNode log roll? We configured 
> 'dfs.ha.log-roll.period' to 300 (5 minutes), and the active NameNode 
> receives rollEditLog RPCs as shown in activeRollEdits.png






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-15796:
-
Attachment: 0002-HDFS-15796.patch

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch, 0002-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
>  
>  






[jira] [Updated] (HDFS-16097) Datanode receives ipc requests will throw NPE when datanode quickly restart

2021-07-07 Thread lei w (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lei w updated HDFS-16097:
-
Description: 
The DataNode throws an NPE on incoming IPC requests when it is restarted 
quickly. This is because when the DN restarts, the BlockPool is first 
registered with the blockPoolManager and only then is the fsdataset 
initialized. If the DataNode receives an IPC request after the BlockPool is 
registered but before the fsdataset is initialized, it throws an NPE because 
the handler calls methods provided by the fsdataset. The stack exception is 
as follows:



{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at 
org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
{code}


The  client side stack exception  is as follows:

{code:java}
 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to 
recover block (block=BP-###:blk_###, 
datanode=DatanodeInfoWithStorage[,null,null])
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:3468)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolServerSideTranslatorPB.initReplicaRecovery(InterDatanodeProtocolServerSideTranslatorPB.java:55)
at 
org.apache.hadoop.hdfs.protocol.proto.InterDatanodeProtocolProtos$InterDatanodeProtocolService$2.callBlockingMethod(InterDatanodeProtocolProtos.java:3105)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2873)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy26.initReplicaRecovery(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.initReplicaRecovery(InterDatanodeProtocolTranslatorPB.java:83)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.callInitReplicaRecovery(BlockRecoveryWorker.java:571)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker.access$400(BlockRecoveryWorker.java:57)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:142)
at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:610)
at java.lang.Thread.run(Thread.java:748)
{code}
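A minimal, hypothetical sketch of the kind of guard that avoids the NPE described above: reject inter-datanode IPC until the dataset reference is initialized, instead of dereferencing null. The class name, field, and error message are illustrative only; this is not DataNode source or the committed fix.

```java
// Illustrative guard for the startup race described above. The `data` field
// stands in for the DataNode's FsDatasetSpi reference, which is null until
// initialization completes. All names here are hypothetical.
public class ReplicaRecoveryGuard {
  private volatile Object data; // null until the dataset is initialized

  void initReplicaRecovery() {
    if (data == null) {
      // Fail fast with a clear, retriable error instead of an NPE.
      throw new IllegalStateException("DataNode is still starting; dataset not ready");
    }
    // ... actual replica-recovery logic would run here ...
  }

  public static void main(String[] args) {
    ReplicaRecoveryGuard dn = new ReplicaRecoveryGuard();
    try {
      dn.initReplicaRecovery();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints: DataNode is still starting; dataset not ready
    }
  }
}
```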




[jira] [Work started] (HDFS-16118) Improve the number of handlers that initialize NameNodeRpcServer#clientRpcServer

2021-07-07 Thread JiangHua Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-16118 started by JiangHua Zhu.
---
> Improve the number of handlers that initialize 
> NameNodeRpcServer#clientRpcServer
> 
>
> Key: HDFS-16118
> URL: https://issues.apache.org/jira/browse/HDFS-16118
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When initializing NameNodeRpcServer, if dfs.namenode.lifeline.handler.count is 
> set to a value of 0 or less (such as -1; admittedly rare), the number of 
> lifeline RPC handlers is derived as dfs.namenode.handler.count * 
> lifelineHandlerRatio.
> The code can be found here:
> int lifelineHandlerCount = conf.getInt(
>     DFS_NAMENODE_LIFELINE_HANDLER_COUNT_KEY, 0);
> if (lifelineHandlerCount <= 0) {
>   float lifelineHandlerRatio = conf.getFloat(
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_KEY,
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT);
>   lifelineHandlerCount = Math.max(
>       (int) (handlerCount * lifelineHandlerRatio), 1);
> }
> When this happens, lifelineHandlerCount should be subtracted from 
> handlerCount, but in fact it is not.
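The arithmetic in the quoted snippet can be sketched as a small self-contained example. The subtraction in the last step is the behaviour the report argues for (reserving the lifeline threads out of the main handler pool); it is NOT current Hadoop code, and the method name and return shape are invented here for illustration.

```java
// Self-contained sketch of the handler arithmetic discussed in the issue.
public class LifelineHandlerMath {
  // Returns {clientHandlers, lifelineHandlers} for the given settings.
  static int[] split(int handlerCount, int configuredLifeline, float lifelineRatio) {
    int lifeline = configuredLifeline;
    if (lifeline <= 0) {
      // Misconfigured (e.g. -1): fall back to a ratio of the main pool, min 1.
      lifeline = Math.max((int) (handlerCount * lifelineRatio), 1);
    }
    // Proposed adjustment: take the lifeline threads out of the main pool.
    return new int[] { handlerCount - lifeline, lifeline };
  }

  public static void main(String[] args) {
    int[] r = split(10, -1, 0.1f); // -1 triggers the ratio fallback
    System.out.println(r[0] + " client handlers, " + r[1] + " lifeline handlers");
    // prints: 9 client handlers, 1 lifeline handlers
  }
}
```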



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16118) Improve the number of handlers that initialize NameNodeRpcServer#clientRpcServer

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16118?focusedWorklogId=619884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619884
 ]

ASF GitHub Bot logged work on HDFS-16118:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 10:23
Start Date: 07/Jul/21 10:23
Worklog Time Spent: 10m 
  Work Description: jianghuazhu opened a new pull request #3186:
URL: https://github.com/apache/hadoop/pull/3186


   …Server#clientRpcServer.
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.)
   For more details, please see 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 619884)
Remaining Estimate: 0h
Time Spent: 10m

> Improve the number of handlers that initialize 
> NameNodeRpcServer#clientRpcServer
> 
>
> Key: HDFS-16118
> URL: https://issues.apache.org/jira/browse/HDFS-16118
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When initializing NameNodeRpcServer, if dfs.namenode.lifeline.handler.count is 
> set to a value of 0 or less (such as -1; admittedly rare), the number of 
> lifeline RPC handlers is derived as dfs.namenode.handler.count * 
> lifelineHandlerRatio.
> The code can be found here:
> int lifelineHandlerCount = conf.getInt(
>     DFS_NAMENODE_LIFELINE_HANDLER_COUNT_KEY, 0);
> if (lifelineHandlerCount <= 0) {
>   float lifelineHandlerRatio = conf.getFloat(
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_KEY,
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT);
>   lifelineHandlerCount = Math.max(
>       (int) (handlerCount * lifelineHandlerRatio), 1);
> }
> When this happens, lifelineHandlerCount should be subtracted from 
> handlerCount, but in fact it is not.






[jira] [Updated] (HDFS-16118) Improve the number of handlers that initialize NameNodeRpcServer#clientRpcServer

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16118:
--
Labels: pull-request-available  (was: )

> Improve the number of handlers that initialize 
> NameNodeRpcServer#clientRpcServer
> 
>
> Key: HDFS-16118
> URL: https://issues.apache.org/jira/browse/HDFS-16118
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When initializing NameNodeRpcServer, if dfs.namenode.lifeline.handler.count is 
> set to a value of 0 or less (such as -1; admittedly rare), the number of 
> lifeline RPC handlers is derived as dfs.namenode.handler.count * 
> lifelineHandlerRatio.
> The code can be found here:
> int lifelineHandlerCount = conf.getInt(
>     DFS_NAMENODE_LIFELINE_HANDLER_COUNT_KEY, 0);
> if (lifelineHandlerCount <= 0) {
>   float lifelineHandlerRatio = conf.getFloat(
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_KEY,
>       DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT);
>   lifelineHandlerCount = Math.max(
>       (int) (handlerCount * lifelineHandlerRatio), 1);
> }
> When this happens, lifelineHandlerCount should be subtracted from 
> handlerCount, but in fact it is not.






[jira] [Updated] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval xxx is invalid

2021-07-07 Thread jiaguodong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaguodong updated HDFS-16119:
--
Description: 
 

When the balancer is started with the parameter -hotBlockTimeInterval xxx, the 
parameter has no effect, but setting the same value in hdfs-site.xml works:

<property>
  <name>dfs.balancer.getBlocks.hot-time-interval</name>
  <value>3600</value>
</property>

  was:
 

when start balancer with parameters -hotBlockTimeInterval xxx,  it is not valid.

but set hdfs-site.xml is valid.


<property>
  <name>dfs.balancer.getBlocks.hot-time-interval</name>
  <value>3600</value>
</property>


> start balancer with parameters -hotBlockTimeInterval xxx is invalid
> ---
>
> Key: HDFS-16119
> URL: https://issues.apache.org/jira/browse/HDFS-16119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> when start balancer with parameters -hotBlockTimeInterval xxx,  it is invalid.
> but set hdfs-site.xml is valid.
> 
> <property>
>   <name>dfs.balancer.getBlocks.hot-time-interval</name>
>   <value>3600</value>
> </property>
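One plausible way a command-line flag ends up "invalid" while the hdfs-site.xml setting works is that the parsed CLI value is never written back into the Configuration the Balancer actually reads. The sketch below models that precedence with a plain map; the class and method names are invented for illustration and this is not Balancer source.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of the precedence bug described in the issue: the CLI value
// must be written back into the configuration, otherwise only the
// hdfs-site.xml value is honoured. Illustrative only.
public class CliVsConf {
  static long effectiveInterval(Map<String, String> conf, Long cliValue, boolean writeBack) {
    if (cliValue != null && writeBack) {
      conf.put("dfs.balancer.getBlocks.hot-time-interval", String.valueOf(cliValue));
    }
    return Long.parseLong(conf.getOrDefault("dfs.balancer.getBlocks.hot-time-interval", "0"));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("dfs.balancer.getBlocks.hot-time-interval", "3600");
    // Buggy path: CLI value parsed but never written back -> 3600 wins.
    System.out.println(effectiveInterval(conf, 1000L, false)); // prints 3600
    // Fixed path: CLI value written back into the conf -> 1000 wins.
    System.out.println(effectiveInterval(new HashMap<>(conf), 1000L, true)); // prints 1000
  }
}
```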






[jira] [Work logged] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval xxx is invalid

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16119?focusedWorklogId=619875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619875
 ]

ASF GitHub Bot logged work on HDFS-16119:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 10:06
Start Date: 07/Jul/21 10:06
Worklog Time Spent: 10m 
  Work Description: JiaguodongF opened a new pull request #3185:
URL: https://github.com/apache/hadoop/pull/3185


   when start balancer with parameters -hotBlockTimeInterval xxx,  it is not 
valid.
   
   but set hdfs-site.xml is valid.
   
   <property>
     <name>dfs.balancer.getBlocks.hot-time-interval</name>
     <value>1000</value>
   </property>




Issue Time Tracking
---

Worklog Id: (was: 619875)
Remaining Estimate: 0h
Time Spent: 10m

> start balancer with parameters -hotBlockTimeInterval xxx is invalid
> ---
>
> Key: HDFS-16119
> URL: https://issues.apache.org/jira/browse/HDFS-16119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> when start balancer with parameters -hotBlockTimeInterval xxx,  it is not 
> valid.
> but set hdfs-site.xml is valid.
> 
> <property>
>   <name>dfs.balancer.getBlocks.hot-time-interval</name>
>   <value>3600</value>
> </property>






[jira] [Updated] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval xxx is invalid

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16119:
--
Labels: pull-request-available  (was: )

> start balancer with parameters -hotBlockTimeInterval xxx is invalid
> ---
>
> Key: HDFS-16119
> URL: https://issues.apache.org/jira/browse/HDFS-16119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> when start balancer with parameters -hotBlockTimeInterval xxx,  it is not 
> valid.
> but set hdfs-site.xml is valid.
> 
> <property>
>   <name>dfs.balancer.getBlocks.hot-time-interval</name>
>   <value>3600</value>
> </property>






[jira] [Updated] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval xxx is invalid

2021-07-07 Thread jiaguodong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaguodong updated HDFS-16119:
--
Description: 
 

when start balancer with parameters -hotBlockTimeInterval xxx,  it is not valid.

but set hdfs-site.xml is valid.


<property>
  <name>dfs.balancer.getBlocks.hot-time-interval</name>
  <value>3600</value>
</property>

  was:
the parameter  hotBlockTimeInterval only set in hdfs-site.xml is valid, 

start balancer with parameters -hotBlockTimeInterval xxx is invalid. 

when start balancer with parameters -hotBlockTimeInterval xxx.  it is not valid.

but set 


> start balancer with parameters -hotBlockTimeInterval xxx is invalid
> ---
>
> Key: HDFS-16119
> URL: https://issues.apache.org/jira/browse/HDFS-16119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Priority: Minor
>
>  
> when start balancer with parameters -hotBlockTimeInterval xxx,  it is not 
> valid.
> but set hdfs-site.xml is valid.
> 
> <property>
>   <name>dfs.balancer.getBlocks.hot-time-interval</name>
>   <value>3600</value>
> </property>






[jira] [Updated] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval xxx is invalid

2021-07-07 Thread jiaguodong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaguodong updated HDFS-16119:
--
Description: 
the parameter  hotBlockTimeInterval only set in hdfs-site.xml is valid, 

start balancer with parameters -hotBlockTimeInterval xxx is invalid. 

when start balancer with parameters -hotBlockTimeInterval xxx.  it is not valid.

but set 

  was:
the parameter  hotBlockTimeInterval only set in hdfs-site.xml is valid, 

start balancer with parameters -hotBlockTimeInterval xxx is invalid. 

 


> start balancer with parameters -hotBlockTimeInterval xxx is invalid
> ---
>
> Key: HDFS-16119
> URL: https://issues.apache.org/jira/browse/HDFS-16119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Priority: Minor
>
> the parameter  hotBlockTimeInterval only set in hdfs-site.xml is valid, 
> start balancer with parameters -hotBlockTimeInterval xxx is invalid. 
> when start balancer with parameters -hotBlockTimeInterval xxx.  it is not 
> valid.
> but set 






[jira] [Updated] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval xxx is invalid

2021-07-07 Thread jiaguodong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaguodong updated HDFS-16119:
--
Summary: start balancer with parameters -hotBlockTimeInterval xxx is 
invalid  (was: start balancer with parameters -hotBlockTimeInterval is invalid)

> start balancer with parameters -hotBlockTimeInterval xxx is invalid
> ---
>
> Key: HDFS-16119
> URL: https://issues.apache.org/jira/browse/HDFS-16119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Priority: Minor
>
> the parameter  hotBlockTimeInterval only set in hdfs-site.xml is valid, 
> start balancer with parameters -hotBlockTimeInterval xxx is invalid. 
>  






[jira] [Created] (HDFS-16119) start balancer with parameters -hotBlockTimeInterval is invalid

2021-07-07 Thread jiaguodong (Jira)
jiaguodong created HDFS-16119:
-

 Summary: start balancer with parameters -hotBlockTimeInterval is 
invalid
 Key: HDFS-16119
 URL: https://issues.apache.org/jira/browse/HDFS-16119
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: jiaguodong


the parameter  hotBlockTimeInterval only set in hdfs-site.xml is valid, 

start balancer with parameters -hotBlockTimeInterval xxx is invalid. 

 






[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Daniel Ma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376429#comment-17376429
 ] 

Daniel Ma commented on HDFS-15796:
--

[~sodonnell]

Yes, I totally agree with the solution you proposed.

I will work on it and update the patch.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376421#comment-17376421
 ] 

Stephen O'Donnell commented on HDFS-15796:
--

[~Hemanth Boyina] Would you have time to have a quick look, as I think your 
earlier change may have caused this one? Please see my comment above.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-07-07 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376419#comment-17376419
 ] 

Stephen O'Donnell commented on HDFS-15796:
--

Looking at the changes in this area, I think the problem was caused by 
HDFS-15159. However I don't think the solution in the patch is the best 
approach to fix this. It means that if anyone else tries to use getTargets(...) 
again in the future, they will need to know to synchronise around the results, 
and this could result in another bug like this.

A better approach may be to return a new ArrayList from getTargets, e.g.:

{code}
  List<DatanodeStorageInfo> getTargets(BlockInfo block) {
synchronized (pendingReconstructions) {
  PendingBlockInfo found = pendingReconstructions.get(block);
  if (found != null) {
return new ArrayList<>(found.targets);  // changed line here
  }
}
return null;
  }
{code}

That way, it doesn't matter if something else changes the list before it is 
used.
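The effect of the proposed defensive copy can be demonstrated in isolation: once a snapshot of the target list is taken, later mutation of the original list no longer affects iteration over the snapshot. The example below is a standalone illustration, not BlockManager code.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone demonstration of why returning a copy avoids the
// ConcurrentModificationException class of bug: the caller iterates a
// snapshot, so mutation of the live list after the copy is harmless.
public class CopyDemo {
  public static void main(String[] args) {
    List<String> targets = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
    List<String> snapshot = new ArrayList<>(targets); // defensive copy, as proposed
    targets.add("dn4"); // mutation after the copy was taken
    for (String t : snapshot) {
      System.out.println(t); // safe: prints dn1, dn2, dn3
    }
  }
}
```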

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
> Attachments: 0001-HDFS-15796.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-16117) Add file count info in audit log to record the file count for delete and getListing RPC request to assist user trouble shooting when RPC cost is increasing

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-16117:
-
Summary: Add file count info in audit log to record the file count for 
delete and getListing RPC request to assist user trouble shooting when RPC cost 
is increasing   (was: Add file count info in audit log to record the file count 
count in delete and getListing RPC request to assist user trouble shooting when 
RPC cost is increasing )

> Add file count info in audit log to record the file count for delete and 
> getListing RPC request to assist user trouble shooting when RPC cost is 
> increasing 
> 
>
> Key: HDFS-16117
> URL: https://issues.apache.org/jira/browse/HDFS-16117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16117.patch
>
>
> Currently, there is no file count in the audit log for delete and getListing 
> RPC requests. When RPC time increases, it is therefore not easy to figure out 
> whether a time-consuming RPC request is related to operating on too many 
> files.
>  
> It is therefore necessary to add file count info to the audit log to assist 
> maintenance.
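A minimal sketch of the enrichment the issue asks for: append the number of affected files to the audit line for delete/getListing. The line format, class, and method names below are hypothetical and do not come from the attached patch.

```java
// Illustrative audit-line builder with the proposed fileCount field appended.
// Tab-separated key=value pairs are assumed for readability; not the patch.
public class AuditLine {
  static String format(String op, String src, int fileCount) {
    return String.format("cmd=%s\tsrc=%s\tfileCount=%d", op, src, fileCount);
  }

  public static void main(String[] args) {
    // A delete of a directory containing 1200 files would be logged as:
    System.out.println(format("delete", "/user/foo/dir", 1200));
    // prints: cmd=delete	src=/user/foo/dir	fileCount=1200
  }
}
```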






[jira] [Work logged] (HDFS-16114) the balancer parameters print error

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16114?focusedWorklogId=619849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619849
 ]

ASF GitHub Bot logged work on HDFS-16114:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 08:44
Start Date: 07/Jul/21 08:44
Worklog Time Spent: 10m 
  Work Description: hemanthboyina commented on pull request #3179:
URL: https://github.com/apache/hadoop/pull/3179#issuecomment-875413672


   thanks @JiaguodongF for the contribution , thanks @tomscut for the review




Issue Time Tracking
---

Worklog Id: (was: 619849)
Time Spent: 1h 10m  (was: 1h)

> the balancer parameters print error
> ---
>
> Key: HDFS-16114
> URL: https://issues.apache.org/jira/browse/HDFS-16114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Assignee: jiaguodong
>Priority: Minor
>  Labels: balancer, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> public String toString() {
>  return String.format("%s.%s [%s," + " threshold = %s,"
>  + " max idle iteration = %s," + " #excluded nodes = %s,"
>  + " #included nodes = %s," + " #source nodes = %s,"
>  + " #blockpools = %s," + " run during upgrade = %s,"
>  + " hot block time interval = %s]"
>  + " sort top nodes = %s",
>  Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
>  threshold, maxIdleIteration, excludedNodes.size(),
>  includedNodes.size(), sourceNodes.size(), blockpools.size(),
>  runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
> }
> The last two format specifiers ("hot block time interval" and "sort top 
> nodes") do not match the order of the corresponding arguments (sortTopNodes, 
> hotBlockTimeInterval), so each value is printed under the wrong label, and the 
> closing "]" is misplaced.
>  
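The mismatch above can be reproduced in a few lines: because both specifiers are %s, String.format happily prints each value under the other's label rather than failing. The values below are illustrative.

```java
// Minimal reproduction of the swapped-argument bug: the labels name the
// interval first, but the boolean is passed first, so the output is crossed.
public class SwapDemo {
  static String buggyLine(boolean sortTopNodes, long hotBlockTimeInterval) {
    return String.format(
        "hot block time interval = %s, sort top nodes = %s",
        sortTopNodes, hotBlockTimeInterval); // arguments swapped vs. labels
  }

  public static void main(String[] args) {
    System.out.println(buggyLine(true, 3600L));
    // prints: hot block time interval = true, sort top nodes = 3600
  }
}
```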






[jira] [Commented] (HDFS-16114) the balancer parameters print error

2021-07-07 Thread Hemanth Boyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376407#comment-17376407
 ] 

Hemanth Boyina commented on HDFS-16114:
---

committed to trunk

thanks [~jiaguodong] for the contribution , thanks [~tomscut] for the review

> the balancer parameters print error
> ---
>
> Key: HDFS-16114
> URL: https://issues.apache.org/jira/browse/HDFS-16114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Assignee: jiaguodong
>Priority: Minor
>  Labels: balancer, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> public String toString() {
>  return String.format("%s.%s [%s," + " threshold = %s,"
>  + " max idle iteration = %s," + " #excluded nodes = %s,"
>  + " #included nodes = %s," + " #source nodes = %s,"
>  + " #blockpools = %s," + " run during upgrade = %s,"
>  + " hot block time interval = %s]"
>  + " sort top nodes = %s",
>  Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
>  threshold, maxIdleIteration, excludedNodes.size(),
>  includedNodes.size(), sourceNodes.size(), blockpools.size(),
>  runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
> }
> The last two format specifiers ("hot block time interval" and "sort top 
> nodes") do not match the order of the corresponding arguments (sortTopNodes, 
> hotBlockTimeInterval), so each value is printed under the wrong label, and the 
> closing "]" is misplaced.
>  






[jira] [Work logged] (HDFS-16114) the balancer parameters print error

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16114?focusedWorklogId=619848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619848
 ]

ASF GitHub Bot logged work on HDFS-16114:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 08:43
Start Date: 07/Jul/21 08:43
Worklog Time Spent: 10m 
  Work Description: hemanthboyina merged pull request #3179:
URL: https://github.com/apache/hadoop/pull/3179


   




Issue Time Tracking
---

Worklog Id: (was: 619848)
Time Spent: 1h  (was: 50m)

> the balancer parameters print error
> ---
>
> Key: HDFS-16114
> URL: https://issues.apache.org/jira/browse/HDFS-16114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Assignee: jiaguodong
>Priority: Minor
>  Labels: balancer, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> public String toString() {
>  return String.format("%s.%s [%s," + " threshold = %s,"
>  + " max idle iteration = %s," + " #excluded nodes = %s,"
>  + " #included nodes = %s," + " #source nodes = %s,"
>  + " #blockpools = %s," + " run during upgrade = %s,"
>  + " hot block time interval = %s]"
>  + " sort top nodes = %s",
>  Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
>  threshold, maxIdleIteration, excludedNodes.size(),
>  includedNodes.size(), sourceNodes.size(), blockpools.size(),
>  runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
> }
> The last two format specifiers ("hot block time interval" and "sort top 
> nodes") do not match the order of the corresponding arguments (sortTopNodes, 
> hotBlockTimeInterval), so each value is printed under the wrong label, and the 
> closing "]" is misplaced.
>  






[jira] [Resolved] (HDFS-16114) the balancer parameters print error

2021-07-07 Thread Hemanth Boyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Boyina resolved HDFS-16114.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> the balancer parameters print error
> ---
>
> Key: HDFS-16114
> URL: https://issues.apache.org/jira/browse/HDFS-16114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Assignee: jiaguodong
>Priority: Minor
>  Labels: balancer, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> public String toString() {
>  return String.format("%s.%s [%s," + " threshold = %s,"
>  + " max idle iteration = %s," + " #excluded nodes = %s,"
>  + " #included nodes = %s," + " #source nodes = %s,"
>  + " #blockpools = %s," + " run during upgrade = %s,"
>  + " hot block time interval = %s]"
>  + " sort top nodes = %s",
>  Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
>  threshold, maxIdleIteration, excludedNodes.size(),
>  includedNodes.size(), sourceNodes.size(), blockpools.size(),
>  runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
> }
> The format string names "hot block time interval" before "sort top nodes" (and 
> places the closing "]" between them), but the arguments are passed in the 
> opposite order, so the two printed values are swapped.






[jira] [Assigned] (HDFS-16118) Improve the number of handlers that initialize NameNodeRpcServer#clientRpcServer

2021-07-07 Thread JiangHua Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JiangHua Zhu reassigned HDFS-16118:
---

Assignee: JiangHua Zhu

> Improve the number of handlers that initialize 
> NameNodeRpcServer#clientRpcServer
> 
>
> Key: HDFS-16118
> URL: https://issues.apache.org/jira/browse/HDFS-16118
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>
> When initializing NameNodeRpcServer, if dfs.namenode.lifeline.handler.count 
> is set to zero or a negative value (such as -1; admittedly rare), the number 
> of lifeline RPC handlers is derived from dfs.namenode.handler.count * 
> lifelineHandlerRatio.
> The code can be found:
> int lifelineHandlerCount = conf.getInt(
>    DFS_NAMENODE_LIFELINE_HANDLER_COUNT_KEY, 0);
>    if (lifelineHandlerCount <= 0) {
>  float lifelineHandlerRatio = conf.getFloat(
>  DFS_NAMENODE_LIFELINE_HANDLER_RATIO_KEY,
>  DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT);
>  lifelineHandlerCount = Math.max(
>  (int)(handlerCount * lifelineHandlerRatio), 1);
>    }
> When this happens, lifelineHandlerCount should be subtracted from 
> handlerCount, but in fact it is not.
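The fallback derivation described above can be sketched standalone. This is a simplified model, assuming the default ratio of 0.10 (DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT) and replacing the NameNode's Configuration lookups with plain parameters:

```java
public class LifelineHandlerCountDemo {
    // Assumed to mirror DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT (0.10f).
    static final float RATIO_DEFAULT = 0.10f;

    // Models the snippet quoted in the issue: an explicitly configured
    // positive count wins; zero or a negative count falls back to a
    // fraction of the client handler count, floored at one handler.
    static int lifelineHandlerCount(int configured, int handlerCount) {
        int count = configured;
        if (count <= 0) {
            count = Math.max((int) (handlerCount * RATIO_DEFAULT), 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(lifelineHandlerCount(5, 100));   // explicit value: 5
        System.out.println(lifelineHandlerCount(0, 100));   // ratio fallback: 10
        System.out.println(lifelineHandlerCount(-1, 100));  // ratio fallback: 10
        System.out.println(lifelineHandlerCount(-1, 5));    // floor applies: 1
    }
}
```

Note that in every branch the result is computed independently of how handlerCount is later used, which is the point of the report: nothing reduces handlerCount by the lifeline share.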






[jira] [Created] (HDFS-16118) Improve the number of handlers that initialize NameNodeRpcServer#clientRpcServer

2021-07-07 Thread JiangHua Zhu (Jira)
JiangHua Zhu created HDFS-16118:
---

 Summary: Improve the number of handlers that initialize 
NameNodeRpcServer#clientRpcServer
 Key: HDFS-16118
 URL: https://issues.apache.org/jira/browse/HDFS-16118
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: JiangHua Zhu


When initializing NameNodeRpcServer, if dfs.namenode.lifeline.handler.count is 
set to zero or a negative value (such as -1; admittedly rare), the number of 
lifeline RPC handlers is derived from dfs.namenode.handler.count * 
lifelineHandlerRatio.
The code can be found:
int lifelineHandlerCount = conf.getInt(
   DFS_NAMENODE_LIFELINE_HANDLER_COUNT_KEY, 0);
   if (lifelineHandlerCount <= 0) {
 float lifelineHandlerRatio = conf.getFloat(
 DFS_NAMENODE_LIFELINE_HANDLER_RATIO_KEY,
 DFS_NAMENODE_LIFELINE_HANDLER_RATIO_DEFAULT);
 lifelineHandlerCount = Math.max(
 (int)(handlerCount * lifelineHandlerRatio), 1);
   }
When this happens, lifelineHandlerCount should be subtracted from handlerCount, 
but in fact it is not.






[jira] [Work logged] (HDFS-16116) Fix Hadoop FedBalance shell and federationBanance markdown bug.

2021-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16116?focusedWorklogId=619821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-619821
 ]

ASF GitHub Bot logged work on HDFS-16116:
-

Author: ASF GitHub Bot
Created on: 07/Jul/21 07:54
Start Date: 07/Jul/21 07:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3181:
URL: https://github.com/apache/hadoop/pull/3181#issuecomment-875377099


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  4s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  |  Shelldocs was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 50s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 19s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m  5s |  |  trunk passed  |
   | -1 :x: |  shadedclient  |  22m 23s |  |  branch has errors when building 
and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   1m 55s |  |  the patch passed  |
   | +1 :green_heart: |  shellcheck  |   0m 10s |  |  No new issues.  |
   | +1 :green_heart: |  shadedclient  |  17m 23s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 45s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   0m 26s |  |  hadoop-federation-balance in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  89m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3181 |
   | Optional Tests | dupname asflicense mvnsite unit codespell shellcheck 
shelldocs markdownlint |
   | uname | Linux 0bd3bf84183c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6c1ecb48151753f0442212bfb7a4f71a66f113be |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/4/testReport/ |
   | Max. process+thread count | 719 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common 
hadoop-tools/hadoop-federation-balance U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3181/4/console |
   | versions | git=2.25.1 maven=3.6.3 shellcheck=0.7.0 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 619821)
Time Spent: 50m  (was: 40m)

> Fix Hadoop FedBalance shell and federationBanance markdown bug.
> ---
>
> Key: HDFS-16116
> URL: https://issues.apache.org/jira/browse/HDFS-16116
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: panlijie
>Assignee: panlijie
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Fix Hadoop FedBalance shell and federationBanance markdown bug.




[jira] [Updated] (HDFS-16117) Add file count info in audit log to record the file count count in delete and getListing RPC request to assist user trouble shooting when RPC cost is increasing

2021-07-07 Thread Daniel Ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Ma updated HDFS-16117:
-
Attachment: 0001-HDFS-16117.patch

> Add file count info in audit log to record the file count count in delete and 
> getListing RPC request to assist user trouble shooting when RPC cost is 
> increasing 
> -
>
> Key: HDFS-16117
> URL: https://issues.apache.org/jira/browse/HDFS-16117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Priority: Major
> Fix For: 3.3.1
>
> Attachments: 0001-HDFS-16117.patch
>
>
> Currently, the audit log records no file count for delete and getListing RPC 
> requests, so when RPC cost increases it is hard to figure out whether a 
> time-consuming RPC request is caused by operating on too many files.
>  
> It is therefore necessary to add file count info to the audit log to assist 
> maintenance.






[jira] [Created] (HDFS-16117) Add file count info in audit log to record the file count count in delete and getListing RPC request to assist user trouble shooting when RPC cost is increasing

2021-07-07 Thread Daniel Ma (Jira)
Daniel Ma created HDFS-16117:


 Summary: Add file count info in audit log to record the file count 
count in delete and getListing RPC request to assist user trouble shooting when 
RPC cost is increasing 
 Key: HDFS-16117
 URL: https://issues.apache.org/jira/browse/HDFS-16117
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.3.1
Reporter: Daniel Ma
 Fix For: 3.3.1


Currently, the audit log records no file count for delete and getListing RPC 
requests, so when RPC cost increases it is hard to figure out whether a 
time-consuming RPC request is caused by operating on too many files.

It is therefore necessary to add file count info to the audit log to assist 
maintenance.






[jira] [Assigned] (HDFS-16114) the balancer parameters print error

2021-07-07 Thread Hemanth Boyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Boyina reassigned HDFS-16114:
-

Assignee: jiaguodong

> the balancer parameters print error
> ---
>
> Key: HDFS-16114
> URL: https://issues.apache.org/jira/browse/HDFS-16114
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: jiaguodong
>Assignee: jiaguodong
>Priority: Minor
>  Labels: balancer, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> public String toString() {
>  return String.format("%s.%s [%s," + " threshold = %s,"
>  + " max idle iteration = %s," + " #excluded nodes = %s,"
>  + " #included nodes = %s," + " #source nodes = %s,"
>  + " #blockpools = %s," + " run during upgrade = %s,"
>  + " hot block time interval = %s]"
>  + " sort top nodes = %s",
>  Balancer.class.getSimpleName(), getClass().getSimpleName(), policy,
>  threshold, maxIdleIteration, excludedNodes.size(),
>  includedNodes.size(), sourceNodes.size(), blockpools.size(),
>  runDuringUpgrade, sortTopNodes, hotBlockTimeInterval);
> }
> The format string names "hot block time interval" before "sort top nodes" (and 
> places the closing "]" between them), but the arguments are passed in the 
> opposite order, so the two printed values are swapped.


