[jira] [Commented] (HDFS-15402) Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

2020-06-09 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129949#comment-17129949
 ] 

Sean Chow commented on HDFS-15402:
--

My clients use webhdfs to put files a lot. I think this issue is not caused by 
webhdfs, but by the jmx endpoint.

Occasionally the http port cannot be accessed:

 
{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}
 

After restarting the datanodes, the CLOSE-WAIT sockets disappear.

Currently I have no clue about this because no exception could be found.
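
For anyone unfamiliar with the state: CLOSE-WAIT on the datanode side means the 
metrics client has already closed its end, but the datanode process never closed 
the accepted socket. A toy illustration of how that state accumulates (just the 
TCP mechanics, not datanode code):
{code:java}
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a server that never closes accepted sockets. After each
// client disconnects, the server-side socket stays in CLOSE-WAIT until the
// process closes it or exits -- the same state we see on port 50075.
public class CloseWaitDemo {
  public static void main(String[] args) throws Exception {
    List<Socket> leaked = new ArrayList<>();
    try (ServerSocket server = new ServerSocket(50075)) {
      while (true) {
        Socket s = server.accept();
        s.getInputStream().read(); // returns -1 once the client closes its end
        leaked.add(s);             // never call s.close(): stays in CLOSE-WAIT
      }
    }
  }
}
{code}
Run a few curl requests against it and {{ss}} shows the same growing CLOSE-WAIT 
list as above.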

> Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
> 
>
> Key: HDFS-15402
> URL: https://issues.apache.org/jira/browse/HDFS-15402
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.1.3
>Reporter: Sean Chow
>Priority: Major
>
> We access {{http://127.0.0.1:50075/jmx}} to get datanode metrics 
> periodically. But too many sockets are left in the CLOSE-WAIT state (each 
> holding an open file descriptor in the datanode process), which causes normal 
> webhdfs requests to fail.
>  
> {code:java}
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
> 6729
> lsof -i:37296
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 101015 hdfs 3044u IPv4 271157177 0t0 TCP 
> localhost:50075->localhost:37296 (CLOSE_WAIT)
> {code}
>  
> The pid 101015 is the datanode's process id.
> I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in production, and both of 
> them have the same issue. When the metric-retrieving script stops, the number 
> of CLOSE-WAIT sockets does not increase anymore.
>  Version apache-hadoop-2.9.2 does not have this issue with the same 
> metric-retrieving script.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15402) Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

2020-06-09 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129949#comment-17129949
 ] 

Sean Chow edited comment on HDFS-15402 at 6/10/20, 2:01 AM:


My clients use webhdfs to put files a lot. I think this issue is not caused by 
webhdfs, but by the jmx endpoint.

Occasionally the http port cannot be accessed:
{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}
 

After restarting the datanodes, the CLOSE-WAIT sockets disappear.

Currently I have no clue about this because no exception could be found.


was (Author: seanlook):
My clients use webhdfs to put files a lot. I think this issue is not caused by 
webhdfs, but by the jmx endpoint.

Occasionally the http port cannot be accessed:

 
{code:java}
$ curl http://127.0.0.1:50075/jmx > a
curl: (7) couldn't connect to host
{code}
 

After restarting the datanodes, the CLOSE-WAIT sockets disappear.

Currently I have no clue for this because not Exception could be found.

> Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode
> 
>
> Key: HDFS-15402
> URL: https://issues.apache.org/jira/browse/HDFS-15402
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.1.3
>Reporter: Sean Chow
>Priority: Major
>
> We access {{http://127.0.0.1:50075/jmx}} to get datanode metrics 
> periodically. But too many sockets are left in the CLOSE-WAIT state (each 
> holding an open file descriptor in the datanode process), which causes normal 
> webhdfs requests to fail.
>  
> {code:java}
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
> CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
> $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
> 6729
> lsof -i:37296
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 101015 hdfs 3044u IPv4 271157177 0t0 TCP 
> localhost:50075->localhost:37296 (CLOSE_WAIT)
> {code}
>  
> The pid 101015 is the datanode's process id.
> I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in production, and both of 
> them have the same issue. When the metric-retrieving script stops, the number 
> of CLOSE-WAIT sockets does not increase anymore.
>  Version apache-hadoop-2.9.2 does not have this issue with the same 
> metric-retrieving script.
>  






[jira] [Created] (HDFS-15402) Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

2020-06-09 Thread Sean Chow (Jira)
Sean Chow created HDFS-15402:


 Summary: Requesting http jmx metrics leads to too much CLOSE-WAIT 
on datanode
 Key: HDFS-15402
 URL: https://issues.apache.org/jira/browse/HDFS-15402
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: metrics
Affects Versions: 3.1.3
Reporter: Sean Chow


We access {{http://127.0.0.1:50075/jmx}} to get datanode metrics 
periodically. But too many sockets are left in the CLOSE-WAIT state (each 
holding an open file descriptor in the datanode process), which causes normal 
webhdfs requests to fail.

 
{code:java}
$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
6729
lsof -i:37296
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 101015 hdfs 3044u IPv4 271157177 0t0 TCP localhost:50075->localhost:37296 
(CLOSE_WAIT)
{code}
 

The pid 101015 is the datanode's process id.

I use {{cdh6.1.1}} and {{apache-hadoop-3.1.3}} in production, and both of 
them have the same issue. When the metric-retrieving script stops, the number of 
CLOSE-WAIT sockets does not increase anymore.

 Version apache-hadoop-2.9.2 does not have this issue with the same 
metric-retrieving script.

 






[jira] [Comment Edited] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-08 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128236#comment-17128236
 ] 

Sean Chow edited comment on HDFS-15390 at 6/8/20, 12:10 PM:


Hi [~hexiaoqiao], in fact, I've done some debugging, and the exception is 
thrown as below:
{code:java}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  if (action.action == RetryAction.RetryDecision.FAIL) {
if (action.reason != null) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Failed to connect to server: " + server + ": "
+ action.reason, ioe);
  }
}
// HERE is where the IOException is thrown
throw ioe;
  }
...{code}
But the strange thing is that the {{Failed to connect to server}} debug log is 
not logged.
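
For anyone reproducing this, the {{ipc.Client}} DEBUG output can be turned on 
with the standard log4j setting, so the message above would appear if that code 
path were actually reached:
{code:java}
# log4j.properties: enable DEBUG for the Hadoop IPC client
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
{code}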

 

We use hadoop version hadoop-2.6.0-cdh5.4.11. I've also tested the trunk 
version, and it has the same issue.

Though it only affects the hdfs client side, I think you're right. But how do I 
move it to the hadoop-common project in Jira? :(


was (Author: seanlook):
In fact, I've done some debugging, and the exception is thrown as below:
{code:java}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  if (action.action == RetryAction.RetryDecision.FAIL) {
if (action.reason != null) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Failed to connect to server: " + server + ": "
+ action.reason, ioe);
  }
}
// HERE is where the IOException is thrown
throw ioe;
  }
...{code}
But the strange thing is that the {{Failed to connect to server}} debug log is 
not logged.

 

We use hadoop version hadoop-2.6.0-cdh5.4.11. I've also tested the trunk 
version, and it has the same issue.

Though it only affects the hdfs client side, I think you're right. But how do I 
move it to the hadoop-common project in Jira? :(

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)

[jira] [Comment Edited] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-08 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128236#comment-17128236
 ] 

Sean Chow edited comment on HDFS-15390 at 6/8/20, 12:09 PM:


In fact, I've done some debugging, and the exception is thrown as below:
{code:java}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  if (action.action == RetryAction.RetryDecision.FAIL) {
if (action.reason != null) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Failed to connect to server: " + server + ": "
+ action.reason, ioe);
  }
}
// HERE is where the IOException is thrown
throw ioe;
  }
...{code}
But the strange thing is that the {{Failed to connect to server}} debug log is 
not logged.

 

We use hadoop version hadoop-2.6.0-cdh5.4.11. I've also tested the trunk 
version, and it has the same issue.

Though it only affects the hdfs client side, I think you're right. But how do I 
move it to the hadoop-common project in Jira? :(


was (Author: seanlook):
In fact, I've done some debugging, and the exception is thrown as below:
{code:java}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  if (action.action == RetryAction.RetryDecision.FAIL) {
if (action.reason != null) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Failed to connect to server: " + server + ": "
+ action.reason, ioe);
  }
}
// HERE is where the IOException is thrown
throw ioe;
  }
...{code}
But the strange thing is that the {{Failed to connect to server}} debug log is 
not logged.

 

We use hadoop version hadoop-2.6.0-cdh5.4.11. I've also tested the trunk 
version, and it has the same issue.

Though it only affects the client side, I'm moving this ticket to hadoop-common.

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvoca

[jira] [Commented] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-08 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128236#comment-17128236
 ] 

Sean Chow commented on HDFS-15390:
--

In fact, I've done some debugging, and the exception is thrown as below:
{code:java}
private void handleConnectionFailure(int curRetries, IOException ioe
) throws IOException {
  closeConnection();

  final RetryAction action;
  try {
action = connectionRetryPolicy.shouldRetry(ioe, curRetries, 0, true);
  } catch(Exception e) {
throw e instanceof IOException? (IOException)e: new IOException(e);
  }
  if (action.action == RetryAction.RetryDecision.FAIL) {
if (action.reason != null) {
  if (LOG.isDebugEnabled()) {
LOG.debug("Failed to connect to server: " + server + ": "
+ action.reason, ioe);
  }
}
// HERE is where the IOException is thrown
throw ioe;
  }
...{code}
But the strange thing is that the {{Failed to connect to server}} debug log is 
not logged.

 

We use hadoop version hadoop-2.6.0-cdh5.4.11. I've also tested the trunk 
version, and it has the same issue.

Though it only affects the client side, I'm moving this ticket to hadoop-common.

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> found out that's because when the method {{updateAddress()}} returns true, 
> {{handleConnectionFailure()}} throws an exception that breaks the next retry 
> with the right ipaddr.
> Client.java: setupConnection()
> {code:java}
> } catch (ConnectTimeoutException toe) {
>   /* Check for an address change and update the local reference.
>* Reset the failure counter if the address was changed
>*/
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
>   handleConnectionTimeout(timeoutFailures++,
>   maxRetriesOnSocketTimeouts, toe);
> } catch (IOException ie) {
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
> // because the namenode ip changed in

[jira] [Updated] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-08 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-15390:
-
Component/s: (was: dfsclient)

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> found out that's because when the method {{updateAddress()}} returns true, 
> {{handleConnectionFailure()}} throws an exception that breaks the next retry 
> with the right ipaddr.
> Client.java: setupConnection()
> {code:java}
> } catch (ConnectTimeoutException toe) {
>   /* Check for an address change and update the local reference.
>* Reset the failure counter if the address was changed
>*/
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
>   handleConnectionTimeout(timeoutFailures++,
>   maxRetriesOnSocketTimeouts, toe);
> } catch (IOException ie) {
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
> // because the namenode ip changed in updateAddress(), the old namenode 
> // ipaddress cannot be accessed now
> // handleConnectionFailure will throw an exception, so the next retry never 
> // has a chance to use the right server updated in updateAddress()
>   handleConnectionFailure(ioFailures++, ie);
> }
> {code}
>  






[jira] [Commented] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-06 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127515#comment-17127515
 ] 

Sean Chow commented on HDFS-15390:
--

Hi [~ayushtkn], I've tried writing a unit test for this but it's not easy :(

Because emulating a namenode ipaddr change needs a third namenode set up to 
connect to.

HDFS-4404 is a good example, but not for this issue.

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> found out that's because when the method {{updateAddress()}} returns true, 
> {{handleConnectionFailure()}} throws an exception that breaks the next retry 
> with the right ipaddr.
> Client.java: setupConnection()
> {code:java}
> } catch (ConnectTimeoutException toe) {
>   /* Check for an address change and update the local reference.
>* Reset the failure counter if the address was changed
>*/
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
>   handleConnectionTimeout(timeoutFailures++,
>   maxRetriesOnSocketTimeouts, toe);
> } catch (IOException ie) {
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
> // because the namenode ip changed in updateAddress(), the old namenode 
> // ipaddress cannot be accessed now
> // handleConnectionFailure will throw an exception, so the next retry never 
> // has a chance to use the right server updated in updateAddress()
>   handleConnectionFailure(ioFailures++, ie);
> }
> {code}
>  






[jira] [Comment Edited] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126638#comment-17126638
 ] 

Sean Chow edited comment on HDFS-15390 at 6/5/20, 2:57 PM:
---

It's easy to reproduce. You have set up HA namenodes, and a new machine with the 
same hostname as nn2 (standby) and a copied name-data directory.
 # Use {{hdfs dfs put}} to write a big file (to make it a long-running client)
 # Stop the old nn2, and start the new nn2
 # Update the nn2 hostname to resolve to the new ipaddr on all hosts
 # Failover from nn1 to nn2
 # Now you see the client erroring continuously. (In the yarn nodemanager 
scenario, that nodemanager is totally sick until restarted)

 

There are two ways to fix this: 
 # When updateAddress is true, do not handle the ConnectionFailure this round 
 # When an address change is detected, update the namenode proxies (only with 
{{ConfiguredFailoverProxyProvider}})

Method one is easy, and within this connection's lifecycle the client will use 
the right {{server}} to connect. But when the client connection is closed and a 
new one is created, it will always try to getConnection with the retired ipaddr, 
because the namenode proxies are still the old ones.

Method two solves the root cause: every time the client fails over namenodes, 
check whether the ipaddr changed. If it changed, re-initialize the namenode 
failover proxies, as the sketch below illustrates.
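
A rough sketch of the check method two needs (the class and method names here 
are made up for illustration; this is not the patch):
{code:java}
import java.net.InetSocketAddress;

// Illustration only: re-resolve the namenode hostname and report whether the
// cached address is stale. A failover proxy provider could run this on each
// failover and rebuild its proxies when it returns true.
public final class NamenodeAddressCheck {
  static boolean addressChanged(InetSocketAddress cached) {
    InetSocketAddress fresh =
        new InetSocketAddress(cached.getHostName(), cached.getPort());
    return cached.getAddress() != null
        && !fresh.isUnresolved()
        && !fresh.getAddress().equals(cached.getAddress());
  }
}
{code}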


was (Author: seanlook):
There are two ways to fix this: 
 # When updateAddress is true, do not handle the ConnectionFailure this round 
 # When an address change is detected, update the namenode proxies (only with 
{{ConfiguredFailoverProxyProvider}})

Method one is easy, and within this connection's lifecycle the client will use 
the right {{server}} to connect. But when the client connection is closed and a 
new one is created, it will always try to getConnection with the retired ipaddr, 
because the namenode proxies are still the old ones.

Method two solves the root cause: every time the client fails over namenodes, 
check whether the ipaddr changed. If it changed, re-initialize the namenode 
failover proxies.

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code

[jira] [Commented] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126837#comment-17126837
 ] 

Sean Chow commented on HDFS-15390:
--

Patch attached.

Now we can see the exception is ignored when the address is updated, and the 
file is written successfully.
{code:java}
20/06/05 20:54:51 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/05 20:54:51 DEBUG ipc.Client: Failed to connect to server: 
nn2-192-168-1-100/192.168.1.200:9000: try once and fail.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
...
20/06/05 20:54:51 DEBUG hdfs.DFSOutputStream: enqueue full packet seqno: ...
20/06/05 20:54:51 DEBUG hdfs.DataStreamer: Queued packet 100076
20/06/05 20:54:51 WARN ipc.Client: Exception when handle ConnectionFailure: 
Connection refused
{code}
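
The change behind that log has roughly the following shape in 
{{Client.setupConnection()}} (a sketch of the idea only; the attached patch is 
the real diff):
{code:java}
// Sketch only -- see HDFS-15390.01.patch for the actual change.
} catch (IOException ie) {
  if (updateAddress()) {
    timeoutFailures = ioFailures = 0;
    try {
      handleConnectionFailure(ioFailures++, ie);
    } catch (IOException ioe) {
      // The old address is gone; log and swallow the failure so the next
      // retry can connect to the server updated by updateAddress().
      LOG.warn("Exception when handle ConnectionFailure: " + ioe.getMessage());
    }
  } else {
    handleConnectionFailure(ioFailures++, ie);
  }
}
{code}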

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> found out that's because when the method {{updateAddress()}} returns true, 
> {{handleConnectionFailure()}} throws an exception that breaks the next retry 
> with the right ipaddr.
> Client.java: setupConnection()
> {code:java}
> } catch (ConnectTimeoutException toe) {
>   /* Check for an address change and update the local reference.
>* Reset the failure counter if the address was changed
>*/
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
>   handleConnectionTimeout(timeoutFailures++,
>   maxRetriesOnSocketTimeouts, toe);
> } catch (IOException ie) {
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
> // because the namenode ip changed in updateAddress(), the old namenode 
> // ipaddress cannot be accessed now
> // handleConnectionFailure will throw an exception, so the next retry never 
> // has a chance to use the right server updated in updateAddress()
>   han

[jira] [Updated] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-15390:
-
Attachment: HDFS-15390.01.patch

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-15390.01.patch
>
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> found out that's because when the method {{updateAddress()}} returns true, 
> {{handleConnectionFailure()}} throws an exception that breaks the next retry 
> with the right ipaddr.
> Client.java: setupConnection()
> {code:java}
> } catch (ConnectTimeoutException toe) {
>   /* Check for an address change and update the local reference.
>* Reset the failure counter if the address was changed
>*/
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
>   handleConnectionTimeout(timeoutFailures++,
>   maxRetriesOnSocketTimeouts, toe);
> } catch (IOException ie) {
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
> // because the namenode ip changed in updateAddress(), the old namenode 
> // ipaddress cannot be accessed now
> // handleConnectionFailure will throw an exception, so the next retry never 
> // has a chance to use the right server updated in updateAddress()
>   handleConnectionFailure(ioFailures++, ie);
> }
> {code}
>  






[jira] [Updated] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-15390:
-
Description: 
For machine replacement, I replaced my standby namenode with a new ipaddr and 
kept the same hostname. I also updated the client's hosts file to make it 
resolve correctly.

When I run a failover to transition to the new namenode (let's say nn2), the 
client will fail to read or write forever until it's restarted.

That leaves the yarn nodemanager in a sick state. Even new tasks will encounter 
this exception too, until all nodemanagers restart.

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
{code}
 

We can see the client has {{Address change detected}}, but it still fails. I 
found out that's because when the method {{updateAddress()}} returns true, 
{{handleConnectionFailure()}} throws an exception that breaks the next retry 
with the right ipaddr.

Client.java: setupConnection()
{code:java}
} catch (ConnectTimeoutException toe) {
  /* Check for an address change and update the local reference.
   * Reset the failure counter if the address was changed
   */
  if (updateAddress()) {
timeoutFailures = ioFailures = 0;
  }
  handleConnectionTimeout(timeoutFailures++,
  maxRetriesOnSocketTimeouts, toe);
} catch (IOException ie) {
  if (updateAddress()) {
timeoutFailures = ioFailures = 0;
  }
// because the namenode ip changed in updateAddress(), the old namenode 
// ipaddress cannot be accessed now
// handleConnectionFailure will throw an exception, so the next retry never has 
// a chance to use the right server updated in updateAddress()
  handleConnectionFailure(ioFailures++, ie);
}
{code}
 

  was:
For machine replacement, I replaced my standby namenode with a new ipaddr and 
kept the same hostname. I also updated the client's hosts file to make it 
resolve correctly.

When I run a failover to transition to the new namenode (let's say nn2), the 
client will fail to read or write forever until it's restarted.

That leaves the yarn nodemanager in a sick state. Even new tasks will encounter 
this exception too, until all nodemanagers restart.

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.ap

[jira] [Commented] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126638#comment-17126638
 ] 

Sean Chow commented on HDFS-15390:
--

There are two ways to fix this: 
 # When updateAddress is true, do not handle the ConnectionFailure this round 
 # When an address change is detected, update the namenode proxies (only with 
{{ConfiguredFailoverProxyProvider}})

Method one is easy, and within this connection's lifecycle the client will use 
the right {{server}} to connect. But when the client connection is closed and a 
new one is created, it will always try to getConnection with the retired ipaddr, 
because the namenode proxies are still the old ones.

Method two solves the root cause: every time the client fails over namenodes, 
check whether the ipaddr changed. If it changed, re-initialize the namenode 
failover proxies.

> client fails forever when namenode ipaddr changed
> -
>
> Key: HDFS-15390
> URL: https://issues.apache.org/jira/browse/HDFS-15390
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 2.10.0, 2.9.2, 3.2.1
>Reporter: Sean Chow
>Priority: Major
>
> For machine replacement, I replaced my standby namenode with a new ipaddr and 
> kept the same hostname. I also updated the client's hosts file to make it 
> resolve correctly.
> When I run a failover to transition to the new namenode (let's say nn2), the 
> client will fail to read or write forever until it's restarted.
> That leaves the yarn nodemanager in a sick state. Even new tasks will 
> encounter this exception too, until all nodemanagers restart.
>  
> {code:java}
> 20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
> nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
> 20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
> nn2-192-168-1-100/192.168.1.200:9000: Connection refused
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
> at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> at org.apache.hadoop.ipc.Client.call(Client.java:1401)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
> at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> {code}
>  
> We can see the client has {{Address change detected}}, but it still fails. I 
> found out that's because when the method {{updateAddress()}} returns true, 
> {{handleConnectionFailure()}} throws an exception that breaks the next retry 
> with the right ipaddr.
> Client.java: setupConnection()
> {code:java}
> } catch (ConnectTimeoutException toe) {
>   /* Check for an address change and update the local reference.
>* Reset the failure counter if the address was changed
>*/
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
>   handleConnectionTimeout(timeoutFailures++,
>   maxRetriesOnSocketTimeouts, toe);
> } catch (IOException ie) {
>   if (updateAddress()) {
> timeoutFailures = ioFailures = 0;
>   }
> // because the namenode ip changed in updateAddress(), the old namenode 
> // ipaddress cannot be accessed now
> // handleConnectionFailure will throw an exception, so the next retry never 
> // has a chance to use the right server updated in updateAddress()
>   handleConnectionFailure(ioFailures++, ie);
> }
> {code}
>  




[jira] [Updated] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-15390:
-
Description: 
For machine replacement, I replaced my standby namenode with a new ipaddr and 
kept the same hostname. I also updated the client's hosts file to make it 
resolve correctly.

When I run a failover to transition to the new namenode (let's say nn2), the 
client will fail to read or write forever until it's restarted.

That leaves the yarn nodemanager in a sick state. Even new tasks will encounter 
this exception too, until all nodemanagers restart.

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
{code}
 

We can see the client has {{Address change detected}}, but it still fails. I 
found out that's because when the method {{updateAddress()}} returns true, 
{{handleConnectionFailure()}} throws an exception that breaks the next retry 
with the right ipaddr.

Client.java: setupConnection()
{code:java}
} catch (ConnectTimeoutException toe) {
  /* Check for an address change and update the local reference.
   * Reset the failure counter if the address was changed
   */
  if (updateAddress()) {
timeoutFailures = ioFailures = 0;
  }
  handleConnectionTimeout(timeoutFailures++,
  maxRetriesOnSocketTimeouts, toe);
} catch (IOException ie) {
  if (updateAddress()) {
timeoutFailures = ioFailures = 0;
  }
// because the namenode ip changed in updateAddress(), the old namenode 
// ipaddress cannot be accessed now
// handleConnectionFailure will throw an exception, so the next retry never has 
// a chance to use the right server updated in updateAddress()
  handleConnectionFailure(ioFailures++, ie);
}
{code}
 

  was:
For machine replacement, I replaced my standby namenode with a new ipaddr and 
kept the same hostname. I also updated the client's hosts file to make it 
resolve correctly.

When I run a failover to transition to the new namenode (let's say nn2), the 
client will fail to read or write forever until it's restarted.

That leaves the yarn nodemanager in a sick state. Even new tasks will encounter 
this exception too, until all nodemanagers restart.

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.ap

[jira] [Updated] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-15390:
-
Description: 
For machine replacement, I replaced my standby namenode with a new ipaddr and 
kept the same hostname. I also updated the client's hosts file so the name 
resolves correctly.

When I try to run a failover to transition to the new namenode (let's say nn2), 
the client fails to read or write forever until it is restarted.

That leaves the yarn nodemanagers in a sick state. Even new tasks will encounter 
this exception too, until all nodemanagers restart.

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
{code}
 

We can see the client has {{Address change detected}}, but it still fails. I 
found out that's because when the method {{updateAddress()}} returns true, 
{{handleConnectionFailure()}} throws an exception that breaks the next retry with 
the right ipaddr.

 

  was:
For machine replacement, I replaced my standby namenode with a new ipaddr and 
kept the same hostname. I also updated the client's hosts file so the name 
resolves correctly.

When I try to run a failover to transition to the new namenode (let's say nn2), 
the client fails to read or write forever until it is restarted.

That leaves the yarn nodemanagers in a sick state. Even new tasks will encounter 
this exception too, until all nodemanagers restart.

 

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandl

[jira] [Created] (HDFS-15390) client fails forever when namenode ipaddr changed

2020-06-05 Thread Sean Chow (Jira)
Sean Chow created HDFS-15390:


 Summary: client fails forever when namenode ipaddr changed
 Key: HDFS-15390
 URL: https://issues.apache.org/jira/browse/HDFS-15390
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: dfsclient
Affects Versions: 3.2.1, 2.9.2, 2.10.0
Reporter: Sean Chow


For machine replacement, I replaced my standby namenode with a new ipaddr and 
kept the same hostname. I also updated the client's hosts file so the name 
resolves correctly.

When I try to run a failover to transition to the new namenode (let's say nn2), 
the client fails to read or write forever until it is restarted.

That leaves the yarn nodemanagers in a sick state. Even new tasks will encounter 
this exception too, until all nodemanagers restart.

 

 
{code:java}
20/06/02 15:12:25 WARN ipc.Client: Address change detected. Old: 
nn2-192-168-1-100/192.168.1.100:9000 New: nn2-192-168-1-100/192.168.1.200:9000
20/06/02 15:12:25 DEBUG ipc.Client: closing ipc connection to 
nn2-192-168-1-100/192.168.1.200:9000: Connection refused
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1517)
at org.apache.hadoop.ipc.Client.call(Client.java:1440)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:193)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
{code}
 

 

We can see the client has Address change detected, but it still fails. I found 
out that's because when the method updateAddress() returns true, 
handleConnectionFailure() throws an exception that breaks the next retry with the 
right ipaddr.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15048) Fix findbug in DirectoryScanner

2020-04-12 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082046#comment-17082046
 ] 

Sean Chow commented on HDFS-15048:
--

Hi [~iwasakims], thanks for this patch resolving HDFS-14476.

I think the patch also needs to be committed to branch-2.10. I've added the 
patch, though it's exactly the same as your original one.

Hope it can be committed this week, as HDFS-14476 may otherwise miss the Hadoop 
3.3.0 release.

Sorry for the trouble. :P

> Fix findbug in DirectoryScanner
> ---
>
> Key: HDFS-15048
> URL: https://issues.apache.org/jira/browse/HDFS-15048
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-15048-branch-2.10.002.patch, HDFS-15048.001.patch
>
>
> There is a findbugs warning in DirectoryScanner.
> {noformat}
> Multithreaded correctness Warnings
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls 
> Thread.sleep() with a lock held
> Bug type SWL_SLEEP_WITH_LOCK_HELD (click for details) 
> In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner
> In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile()
> At DirectoryScanner.java:[line 441]
> {noformat}
> https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
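
For context, the pattern findbugs is flagging, and the general shape of the fix, look roughly like the fragments below. These are schematic (the {{dataset}} and {{fixBatch}} names are stand-ins), not the actual patch:
{code:java}
// Flagged pattern (SWL_SLEEP_WITH_LOCK_HELD): sleeping while the monitor is
// held stalls every thread waiting on the dataset lock for the whole pause.
synchronized (dataset) {
  fixBatch(batch);
  Thread.sleep(2000);   // lock still held here
}

// Shape of the fix: hold the lock only while fixing, sleep after releasing it.
synchronized (dataset) {
  fixBatch(batch);
}
Thread.sleep(2000);     // other threads can acquire the lock meanwhile
{code}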



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-04-12 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Fix Version/s: 2.10.1

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch, 
> HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, 
> HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes 
> (see the sketch below).
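
To make the proposal concrete, here is a minimal self-contained sketch of the batching idea. The names ({{BatchedFixSketch}}, {{fixOneBlock}}, {{datasetLock}}) are hypothetical stand-ins, not the actual patch; the point is that the lock is held for one batch at a time and the sleep happens with the lock released, so normal reads and writes can proceed between batches.
{code:java}
import java.util.List;

// Hypothetical sketch of the batched fix; not the committed HDFS patch.
public class BatchedFixSketch {
  private final Object datasetLock = new Object(); // stands in for FsDatasetImpl

  void fixInBatches(List<Long> abnormalBlocks, int batchSize)
      throws InterruptedException {
    for (int i = 0; i < abnormalBlocks.size(); i += batchSize) {
      int end = Math.min(i + batchSize, abnormalBlocks.size());
      synchronized (datasetLock) {            // lock held for one batch only
        for (long blockId : abnormalBlocks.subList(i, end)) {
          fixOneBlock(blockId);               // placeholder for checkAndUpdate(...)
        }
      }
      Thread.sleep(2000);                     // pause with the lock released
    }
  }

  private void fixOneBlock(long blockId) {
    // placeholder: fix one inconsistent block between disk and memory
  }
}
{code}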



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-04-12 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082042#comment-17082042
 ] 

Sean Chow edited comment on HDFS-14476 at 4/13/20, 4:25 AM:


Sorry for the trouble.

I think this patch is safe for 3.3.0 as 
https://issues.apache.org/jira/browse/HDFS-15048 has been merged.

The later discussion focused on merging it to branch-2. For HDFS-15048, the 
issue this patch caused, I've added a patch for branch-2.10 at 
https://issues.apache.org/jira/browse/HDFS-15048 . Hope it will be committed 
quickly so this HDFS-14476 can be marked *Resolved*.


was (Author: seanlook):
I think this patch is safe for 3.3.0 as 
https://issues.apache.org/jira/browse/HDFS-15048 has been merged.

The later discussion focused on merging it to branch-2. For HDFS-15048, the 
issue this patch caused, I've added a patch for branch-2.10 at 
https://issues.apache.org/jira/browse/HDFS-15048 . Hope it will be committed 
quickly so this HDFS-14476 can be marked *Resolved*.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch, 
> HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, 
> HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-04-12 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082042#comment-17082042
 ] 

Sean Chow commented on HDFS-14476:
--

I think this patch is safe for 3.3.0 as 
https://issues.apache.org/jira/browse/HDFS-15048 has been merged.

The later discussion focused on merging it to branch-2. For HDFS-15048, the 
issue this patch caused, I've added a patch for branch-2.10 at 
https://issues.apache.org/jira/browse/HDFS-15048 . Hope it will be committed 
quickly so this HDFS-14476 can be marked *Resolved*.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476-branch-2.10.02.patch, 
> HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, 
> HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15048) Fix findbug in DirectoryScanner

2020-04-12 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-15048:
-
Attachment: HDFS-15048-branch-2.10.002.patch

> Fix findbug in DirectoryScanner
> ---
>
> Key: HDFS-15048
> URL: https://issues.apache.org/jira/browse/HDFS-15048
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15048-branch-2.10.002.patch, HDFS-15048.001.patch
>
>
> There is a findbugs warning in DirectoryScanner.
> {noformat}
> Multithreaded correctness Warnings
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls 
> Thread.sleep() with a lock held
> Bug type SWL_SLEEP_WITH_LOCK_HELD (click for details) 
> In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner
> In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile()
> At DirectoryScanner.java:[line 441]
> {noformat}
> https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15048) Fix findbug in DirectoryScanner

2020-04-12 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-15048:
-
Fix Version/s: 2.10.1

> Fix findbug in DirectoryScanner
> ---
>
> Key: HDFS-15048
> URL: https://issues.apache.org/jira/browse/HDFS-15048
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-15048-branch-2.10.002.patch, HDFS-15048.001.patch
>
>
> There is a findbugs warning in DirectoryScanner.
> {noformat}
> Multithreaded correctness Warnings
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls 
> Thread.sleep() with a lock held
> Bug type SWL_SLEEP_WITH_LOCK_HELD (click for details) 
> In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner
> In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile()
> At DirectoryScanner.java:[line 441]
> {noformat}
> https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-03-24 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065398#comment-17065398
 ] 

Sean Chow commented on HDFS-14476:
--

Hi [~weichiu], I can't see why the build failed. Any chance to merge this to 
2.10.1?

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, 
> HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, 
> datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-02-11 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034391#comment-17034391
 ] 

Sean Chow commented on HDFS-14476:
--

Hi [~weichiu], what can I do to make the build pass?

Thanks.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, 
> HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, 
> datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-01-16 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017759#comment-17017759
 ] 

Sean Chow commented on HDFS-14476:
--

Build failed. :(

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, 
> HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, 
> datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-01-14 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014921#comment-17014921
 ] 

Sean Chow commented on HDFS-14476:
--

Attaching the patch for branch-2: HDFS-14476-branch-2.02.patch

This patch also fixes issue HDFS-14751.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, 
> HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, 
> datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-01-14 Thread Sean Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: HDFS-14476-branch-2.02.patch

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, 
> HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, 
> datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-01-06 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008662#comment-17008662
 ] 

Sean Chow commented on HDFS-14476:
--

Hi [~weichiu], will you merge this to versions 2.8.x and 2.9.x?

Thanks to [~leosun08] for fixing the testcase.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.002.patch, HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, 
> datanode-with-patch-14476.png
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and every 6-hour scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock being 
> held on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for roughly four minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the nn because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> Maybe this affects all hdfs versions.
> *how to fix:*
> Just like invalidate commands from the namenode are processed with a batch 
> size of 1000, fixing these abnormal blocks should be handled in batches too, 
> sleeping 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14751) TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk

2019-09-01 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920574#comment-16920574
 ] 

Sean Chow edited comment on HDFS-14751 at 9/2/19 2:38 AM:
--

Hi [~leosun08], shall we just synchronize {{diffs}}?

{code:java}
void reconcile() throws IOException {
  scan();
  synchronized (diffs) {
    for (Entry<String, LinkedList<ScanInfo>> entry : diffs.entrySet()) {
      String bpid = entry.getKey();
      LinkedList<ScanInfo> diff = entry.getValue();

      for (ScanInfo info : diff) {
        dataset.checkAndUpdate(bpid, info.getBlockId(), info.getBlockFile(),
            info.getMetaFile(), info.getVolume());
      }
    }
  }
  if (!retainDiffs) clear();
}
{code}

And in {{scan()}}:
{code:java}
// Hold FSDataset lock to prevent further changes to the block map
synchronized (dataset) {
  synchronized (diffs) { ... }
}
{code}
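
For context on why the guard matters: the test failure is a plain {{ConcurrentModificationException}} from iterating {{diffs}} while another thread mutates it. A tiny self-contained toy (hypothetical code, not HDFS source) that can reproduce the same failure mode:
{code:java}
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Toy reproduction of the race: iterate a plain ArrayList while a second
// thread appends to it. The iterator's modCount check may then throw.
public class CmeDemo {
  public static void main(String[] args) throws InterruptedException {
    List<Integer> diffs = new ArrayList<>();
    for (int i = 0; i < 1_000_000; i++) {
      diffs.add(i);
    }

    Thread scanner = new Thread(() -> diffs.add(-1)); // concurrent writer
    scanner.start();
    try {
      long sum = 0;
      for (int v : diffs) {                           // unguarded iteration
        sum += v;
      }
      System.out.println("no conflict this run, sum=" + sum);
    } catch (ConcurrentModificationException e) {
      System.out.println("race detected: " + e);
    }
    scanner.join();
  }
}
{code}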



was (Author: seanlook):
Shall we just synchronize {{diffs}}?

{code:java}
void reconcile() throws IOException {
  scan();
  synchronized (diffs) {
    for (Entry<String, LinkedList<ScanInfo>> entry : diffs.entrySet()) {
      String bpid = entry.getKey();
      LinkedList<ScanInfo> diff = entry.getValue();

      for (ScanInfo info : diff) {
        dataset.checkAndUpdate(bpid, info.getBlockId(), info.getBlockFile(),
            info.getMetaFile(), info.getVolume());
      }
    }
  }
  if (!retainDiffs) clear();
}
{code}

And in {{scan()}}:
{code:java}
// Hold FSDataset lock to prevent further changes to the block map
synchronized (dataset) {
  synchronized (diffs) { ... }
}
{code}


> TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk
> -
>
> Key: HDFS-14751
> URL: https://issues.apache.org/jira/browse/HDFS-14751
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14751.001.patch
>
>
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 21.693 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
> [ERROR] 
> testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency)
>   Time elapsed: 7.572 s  <<< ERROR!
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.suref

[jira] [Commented] (HDFS-14751) TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk

2019-09-01 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920574#comment-16920574
 ] 

Sean Chow commented on HDFS-14751:
--

Shall we just synchronize {{diffs}}?

{code:java}
void reconcile() throws IOException {
  scan();
  synchronized (diffs) {
    for (Entry<String, LinkedList<ScanInfo>> entry : diffs.entrySet()) {
      String bpid = entry.getKey();
      LinkedList<ScanInfo> diff = entry.getValue();

      for (ScanInfo info : diff) {
        dataset.checkAndUpdate(bpid, info.getBlockId(), info.getBlockFile(),
            info.getMetaFile(), info.getVolume());
      }
    }
  }
  if (!retainDiffs) clear();
}
{code}

And in {{scan()}}:
{code:java}
// Hold FSDataset lock to prevent further changes to the block map
synchronized (dataset) {
  synchronized (diffs) { ... }
}
{code}


> TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk
> -
>
> Key: HDFS-14751
> URL: https://issues.apache.org/jira/browse/HDFS-14751
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14751.001.patch
>
>
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 21.693 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
> [ERROR] 
> testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency)
>   Time elapsed: 7.572 s  <<< ERROR!
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> Ref:[https://builds.apache.org/job/PreCommit-HDFS-Build/27567/artifact/out/patch-unit-hadoop-hdf

[jira] [Comment Edited] (HDFS-14751) TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk

2019-08-21 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912985#comment-16912985
 ] 

Sean Chow edited comment on HDFS-14751 at 8/22/19 5:30 AM:
---

[~leosun08] Note that {{reconcile()}} will call the {{scan()}} method, which may 
take hours to finish.

Adding {{synchronized}} to {{reconcile()}} will make it a long-held lock. It 
would be better to synchronize the code block rather than the method. Or do you 
just want to make sure there is only one scan in progress?


was (Author: seanlook):
[~leosun08] Note that {{reconcile()}} will call the {{scan()}} method, which may 
take hours to finish. Adding {{synchronized}} to {{reconcile()}} will make it a 
long-held lock. It would be better to synchronize the code block rather than the 
method.

> TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk
> -
>
> Key: HDFS-14751
> URL: https://issues.apache.org/jira/browse/HDFS-14751
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14751.001.patch
>
>
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 21.693 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
> [ERROR] 
> testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency)
>   Time elapsed: 7.572 s  <<< ERROR!
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> Ref:[https://builds.apache.org/job/PreCommit-HDFS-Build/27567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]



--
This messag

[jira] [Comment Edited] (HDFS-14751) TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk

2019-08-21 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912985#comment-16912985
 ] 

Sean Chow edited comment on HDFS-14751 at 8/22/19 5:26 AM:
---

[~leosun08] Note that {{reconcile()}} will call the {{scan()}} method, which may 
take hours to finish. Adding {{synchronized}} to {{reconcile()}} will make it a 
long-held lock. It would be better to synchronize the code block rather than the 
method.


was (Author: seanlook):
Note that {{reconcile()}} will call the {{scan()}} method, which may take hours 
to finish. Adding {{synchronized}} to {{reconcile()}} will make it a long-held 
lock. It would be better to synchronize the code block rather than the method.

> TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk
> -
>
> Key: HDFS-14751
> URL: https://issues.apache.org/jira/browse/HDFS-14751
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14751.001.patch
>
>
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 21.693 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
> [ERROR] 
> testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency)
>   Time elapsed: 7.572 s  <<< ERROR!
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> Ref:[https://builds.apache.org/job/PreCommit-HDFS-Build/27567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

--

[jira] [Commented] (HDFS-14751) TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk

2019-08-21 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912985#comment-16912985
 ] 

Sean Chow commented on HDFS-14751:
--

Note that {{reconcile()}} will call the {{scan()}} method, which may take hours 
to finish. Adding {{synchronized}} to {{reconcile()}} will make it a long-held 
lock. It would be better to synchronize the code block rather than the method.

> TestNameNodeMetadataConsistency#testGenerationStampInFuture fail in trunk
> -
>
> Key: HDFS-14751
> URL: https://issues.apache.org/jira/browse/HDFS-14751
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14751.001.patch
>
>
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 21.693 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
> [ERROR] 
> testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency)
>   Time elapsed: 7.572 s  <<< ERROR!
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> Ref:[https://builds.apache.org/job/PreCommit-HDFS-Build/27567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-15 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908781#comment-16908781
 ] 

Sean Chow commented on HDFS-14476:
--

Hi [~jojochuang], is the patch OK? And what's the next step to get it merged 
into trunk?
Thanks.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: (was: HDFS-14476.01.patch)

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: (was: HDFS-14476-branch-2.01.patch)

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Affects Version/s: 3.0.3
   Attachment: HDFS-14476-branch-2.01.patch
   HDFS-14476.01.patch
   Status: Patch Available  (was: Open)

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.3, 2.7.0, 2.6.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, HDFS-14476.01.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901864#comment-16901864
 ] 

Sean Chow commented on HDFS-14476:
--

patch updated!

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: HDFS-14476.01.patch

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476.00.patch, HDFS-14476.01.patch, 
> datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-07 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: HDFS-14476-branch-2.01.patch

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, 
> HDFS-14476.01.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-05 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900255#comment-16900255
 ] 

Sean Chow commented on HDFS-14476:
--

Thanks for reviewing. The already-attached patch works for branch-2.6~branch-2.9, 
so I will add a rebased patch tomorrow.

By the way, do I need to apply this same patch to branch-3? I see the 
{{reconcile}} is a little different.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476.00.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-08-05 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900255#comment-16900255
 ] 

Sean Chow edited comment on HDFS-14476 at 8/5/19 5:26 PM:
--

Thanks for reviewing. The already-attached patch works for branch-2.6~branch-2.9, 
so I will add a rebased patch tomorrow.

By the way, do I need to apply this same patch to branch-3? I see the 
{{reconcile}} is a little different.


was (Author: seanlook):
Thanks for reviewing. The already added patch works for banch2.6~branch2.9, so 
I will add a rebased patch tomorrow.

By the way, do I need to add this same patch to branch-3? I see the 
{{reconcile}} is a little diffrent.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Attachments: HDFS-14476.00.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-07-08 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880466#comment-16880466
 ] 

Sean Chow commented on HDFS-14536:
--

This patch removes {{synchronized(dataset)}} from {{scan()}}, and it doesn't 
break the inconsistency check-and-update result.

With synchronized:
 !reconcile-with-synchronized.png! 

After removing the synchronized:
 !reconcile-with-removing-synchronized.png! 
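
As a rough sketch of the change (assuming the branch-2 {{scan()}} shape quoted 
below; this is not the literal HDFS-14536.00.patch), the idea is to drop the 
outer lock and rely on the snapshot taken inside the synchronized 
{{getFinalizedBlocks}}:

{code:java}
// Sketch only: scan() without the outer synchronized(dataset).
void scan() {
  clear();
  Map<String, ScanInfo[]> diskReport = getDiskReport(); // long disk walk, no lock

  // getFinalizedBlocks() is itself synchronized and returns a consistent
  // snapshot of the replica map; the later checkAndUpdate() pass tolerates
  // blocks added or deleted after the snapshot was taken.
  for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
    String bpid = entry.getKey();
    ScanInfo[] blockpoolReport = entry.getValue();
    List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
    FinalizedReplica[] memReport = bl.toArray(new FinalizedReplica[bl.size()]);
    Arrays.sort(memReport); // sort by blockId, as before
    // ... diff memReport against blockpoolReport exactly as before ...
  }
}
{code}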

> It's safe to remove synchronized from reconcile scan
> 
>
> Key: HDFS-14536
> URL: https://issues.apache.org/jira/browse/HDFS-14536
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-14536.00.patch, 
> reconcile-with-removing-synchronized.png, reconcile-with-synchronized.png
>
>
> I found all my requests blocked every 6 hours and used jstack to dump 
> threads:
> {code:java}
> "1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
> tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry 
> [0x7f7e9e42b000]
>   java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
>  - waiting to lock <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)
> /// lots of BLOCKED for about one minute
> "java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
> queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc 
> runnable [0x7f7ea0344000]
>   java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
>   at 
> org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
> - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
>     - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This problem may relate to the large number of blocks on every datanode.
> After reviewing the DirectoryScanner behavior, I think the lock 
> {{synchronized(dataset)}} in {{scan()}} can be removed safely.
> As dataset.getFinalizedBlocks() is a synchronized method, the memReport will 
> be generated from volumeMap under the lock. After that, the outer lock 
> {{synchronized(dataset)}} is needless, but I've encountered the later 
> while-loop running for 20 seconds and blocking other threads.
> Yes, other blocks could be added or deleted from memory or disk, but that has 
> no effect on getting the differences between them, because the later fixing 
> process will handle both sides.
>  
> {code:java}
>   void scan() {
> clear();
> Map<String, ScanInfo[]> diskReport = getDiskReport();
> // Hold FSDataset lock to prevent further changes to the block map
> synchronized(dataset) {  // remove synchronized?
>   for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
> String bpid = entry.getKey();
> ScanInfo[] blockpoolReport = entry.getValue();
> Stats statsRecord = new Stats(bpid);
> stats.put(bpid, statsRecord);
> LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
> diffs.put(bpid, diffRecord);
> statsRecord.totalBlocks = blockpoolReport.length;
> // getFinalizedBlocks is synchronized 
> List<FinalizedReplica> bl = dataset.getF

[jira] [Updated] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-07-08 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14536:
-
Attachment: reconcile-with-removing-synchronized.png

> It's safe to remove synchronized from reconcile scan
> 
>
> Key: HDFS-14536
> URL: https://issues.apache.org/jira/browse/HDFS-14536
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-14536.00.patch, 
> reconcile-with-removing-synchronized.png, reconcile-with-synchronized.png
>
>
> I found all my requests blocked every 6 hours and used jstack to dump 
> threads:
> {code:java}
> "1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
> tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry 
> [0x7f7e9e42b000]
>   java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
>  - waiting to lock <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)
> /// lots of BLOCKED for about one minute
> "java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
> queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc 
> runnable [0x7f7ea0344000]
>   java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
> at 
> org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
> - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
>     - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This problem may relate to the large number of blocks on every datanode.
> After reviewing the DirectoryScanner behavior, I think the lock 
> {{synchronized(dataset)}} in {{scan()}} can be removed safely.
> As dataset.getFinalizedBlocks() is a synchronized method, the memReport will 
> be generated from volumeMap under the lock. After that, the outer lock 
> {{synchronized(dataset)}} is needless, but I've encountered the later 
> while-loop running for 20 seconds and blocking other threads.
> Yes, other blocks could be added or deleted from memory or disk, but that has 
> no effect on getting the differences between them, because the later fixing 
> process will handle both sides.
>  
> {code:java}
>   void scan() {
> clear();
> Map<String, ScanInfo[]> diskReport = getDiskReport();
> // Hold FSDataset lock to prevent further changes to the block map
> synchronized(dataset) {  // remove synchronized?
>   for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
> String bpid = entry.getKey();
> ScanInfo[] blockpoolReport = entry.getValue();
> Stats statsRecord = new Stats(bpid);
> stats.put(bpid, statsRecord);
> LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
> diffs.put(bpid, diffRecord);
> statsRecord.totalBlocks = blockpoolReport.length;
> // getFinalizedBlocks is synchronized 
> List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
> FinalizedReplica[] memReport = bl.toArray(new 
> FinalizedReplica[bl.size()]);
> Arrays.sort(memReport); // Sort based on blockId
> int d = 0; // index for blockpoolReport
> int m = 0; // index for memRep

[jira] [Updated] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-07-08 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14536:
-
Attachment: reconcile-with-synchronized.png

> It's safe to remove synchronized from reconcile scan
> 
>
> Key: HDFS-14536
> URL: https://issues.apache.org/jira/browse/HDFS-14536
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-14536.00.patch, 
> reconcile-with-removing-synchronized.png, reconcile-with-synchronized.png
>
>
> I found all my requests blocked every 6 hours and used jstack to dump 
> threads:
> {code:java}
> "1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
> tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry 
> [0x7f7e9e42b000]
>   java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
>  - waiting to lock <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)
> /// lots of BLOCKED for about one minute
> "java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
> queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc 
> runnable [0x7f7ea0344000]
>   java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
> at 
> org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
> - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
>     - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This problem may relate to the large number of blocks on every datanode.
> After reviewing the DirectoryScanner behavior, I think the lock 
> {{synchronized(dataset)}} in {{scan()}} can be removed safely.
> As dataset.getFinalizedBlocks() is a synchronized method, the memReport will 
> be generated from volumeMap under the lock. After that, the outer lock 
> {{synchronized(dataset)}} is needless, but I've encountered the later 
> while-loop running for 20 seconds and blocking other threads.
> Yes, other blocks could be added or deleted from memory or disk, but that has 
> no effect on getting the differences between them, because the later fixing 
> process will handle both sides.
>  
> {code:java}
>   void scan() {
> clear();
> Map<String, ScanInfo[]> diskReport = getDiskReport();
> // Hold FSDataset lock to prevent further changes to the block map
> synchronized(dataset) {  // remove synchronized?
>   for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
> String bpid = entry.getKey();
> ScanInfo[] blockpoolReport = entry.getValue();
> Stats statsRecord = new Stats(bpid);
> stats.put(bpid, statsRecord);
> LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
> diffs.put(bpid, diffRecord);
> statsRecord.totalBlocks = blockpoolReport.length;
> // getFinalizedBlocks is synchronized 
> List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
> FinalizedReplica[] memReport = bl.toArray(new 
> FinalizedReplica[bl.size()]);
> Arrays.sort(memReport); // Sort based on blockId
> int d = 0; // index for blockpoolReport
> int m = 0; // index for memReport
>

[jira] [Updated] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-07-08 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14536:
-
Description: 
I found all my requests blocked every 6 hours and used jstack to dump 
threads:
{code:java}
"1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry [0x7f7e9e42b000]
  java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
 - waiting to lock <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)

/// lots of BLOCKED for about one minute

"java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc runnable 
[0x7f7ea0344000]
  java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
at 
org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
- locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
    - locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

This problem may relate to the large number of blocks on every datanode.
After reviewing the DirectoryScanner behavior, I think the lock 
{{synchronized(dataset)}} in {{scan()}} can be removed safely.

As dataset.getFinalizedBlocks() is a synchronized method, the memReport will be 
generated from volumeMap under the lock. After that, the outer lock 
{{synchronized(dataset)}} is needless, but I've encountered the later while-loop 
running for 20 seconds and blocking other threads.

Yes, other blocks could be added or deleted from memory or disk, but that has no 
effect on getting the differences between them, because the later fixing process 
will handle both sides.

 
{code:java}
  void scan() {
clear();
Map<String, ScanInfo[]> diskReport = getDiskReport();

// Hold FSDataset lock to prevent further changes to the block map
synchronized(dataset) {  // remove synchronized?
  for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
String bpid = entry.getKey();
ScanInfo[] blockpoolReport = entry.getValue();

Stats statsRecord = new Stats(bpid);
stats.put(bpid, statsRecord);
LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
diffs.put(bpid, diffRecord);

statsRecord.totalBlocks = blockpoolReport.length;
// getFinalizedBlocks is synchronized 
List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
FinalizedReplica[] memReport = bl.toArray(new 
FinalizedReplica[bl.size()]);
Arrays.sort(memReport); // Sort based on blockId

int d = 0; // index for blockpoolReport
int m = 0; // index for memReport
while (m < memReport.length && d < blockpoolReport.length) {
  // may take 20s to finish this loop
}
  } //end for
} //end synchronized
  }{code}
 

Let's say block A is added and block B is deleted after 
{{dataset.getFinalizedBlocks(bpid)}} is called. That means the disk files and 
memory are updated. We took a snapshot of the in-memory finalized blocks before, 
and we only care about the consistency of those blocks.

If the difference report occasionally contains that deleted block B, the later 
{{checkAndUpdate}} will handle it in the right way.


I've compiled this patch and will test it in my environment.
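
To make the tolerance argument concrete, here is a hypothetical sketch of how 
the later fixing pass treats a stale diff entry; this is not the actual 
{{FsDatasetImpl.checkAndUpdate}} code, and the signature and lookups used are 
illustrative:

{code:java}
// Hypothetical sketch: why a stale diff entry for block B is harmless.
void checkAndUpdate(String bpid, ScanInfo info) {
  File diskFile = info.getBlockFile();                       // disk state *now*
  ReplicaInfo memBlockInfo = volumeMap.get(bpid, info.getBlockId());
  if ((diskFile == null || !diskFile.exists()) && memBlockInfo == null) {
    // Block B was removed from both disk and memory after the snapshot:
    // nothing left to reconcile, so the stale entry is simply skipped.
    return;
  }
  // ... otherwise reconcile the remaining disk/memory combinations ...
}
{code}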

  was:
I found all my my request blocked in every 6 hours and use jstack to dump 
threads:
{code:java}

[jira] [Updated] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-07-08 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14536:
-
Attachment: HDFS-14536.00.patch

> It's safe to remove synchronized from reconcile scan
> 
>
> Key: HDFS-14536
> URL: https://issues.apache.org/jira/browse/HDFS-14536
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-14536.00.patch
>
>
> I found all my requests blocked every 6 hours and used jstack to dump 
> threads:
> {code:java}
> "1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
> tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry 
> [0x7f7e9e42b000]
>   java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
>  - waiting to lock <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)
> /// lots of BLOCKED for about one minute
> "java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
> queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc 
> runnable [0x7f7ea0344000]
>   java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
> at 
> org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
> at 
> org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
> - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
>     - locked <0x0002c00410f8> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> This problem may relate to the large number of blocks on every datanode.
> After reviewing the DirectoryScanner behavior, I think the lock 
> {{synchronized(dataset)}} in {{scan()}} can be removed safely.
> As dataset.getFinalizedBlocks() is a synchronized method, the memReport will 
> be generated from volumeMap under the lock. After that, the outer lock 
> {{synchronized(dataset)}} is needless, but I've encountered the later while 
> loop running for 20 seconds.
> Yes, other blocks could be added or deleted from memory or disk, but that has 
> no effect on getting the differences between them, because the later fixing 
> process will handle both sides.
>  
> {code:java}
>   void scan() {
> clear();
> Map<String, ScanInfo[]> diskReport = getDiskReport();
> // Hold FSDataset lock to prevent further changes to the block map
> synchronized(dataset) {  // remove synchronized?
>   for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
> String bpid = entry.getKey();
> ScanInfo[] blockpoolReport = entry.getValue();
> Stats statsRecord = new Stats(bpid);
> stats.put(bpid, statsRecord);
> LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
> diffs.put(bpid, diffRecord);
> statsRecord.totalBlocks = blockpoolReport.length;
> // getFinalizedBlocks is synchronized 
> List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
> FinalizedReplica[] memReport = bl.toArray(new 
> FinalizedReplica[bl.size()]);
> Arrays.sort(memReport); // Sort based on blockId
> int d = 0; // index for blockpoolReport
> int m = 0; // index for memReport
> while (m < memReport.length && d < blockpoolReport.length) {
>   // may take 20s to finish this loop
>

[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-07-08 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880426#comment-16880426
 ] 

Sean Chow commented on HDFS-14476:
--

I've added a patch here to process the inconsistent blocks map with a fixed 
batch size of 500, and it works well in my production.
 !datanode-with-patch-14476.png! 

You can see a periodic spike before applying this patch. This is the 
jstack output for that spike time:

{code:java}
"DataXceiver for client DFSClient_NONMAPREDUCE_-2048576664_27 at /x.x.x.x:49793 
[Sending block BP-1644920766-x.x.x.x-1450099987967:blk_2423788189_1403044486]" 
daemon prio=10 tid=0x7f8a0ea3d800 nid=0x19a2 waiting for monitor entry 
[0x7f89982a2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:231)
- waiting to lock <0x0004001d6348> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:529)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
at java.lang.Thread.run(Thread.java:745)

"369604090@qtp-1069118552-20368" daemon prio=10 tid=0x01f94000 
nid=0x19a1 waiting for monitor entry [0x7f89c0d41000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
- waiting to lock <0x0004001d6348> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:334)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getRemaining(FsVolumeList.java:162)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getRemaining(FsDatasetImpl.java:496)
- locked <0x000400502b70> (a java.lang.Object)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)

"java.util.concurrent.ThreadPoolExecutor$Worker@704adcff[State = -1, empty 
queue]" daemon prio=10 tid=0x7f8a000f3000 nid=0x59f1 runnable 
[0x7f89c9c66000]
   java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
at java.io.File.exists(File.java:813)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1965)
- locked <0x0004001d6348> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:410)
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

{code}

All threads are blocked until a timeout is returned while the directory scanner 
checks and updates blocks.

Also, we know that the scan result {{missing metadata files:23574, missing block 
files:23574, missing blocks in memory:47625}} is not quite accurate, because the 
scan process may take two hours to finish, and already-scanned directories 
may have new files that will be added to the inconsistent blocks map.
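
As a sketch of the batching idea (not the attached HDFS-14476.00.patch itself; 
the method name, the constants, and the per-block fix call are assumptions), 
the diff is processed in small chunks with a pause between them:

{code:java}
// Illustrative sketch only: fix the collected differences 500 at a time,
// sleeping between batches so normal block reads/writes can take the lock.
static final int BATCH_SIZE = 500;       // batch size mentioned above
static final long BATCH_SLEEP_MS = 2000; // "sleep 2 seconds between batches"

void checkAndUpdateInBatches(String bpid, List<ScanInfo> diffs)
    throws InterruptedException {
  int processed = 0;
  for (ScanInfo info : diffs) {
    dataset.checkAndUpdate(bpid, info); // hypothetical per-block fix call
    if (++processed % BATCH_SIZE == 0) {
      Thread.sleep(BATCH_SLEEP_MS);     // yield the dataset lock to others
    }
  }
}
{code}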

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-14476.00.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the result

[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-07-08 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: datanode-with-patch-14476.png

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-14476.00.patch, datanode-with-patch-14476.png
>
>
> When DirectoryScanner has the results of the differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode and each 6-hourly scan 
> finds about 25000 abnormal blocks to fix, that leads to a long lock 
> holding the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *how to fix:*
> Just as the invalidate command from the namenode is processed in batches of 
> 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds 
> between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-07-08 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Attachment: HDFS-14476.00.patch

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
> Attachments: HDFS-14476.00.patch
>
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
> about 25000 abnormal blocks to fix. That leads to a long lock held on the 
> FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> 25000 blocks then cost 250 seconds to finish. That means all reads and writes 
> will be blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *how to fix:*
> Just as invalidate commands from the namenode are processed in batches of 
> 1000, fixing these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-06-03 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14536:
-
Description: 
I found all my requests blocked every 6 hours and used jstack to dump 
threads:
{code:java}
"1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry [0x7f7e9e42b000]
  java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
 - waiting to lock <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)

/// lots of BLOCKED for about one minute

"java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc runnable 
[0x7f7ea0344000]
  java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
at 
org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
- locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
    - locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
After reviewing the DirectoryScanner behavior, I think the lock 
{{synchronized(dataset)}} in {{scan()}} can be removed safely.

As {{dataset.getFinalizedBlocks()}} is a synchronized method, the memReport 
will be generated from the volumeMap under the lock. After that, the outer 
{{synchronized(dataset)}} lock is needless, yet I've seen the subsequent while 
loop run for 20 seconds.

Yes, other blocks could be added to or deleted from memory or disk, but that 
has no effect on computing the differences between them, because the later 
fixing process will handle both sides.

{code:java}
  void scan() {
    clear();
    Map<String, ScanInfo[]> diskReport = getDiskReport();

    // Hold FSDataset lock to prevent further changes to the block map
    synchronized(dataset) {  // remove synchronized?
      for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
        String bpid = entry.getKey();
        ScanInfo[] blockpoolReport = entry.getValue();

        Stats statsRecord = new Stats(bpid);
        stats.put(bpid, statsRecord);
        LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
        diffs.put(bpid, diffRecord);

        statsRecord.totalBlocks = blockpoolReport.length;
        // getFinalizedBlocks is synchronized
        List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
        FinalizedReplica[] memReport =
            bl.toArray(new FinalizedReplica[bl.size()]);
        Arrays.sort(memReport); // Sort based on blockId

        int d = 0; // index for blockpoolReport
        int m = 0; // index for memReport
        while (m < memReport.length && d < blockpoolReport.length) {
          // may take 20s to finish this loop
        }
      } //end for
    } //end synchronized
  }{code}
 

Let's say block A is added and block B is deleted after 
{{dataset.getFinalizedBlocks(bpid)}} is called. That means the disk files and 
the in-memory map have changed, but we already hold a snapshot of the 
in-memory finalized blocks taken earlier, and we only care about the 
consistency of those blocks.

If the difference report occasionally contains that deleted block B, the later 
{{checkAndUpdate}} will handle it in the right way.
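
A minimal sketch of {{scan()}} with the outer lock removed (same names as the 
code above; an illustration of the idea, not the final patch):
{code:java}
// Sketch: drop the outer synchronized(dataset). getFinalizedBlocks() is
// itself synchronized, so the memReport snapshot is internally consistent;
// any block added or deleted after the snapshot is re-verified later by
// checkAndUpdate().
void scan() {
  clear();
  Map<String, ScanInfo[]> diskReport = getDiskReport();
  for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
    String bpid = entry.getKey();
    ScanInfo[] blockpoolReport = entry.getValue();
    List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid); // locked
    FinalizedReplica[] memReport = bl.toArray(new FinalizedReplica[bl.size()]);
    Arrays.sort(memReport); // sort by blockId
    // ... diff memReport against blockpoolReport exactly as before ...
  }
}
{code}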

 

I've compiled this patch and will test it in my environment.

  was:
I found all my requests blocked every 6 hours and used jstack to dump 
threads:
{code:java}
"1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
tid=0x7f7ed01c3

[jira] [Updated] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-06-03 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14536:
-
Description: 
I found all my requests blocked every 6 hours and used jstack to dump 
threads:
{code:java}
"1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry [0x7f7e9e42b000]
  java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
 - waiting to lock <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)

/// lots of BLOCKED for about one minute

"java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc runnable 
[0x7f7ea0344000]
  java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
at 
org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
- locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
    - locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

This problem is probably related to the large number of blocks on every 
datanode. After reviewing the DirectoryScanner behavior, I think the lock 
{{synchronized(dataset)}} in {{scan()}} can be removed safely.

As {{dataset.getFinalizedBlocks()}} is a synchronized method, the memReport 
will be generated from the volumeMap under the lock. After that, the outer 
{{synchronized(dataset)}} lock is needless, yet I've seen the subsequent while 
loop run for 20 seconds.

Yes, other blocks could be added to or deleted from memory or disk, but that 
has no effect on computing the differences between them, because the later 
fixing process will handle both sides.

 
{code:java}
  void scan() {
    clear();
    Map<String, ScanInfo[]> diskReport = getDiskReport();

    // Hold FSDataset lock to prevent further changes to the block map
    synchronized(dataset) {  // remove synchronized?
      for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
        String bpid = entry.getKey();
        ScanInfo[] blockpoolReport = entry.getValue();

        Stats statsRecord = new Stats(bpid);
        stats.put(bpid, statsRecord);
        LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
        diffs.put(bpid, diffRecord);

        statsRecord.totalBlocks = blockpoolReport.length;
        // getFinalizedBlocks is synchronized
        List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
        FinalizedReplica[] memReport =
            bl.toArray(new FinalizedReplica[bl.size()]);
        Arrays.sort(memReport); // Sort based on blockId

        int d = 0; // index for blockpoolReport
        int m = 0; // index for memReport
        while (m < memReport.length && d < blockpoolReport.length) {
          // may take 20s to finish this loop
        }
      } //end for
    } //end synchronized
  }{code}
 

Let's say block A is added and block B is deleted after 
{{dataset.getFinalizedBlocks(bpid)}} is called. That means the disk files and 
the in-memory map have changed, but we already hold a snapshot of the 
in-memory finalized blocks taken earlier, and we only care about the 
consistency of those blocks.

If the difference report occasionally contains that deleted block B, the later 
{{checkAndUpdate}} will handle it in the right way.

I've compiled this patch and will test it in my environment.

  was:
I found all my requests blocked every 6 hours and used jstack to dump 
threads:
{code:java}
"1490900528@qtp-341796579-35"

[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-06-03 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Description: 
When directoryScanner has the results of differences between disk and 
in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
{{FsDatasetImpl.checkAndUpdate}} is a synchronized call.

As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
about 25000 abnormal blocks to fix. That leads to a long lock held on the 
FsDatasetImpl object.

Let's assume every block needs 10ms to fix (because of SAS disk latency); 
25000 blocks then cost 250 seconds to finish. That means all reads and writes 
will be blocked for about 4 minutes on that datanode.

 
{code:java}
2019-05-06 08:06:51,704 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
metadata files:23574, missing block files:23574, missing blocks in 
memory:47625, mismatched blocks:0

...

2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Took 588402ms to process 1 commands from NN
{code}
It takes a long time to process commands from the NN because threads are 
blocked, and the namenode will see a long lastContact time for this datanode.

This probably affects all HDFS versions.

*how to fix:*

Just as invalidate commands from the namenode are processed in batches of 
1000, fixing these abnormal blocks should be handled in batches too, sleeping 
2 seconds between batches to allow normal block reads and writes.

  was:
When directoryScanner has the results of differences between disk and 
in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
`FsDatasetImpl.checkAndUpdate` is a synchronized call.

As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
about 25000 abnormal blocks to fix. That leads to a long lock held on the 
FsDatasetImpl object.

Let's assume every block needs 10ms to fix (because of SAS disk latency); 
25000 blocks then cost 250 seconds to finish. That means all reads and writes 
will be blocked for about 4 minutes on that datanode.

 
{code:java}
2019-05-06 08:06:51,704 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
metadata files:23574, missing block files:23574, missing blocks in 
memory:47625, mismatched blocks:0

...

2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Took 588402ms to process 1 commands from NN
{code}
It takes a long time to process commands from the NN because threads are 
blocked, and the namenode will see a long lastContact time for this datanode.

This probably affects all HDFS versions.

*to fix:*

Just as invalidate commands from the namenode are processed in batches of 
1000, fixing these abnormal blocks should be handled in batches too, sleeping 
2 seconds between batches to allow normal block reads and writes.


> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
> about 25000 abnormal blocks to fix. That leads to a long lock held on the 
> FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> 25000 blocks then cost 250 seconds to finish. That means all reads and writes 
> will be blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *how to fix:*
> Just as invalidate commands from the namenode are processed in batches of 
> 1000, fixing these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-06-03 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854448#comment-16854448
 ] 

Sean Chow commented on HDFS-14476:
--

[~jojochuang] I've found out why _every 6 hours' scan has about 25000 abnormal 
blocks to fix_.

The directory scan and the snapshot of in-memory finalized blocks are taken 
asynchronously. The actual disk scan may take an hour to finish, and during 
that period other blocks are added or removed.

But the long-held lock mentioned in this issue still exists. I've applied this 
patch myself and will test it.
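
For context, the reason a stale diff entry is harmless: the later fix step 
re-checks each block against the current state under the lock before acting. 
A simplified sketch of that re-check (hypothetical shape, not the exact 
{{FsDatasetImpl.checkAndUpdate}} code):
{code:java}
// Sketch: re-verify a diff entry under the lock before acting, so entries
// made stale by the slow scan are simply skipped.
void checkAndUpdate(String bpid, ScanInfo info) {
  synchronized (dataset) {
    ReplicaInfo memBlock = volumeMap.get(bpid, info.getBlockId());
    File diskFile = info.getBlockFile();
    if (memBlock == null && (diskFile == null || !diskFile.exists())) {
      return; // block was deleted after the scan: nothing to fix
    }
    // ... otherwise reconcile the in-memory replica with the on-disk files
  }
}
{code}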

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
> `FsDatasetImpl.checkAndUpdate` is a synchronized call.
> As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
> about 25000 abnormal blocks to fix. That leads to a long lock held on the 
> FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> 25000 blocks then cost 250 seconds to finish. That means all reads and writes 
> will be blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *to fix:*
> Just as invalidate commands from the namenode are processed in batches of 
> 1000, fixing these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14536) It's safe to remove synchronized from reconcile scan

2019-06-03 Thread Sean Chow (JIRA)
Sean Chow created HDFS-14536:


 Summary: It's safe to remove synchronized from reconcile scan
 Key: HDFS-14536
 URL: https://issues.apache.org/jira/browse/HDFS-14536
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.7.0, 2.6.0
Reporter: Sean Chow


I found all my requests blocked every 6 hours and used jstack to dump 
threads:
{code:java}
"1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 
tid=0x7f7ed01c3800 nid=0x1a1c waiting for monitor entry [0x7f7e9e42b000]
  java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
 - waiting to lock <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)

/// lots of BLOCKED for about one minute

"java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty 
queue]" #77 daemon prio=5 os_prio=0 tid=0x7f7ed8590800 nid=0x49cc runnable 
[0x7f7ea0344000]
  java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
at 
org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
at 
org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
- locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
    - locked <0x0002c00410f8> (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
    at 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
After reviewing the DirectoryScanner behavior, I think the lock 
`synchronized(dataset)` in `scan()` can be removed safely.

As dataset.getFinalizedBlocks() is a synchronized method, the memReport will 
be generated from the volumeMap under the lock. After that, the outer 
`synchronized(dataset)` lock is needless, yet I've seen the subsequent while 
loop run for 20 seconds.

Yes, other blocks could be added to or deleted from memory or disk, but that 
has no effect on computing the differences between them, because the later 
fixing process will handle both sides.

{code:java}
  void scan() {
    clear();
    Map<String, ScanInfo[]> diskReport = getDiskReport();

    // Hold FSDataset lock to prevent further changes to the block map
    synchronized(dataset) {  // remove synchronized?
      for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
        String bpid = entry.getKey();
        ScanInfo[] blockpoolReport = entry.getValue();

        Stats statsRecord = new Stats(bpid);
        stats.put(bpid, statsRecord);
        LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
        diffs.put(bpid, diffRecord);

        statsRecord.totalBlocks = blockpoolReport.length;
        // getFinalizedBlocks is synchronized
        List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
        FinalizedReplica[] memReport =
            bl.toArray(new FinalizedReplica[bl.size()]);
        Arrays.sort(memReport); // Sort based on blockId

        int d = 0; // index for blockpoolReport
        int m = 0; // index for memReport
        while (m < memReport.length && d < blockpoolReport.length) {
          // may take 20s to finish this loop
        }
      } //end for
    } //end synchronized
  }{code}
 

Let's say block A is added and block B is deleted after 
`dataset.getFinalizedBlocks(bpid)` is called. That means the disk files and 
the in-memory map have changed, but we already hold a snapshot of the 
in-memory finalized blocks taken earlier, and we only care about the 
consistency of those blocks.

If the difference report occasionally contains that deleted block B, the later 
`checkAndUpdate` will handle it in the right way.

I've compiled this patch and will test it in my environment.

[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-05-07 Thread Sean Chow (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835265#comment-16835265
 ] 

Sean Chow commented on HDFS-14476:
--

[~jojochuang] I'm sure my disk status is OK. I have more than 60 datanodes 
whose directory scans find that many abnormal blocks to fix.

I think that is because of the large number of small files. In another, 
healthy cluster of mine, a directory scan usually finds 50-100 abnormal 
blocks, which is no problem to fix.

Like [~starphin] said, maybe 6 million blocks per datanode is too many, which 
raises the number of abnormal blocks.

Whatever the cause, the long lock held by checkAndUpdate should be avoided so 
that read/write requests can still be processed.

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
> `FsDatasetImpl.checkAndUpdate` is a synchronized call.
> As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
> about 25000 abnormal blocks to fix. That leads to a long lock held on the 
> FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> 25000 blocks then cost 250 seconds to finish. That means all reads and writes 
> will be blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *to fix:*
> Just as invalidate commands from the namenode are processed in batches of 
> 1000, fixing these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-05-06 Thread Sean Chow (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14476:
-
Description: 
When directoryScanner has the results of differences between disk and 
in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
`FsDatasetImpl.checkAndUpdate` is a synchronized call.

As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
about 25000 abnormal blocks to fix. That leads to a long lock held on the 
FsDatasetImpl object.

Let's assume every block needs 10ms to fix (because of SAS disk latency); 
25000 blocks then cost 250 seconds to finish. That means all reads and writes 
will be blocked for about 4 minutes on that datanode.

 
{code:java}
2019-05-06 08:06:51,704 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
metadata files:23574, missing block files:23574, missing blocks in 
memory:47625, mismatched blocks:0

...

2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Took 588402ms to process 1 commands from NN
{code}
It takes a long time to process commands from the NN because threads are 
blocked, and the namenode will see a long lastContact time for this datanode.

This probably affects all HDFS versions.

*to fix:*

Just as invalidate commands from the namenode are processed in batches of 
1000, fixing these abnormal blocks should be handled in batches too, sleeping 
2 seconds between batches to allow normal block reads and writes.

  was:
When directoryScanner has the results of differences between disk and 
in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
`FsDatasetImpl.checkAndUpdate` is a synchronized call.

As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
about 25000 abnormal blocks to fix. That leads to a long lock held on the 
FsDatasetImpl object.

Let's assume every block needs 10ms to fix (because of SAS disk latency); 
25000 blocks then cost 250 seconds to finish. That means all reads and writes 
will be blocked for about 4 minutes on that datanode.

 

```

2019-05-06 08:06:51,704 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
metadata files:23574, missing block files:23574, missing blocks in 
memory:47625, mismatched blocks:0

...

2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Took 588402ms to process 1 commands from NN

```

It takes a long time to process commands from the NN because threads are 
blocked, and the namenode will see a long lastContact time for this datanode.

This probably affects all HDFS versions.

to fix:

Just as invalidate commands from the namenode are processed in batches of 
1000, fixing these abnormal blocks should be handled in batches too, sleeping 
2 seconds between batches to allow normal block reads and writes.


> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Sean Chow
>Priority: Major
>
> When directoryScanner has the results of differences between disk and 
> in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
> `FsDatasetImpl.checkAndUpdate` is a synchronized call.
> As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
> about 25000 abnormal blocks to fix. That leads to a long lock held on the 
> FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); 
> 25000 blocks then cost 250 seconds to finish. That means all reads and writes 
> will be blocked for about 4 minutes on that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are 
> blocked, and the namenode will see a long lastContact time for this datanode.
> This probably affects all HDFS versions.
> *to fix:*
> Just as invalidate commands from the namenode are processed in batches of 
> 1000, fixing these abnormal blocks should be handled in batches too, sleeping 
> 2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-

[jira] [Created] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2019-05-06 Thread Sean Chow (JIRA)
Sean Chow created HDFS-14476:


 Summary: lock too long when fix inconsistent blocks between disk 
and in-memory
 Key: HDFS-14476
 URL: https://issues.apache.org/jira/browse/HDFS-14476
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.7.0, 2.6.0
Reporter: Sean Chow


When directoryScanner has the results of differences between disk and 
in-memory blocks, it will try to run `checkAndUpdate` to fix them. However, 
`FsDatasetImpl.checkAndUpdate` is a synchronized call.

As I have about 6 million blocks on every datanode, each 6-hourly scan finds 
about 25000 abnormal blocks to fix. That leads to a long lock held on the 
FsDatasetImpl object.

Let's assume every block needs 10ms to fix (because of SAS disk latency); 
25000 blocks then cost 250 seconds to finish. That means all reads and writes 
will be blocked for about 4 minutes on that datanode.

 

```

2019-05-06 08:06:51,704 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
metadata files:23574, missing block files:23574, missing blocks in 
memory:47625, mismatched blocks:0

...

2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Took 588402ms to process 1 commands from NN

```

It takes a long time to process commands from the NN because threads are 
blocked, and the namenode will see a long lastContact time for this datanode.

This probably affects all HDFS versions.

to fix:

Just as invalidate commands from the namenode are processed in batches of 
1000, fixing these abnormal blocks should be handled in batches too, sleeping 
2 seconds between batches to allow normal block reads and writes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org