[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.

2020-02-20 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041574#comment-17041574
 ] 

zhuqi commented on HDFS-15171:
--

Hi [~weichiu] 
There is no cache file if the datanode shuts down ungracefully, so changing 
dfs.datanode.cached-dfsused.check.interval.ms will not help in my case.

HDFS-14313 should be able to reduce the refresh time; I will try it.

Thanks.

> Add a thread to call saveDfsUsed periodically, to prevent datanode too long 
> restart time.  
> ---
>
> Key: HDFS-15171
> URL: https://issues.apache.org/jira/browse/HDFS-15171
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Assignee: zhuqi
>Priority: Major
>
> There are 30 storage dirs per datanode in our production cluster, so it can 
> take a long time to restart, because sometimes the datanode did not shut down 
> gracefully. Currently only the datanode's graceful shutdown hook and the 
> BlockPoolSlice shutdown call the saveDfsUsed function, so a restarted 
> datanode sometimes cannot reuse the dfsUsed cache. I think we could add a 
> thread to call the saveDfsUsed function periodically.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.

2020-02-20 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041564#comment-17041564
 ] 

zhuqi commented on HDFS-15171:
--

Hi [~sodonnell]

Thanks for your patient reply.

First, the refresh thread in CachingGetSpaceUsed already runs every 10 minutes, 
with a random jitter applied to spread out the refresh operations. If we persist 
the value to the cache file whenever it is refreshed, the cache will be as close 
to real time as possible.

Second, when the value is refreshed, we can compare it with the previous one; if 
they are the same, we can skip the persist operation to reduce disk I/O.

To reduce disk I/O further, we can add a configurable fixed time interval, and 
only persist the value to disk once that interval has elapsed since the last 
write.
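
A minimal sketch of that persist policy (the class and method names here are 
hypothetical, invented for illustration, not Hadoop's actual API): skip the 
disk write when the refreshed value is unchanged, and rate-limit writes to a 
configurable minimum interval.

```java
// Hypothetical sketch of the proposed persist policy; not actual Hadoop code.
public class DfsUsedPersistPolicy {
    private final long minSaveIntervalMs; // configurable lower bound between disk writes
    private long lastSavedValue;
    private long lastSaveTimeMs;
    private boolean savedOnce = false;

    public DfsUsedPersistPolicy(long minSaveIntervalMs) {
        this.minSaveIntervalMs = minSaveIntervalMs;
    }

    /** Decide whether a freshly refreshed dfsUsed value should be written to disk. */
    public boolean shouldPersist(long newValue, long nowMs) {
        if (savedOnce && newValue == lastSavedValue) {
            return false; // value unchanged: skip the disk write entirely
        }
        if (savedOnce && nowMs - lastSaveTimeMs < minSaveIntervalMs) {
            return false; // value changed, but we wrote recently: wait
        }
        lastSavedValue = newValue;
        lastSaveTimeMs = nowMs;
        savedOnce = true;
        return true;
    }
}
```

The refresh thread would consult this policy on each run instead of relying on 
the shutdown hook.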

Then we can remove the shutdown hook's persist operation, and we no longer need 
to work out a suitable value for 
dfs.datanode.cached-dfsused.check.interval.ms.

This would also resolve my problem, which is caused by the datanode shutting 
down ungracefully.

What do you think about this proposal?




[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.

2020-02-20 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041367#comment-17041367
 ] 

Wei-Chiu Chuang commented on HDFS-15171:


The first thing that comes to mind is increasing 
dfs.datanode.cached-dfsused.check.interval.ms, as Stephen said.
Otherwise, HDFS-14313 may be useful too.




[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.

2020-02-18 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039018#comment-17039018
 ] 

Stephen O'Donnell commented on HDFS-15171:
--

[~zhuqi] There are a few parts to this.

First, the disk usage is refreshed every 10 minutes by a thread in 
CachingGetSpaceUsed; however, I had not realised that it does not persist the 
newly calculated value to the cache file. You are correct: it does not, and 
only the shutdown hook does. That persist could also take some time to complete 
if the disk is large.

I wonder if we could use the existing refresh thread in CachingGetSpaceUsed, 
perhaps by passing a callback (or something like the existing shutdown hook) so 
it can save the cache file each time it runs?
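
As a rough illustration of that idea (the names below are invented for the 
sketch and do not match the real CachingGetSpaceUsed API), the refresh step 
could accept a callback that writes the cache file each time a new value is 
computed:

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch; the real CachingGetSpaceUsed refresh loop differs.
public class RefreshingSpaceUsed {
    private final LongSupplier recalculate; // e.g. a du-style scan of the volume
    private final LongConsumer persistHook; // e.g. something like saveDfsUsed
    private volatile long used;

    public RefreshingSpaceUsed(LongSupplier recalculate, LongConsumer persistHook) {
        this.recalculate = recalculate;
        this.persistHook = persistHook;
    }

    /** One iteration of the refresh loop: recompute, then persist via the callback. */
    public void refresh() {
        used = recalculate.getAsLong();
        persistHook.accept(used); // save the cache file on every refresh, not just at shutdown
    }

    public long getUsed() {
        return used;
    }
}
```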

The next problem is that in BlockPoolSlice.loadDfsUsed(), it will only load the 
cache file if the mtime on the file is less than 
"dfs.datanode.cached-dfsused.check.interval.ms" old. This defaults to 10 
minutes. This results in another problem:

If you shutdown a DN for more than 10 minutes, even if it shutdown cleanly and 
saved the cache file, it will not read the cache file on startup as the file is 
over 10 minutes old.
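
That freshness rule amounts to a one-line mtime check. The sketch below is a 
simplification (the real BlockPoolSlice.loadDfsUsed also reads and parses the 
cached value itself):

```java
// Simplified sketch of the mtime check described above; not actual Hadoop code.
public class DfsUsedCacheCheck {
    /**
     * The cached dfsUsed value is trusted only if the cache file was written
     * within maxAgeMs of "now", where maxAgeMs corresponds to
     * dfs.datanode.cached-dfsused.check.interval.ms (10 minutes by default).
     */
    public static boolean cacheIsFresh(long fileMtimeMs, long nowMs, long maxAgeMs) {
        return nowMs - fileMtimeMs <= maxAgeMs;
    }
}
```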

Given that the disk usage will be refreshed within 10 minutes of the DN 
starting, I think 10 minutes for dfs.datanode.cached-dfsused.check.interval.ms 
is probably too small, and that default could do with being a bit higher.

If we could ensure the cache file was saved approximately every 10 minutes by 
the refresh thread, you could argue that the DN should always use the cache 
file if it is there, as it should be reasonably up to date anyway.

I encountered a problem like this recently with a few DNs which were taking a 
very long time to start up, and we worked around it by adjusting the mtime on 
the cache file to get them started. That was fine as a one-off, but perhaps we 
can do better in this Jira.






[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.

2020-02-16 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038090#comment-17038090
 ] 

zhuqi commented on HDFS-15171:
--

cc [~linyiqun], [~weichiu], [~hexiaoqiao]
What do you think about this problem? I would appreciate any advice.

Thanks.
