[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041574#comment-17041574 ] zhuqi commented on HDFS-15171: -- Hi [~weichiu], there is no cache file if the datanode shuts down ungracefully, so changing dfs.datanode.cached-dfsused.check.interval.ms will not help my case. HDFS-14313 should reduce the refresh time; I will try it. Thanks.
> Add a thread to call saveDfsUsed periodically, to prevent datanode too long
> restart time.
> ---
>
> Key: HDFS-15171
> URL: https://issues.apache.org/jira/browse/HDFS-15171
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: zhuqi
> Assignee: zhuqi
> Priority: Major
>
> There are 30 storage dirs per datanode in our production cluster, so it
> takes too long to restart, because sometimes the datanode does not shut down
> gracefully. Currently only the datanode's graceful shutdown hook and the
> BlockPoolSlice shutdown call the saveDfsUsed function, so a restarting
> datanode sometimes cannot reuse the dfsUsed cache. I think we can add a
> thread to periodically call the saveDfsUsed function.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041564#comment-17041564 ] zhuqi commented on HDFS-15171: -- Hi [~sodonnell], thanks for your patient reply.
First, the disk usage is already refreshed every 10 minutes by the thread in CachingGetSpaceUsed, with random jitter to spread out the refresh operations; if we persist the value to the cache file on every refresh, the cache file stays as close to real time as possible.
Second, when the value is refreshed we can compare it with the previous one; if they are the same, we can skip the persist and avoid the disk write. To reduce disk operations further, we can add a configurable fixed interval, and only persist the value to disk once the time since the last persist exceeds that interval.
Then we can remove the shutdown-hook persist and no longer need to work out a suitable value for dfs.datanode.cached-dfsused.check.interval.ms. This would also resolve my problem, which is caused by the datanode shutting down ungracefully.
What do you think about this proposal?
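The persist policy proposed above (skip the write when the refreshed value is unchanged, otherwise persist at most once per configurable interval) could be sketched roughly as follows. This is a minimal illustration, not actual Hadoop code; the class and method names are hypothetical.

```java
// Hypothetical sketch of the proposed persistence policy: the refresh
// thread would consult shouldPersist() before writing the dfsUsed cache
// file. Names are illustrative, not real Hadoop APIs.
public class DfsUsedPersistPolicy {
    private long lastPersistedValue = -1L;
    private long lastPersistTimeMs = 0L;
    private final long minPersistIntervalMs; // configurable minimum gap between writes

    public DfsUsedPersistPolicy(long minPersistIntervalMs) {
        this.minPersistIntervalMs = minPersistIntervalMs;
    }

    /** Returns true if the refreshed value should be written to the cache file. */
    public boolean shouldPersist(long refreshedValue, long nowMs) {
        if (refreshedValue == lastPersistedValue) {
            return false; // unchanged: skip the disk write entirely
        }
        if (nowMs - lastPersistTimeMs < minPersistIntervalMs) {
            return false; // changed, but too soon since the last persist
        }
        lastPersistedValue = refreshedValue;
        lastPersistTimeMs = nowMs;
        return true;
    }
}
```

With this in place, an ungraceful shutdown would lose at most one interval's worth of staleness instead of the entire cache file.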
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17041367#comment-17041367 ] Wei-Chiu Chuang commented on HDFS-15171: The first thing that comes to my mind is increasing dfs.datanode.cached-dfsused.check.interval.ms, as Stephen said. Otherwise, HDFS-14313 may be useful too.
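The suggestion above amounts to raising the staleness threshold in hdfs-site.xml. A sketch; the 24-hour value is purely illustrative, not a recommendation:

```xml
<property>
  <name>dfs.datanode.cached-dfsused.check.interval.ms</name>
  <!-- Default is 600000 (10 minutes). Raising it lets a restarting DN
       reuse an older dfsUsed cache file. 86400000 ms = 24 h, chosen
       here only as an example. -->
  <value>86400000</value>
</property>
```

Note this does not help the ungraceful-shutdown case reported in this Jira, where no cache file is written at all.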
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039018#comment-17039018 ] Stephen O'Donnell commented on HDFS-15171: -- [~zhuqi] There are a few parts to this.
First, the disk usage is refreshed every 10 minutes by a thread in CachingGetSpaceUsed; however, I had not realised it does not persist the newly calculated value to the cache file. You are correct: it does not, and the shutdown hook does it instead. That write could also take some time to complete if the disk is large. I wonder if we could use the existing refresh thread in CachingGetSpaceUsed, perhaps by passing a callback (or something like the existing shutdown hook), so it saves the cache file each time it runs?
The next problem is that BlockPoolSlice.loadDfsUsed() will only load the cache file if the mtime on the file is less than "dfs.datanode.cached-dfsused.check.interval.ms" old, which defaults to 10 minutes. This causes another problem: if you shut down a DN for more than 10 minutes, even if it shut down cleanly and saved the cache file, it will not read the cache file on startup, as the file is over 10 minutes old.
Seeing as the disk usage will be refreshed within 10 minutes of the DN starting anyway, I think the 10-minute default for dfs.datanode.cached-dfsused.check.interval.ms is probably too small, and that default could do with being a bit higher. If we could ensure the cache file was saved approximately every 10 minutes by the refresh thread, you could argue that the DN should always use the cache file if it is there, as it should be reasonably up to date anyway.
I encountered a problem like this recently with a few DNs which were taking a very long time to start up, and we worked around it by adjusting the mtime on the cache file to get them started. That was OK as a one-off, but perhaps we can do better in this Jira.
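The staleness check described above (only trust the cache file if its mtime is within the configured interval) can be sketched as below. This is a simplified illustration of the behaviour as described in the comment, not the exact BlockPoolSlice code; the class and method names are hypothetical.

```java
// Hedged sketch of the freshness check that gates cache reuse on startup:
// the cached dfsUsed value is only trusted if the cache file's mtime is
// within maxAgeMs (dfs.datanode.cached-dfsused.check.interval.ms) of now.
public class DfsUsedCacheCheck {
    /**
     * Returns true if the cache file is fresh enough to reuse.
     * lastModifiedMs is the file's mtime; java.io.File.lastModified()
     * returns 0 when the file does not exist (e.g. after an ungraceful
     * shutdown that never wrote it).
     */
    public static boolean isCacheUsable(long lastModifiedMs, long maxAgeMs, long nowMs) {
        if (lastModifiedMs <= 0L) {
            return false; // no cache file was written
        }
        long age = nowMs - lastModifiedMs;
        return age >= 0L && age <= maxAgeMs; // stale or future-dated files are ignored
    }
}
```

This is why a DN that was cleanly stopped for longer than the interval still falls back to a full du scan: the check cannot distinguish "old but written at clean shutdown" from "genuinely stale".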
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038090#comment-17038090 ] zhuqi commented on HDFS-15171: -- cc [~linyiqun], [~weichiu], [~hexiaoqiao]. What do you think about this problem? Any advice would be appreciated. Thanks.