[jira] [Commented] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964754#comment-16964754 ]

Rui Mo commented on HDFS-14740:
-------------------------------

[^HDFS_Persistent_Read-Cache_Test-v1.1.pdf] has been uploaded. Any comments are welcome!

> Recover data blocks from persistent memory read cache during datanode restarts
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-14740
>                 URL: https://issues.apache.org/jira/browse/HDFS-14740
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: caching, datanode
>            Reporter: Feilong He
>            Assignee: Rui Mo
>            Priority: Major
>         Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch,
> HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch,
> HDFS-14740.005.patch, HDFS-14740.006.patch,
> HDFS_Persistent_Read-Cache_Design-v1.pdf,
> HDFS_Persistent_Read-Cache_Test-v1.1.pdf,
> HDFS_Persistent_Read-Cache_Test-v1.pdf
>
>
> In HDFS-13762, persistent memory (PM) was enabled in HDFS centralized cache
> management. Although PM can persist cache data, the initial implementation,
> for simplicity, cleans up the previous cache data during DataNode restarts.
> Here, we propose to improve the HDFS PM cache by taking advantage of PM's
> data persistence, i.e., recovering the status of cached data, if any, when
> the DataNode restarts, so that users save the cache warm-up time.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS_Persistent_Read-Cache_Test-v1.1.pdf
[jira] [Commented] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958560#comment-16958560 ]

Rui Mo commented on HDFS-14740:
-------------------------------

In [^HDFS-14740.006.patch]: Added unit test for uncaching the restored files from PM.
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS-14740.006.patch
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS-14740.005.patch
[jira] [Commented] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929046#comment-16929046 ]

Rui Mo commented on HDFS-14740:
-------------------------------

Thanks [~rakeshr] for reviewing the patch and for the valuable comments. In [^HDFS-14740.004.patch]:

{quote}1. Please remove duplicate checks in #restoreCache() method as you already doing the checks inside #createBlockPoolDir().{quote}
The duplicate checks have been removed.

{quote}2. {{pmemVolume/BlockPoolId/BlockPoolId-BlockId}}. {{BlockPoolId}} is duplicated.{quote}
For simplicity, the cache file is now named by its BlockId alone.

{quote}3. Can you explore the chances of using hierarchical way of storing blocks similar to the existing datanode data.dir, this is to avoid chances of growing blocks under one single blockPoolId. Assume cache capacity in TBs and large set of data blocks in cache under a blockPool. Please refer {{DatanodeUtil.idToBlockDir(finalizedDir, b.getBlockId());}}{quote}
We now store cache files hierarchically, following the implementation in DatanodeUtil, so that a large number of blocks does not accumulate under a single BlockPoolId directory.

{quote}4. {{restoreCache()}} - How about moving specific parsing/restore logic to respective MappableBlockLoaders. PmemMappableBlockLoader#restoreCache() and NativePmemMappableBlockLoader#restoreCache().{quote}
We have refactored this part of the implementation. restoreCache() remains in PmemVolumeManager to restore some variables, but it now calls the specific parsing/restore logic in the respective MappableBlockLoaders.

{quote}5. {{dfs.datanode.cache.persistence.enabled}} - by default this can be true as this will allow to get maximum capabilities of pmem device. Overall the feature is disabled and default value of "dfs.datanode.cache.pmem.dirs" is empty and will be DRAM based. So, once the user enables pmem, they can utilize the potential of this device and no case of compatibility.{quote}
{{dfs.datanode.cache.persistence.enabled}} is now true by default. The user can enable pmem by configuring "dfs.datanode.cache.pmem.dirs".

Thanks!
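The hierarchical layout discussed under review comment 3 in the message above can be illustrated with a small sketch. This is not the code from the patch: the class name, the `cacheFile` helper, the example mount point and block pool ID are hypothetical, and the `subdir` prefix and the 5-bit masks are assumptions modeled on how the DataNode is understood to lay out its `data.dir`. The idea is only that two small slices of the block ID select two directory levels, so files spread over many subdirectories instead of piling up under one block pool directory.

```java
import java.io.File;

// Hypothetical sketch of a two-level cache layout on a pmem volume.
final class PmemCachePathSketch {
  // Assumed prefix, mirroring the DataNode's "subdir" naming.
  static final String SUBDIR_PREFIX = "subdir";

  // Picks a two-level subdirectory from bits 16-20 and bits 8-12 of the
  // block ID (the masks are an assumption, giving 32 x 32 directories).
  static File idToBlockDir(File root, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    return new File(root,
        SUBDIR_PREFIX + d1 + File.separator + SUBDIR_PREFIX + d2);
  }

  // Builds pmemVolume/BlockPoolId/subdirX/subdirY/BlockId; the file is
  // named by the block ID alone, as described in the reply to comment 2.
  static File cacheFile(File pmemVolume, String blockPoolId, long blockId) {
    File blockPoolDir = new File(pmemVolume, blockPoolId);
    return new File(idToBlockDir(blockPoolDir, blockId),
        Long.toString(blockId));
  }

  public static void main(String[] args) {
    // Example values are made up for illustration.
    File f = cacheFile(new File("/mnt/pmem0"), "BP-example", 0x40031205L);
    System.out.println(f.getPath());
  }
}
```

Because the directory is derived purely from the block ID, a restart can locate or enumerate cached files without any extra index, which is presumably what makes this layout convenient for restoring the cache.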
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS-14740.004.patch
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS-14740.003.patch
[jira] [Commented] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917589#comment-16917589 ]

Rui Mo commented on HDFS-14740:
-------------------------------

Fixed one checkstyle issue and refined unit test in [^HDFS-14740.002.patch].
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS-14740.002.patch
[jira] [Comment Edited] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912114#comment-16912114 ]

Rui Mo edited comment on HDFS-14740 at 8/21/19 9:37 AM:
--------------------------------------------------------

Fixed FindBugs and checkstyle issues in [^HDFS-14740.001.patch].

was (Author: rui mo):
Fixed FindBugs and checkstyle issues in HDFS-14740.001.patch.
[jira] [Commented] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912114#comment-16912114 ]

Rui Mo commented on HDFS-14740:
-------------------------------

Fixed FindBugs and checkstyle issues in HDFS-14740.001.patch.
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS-14740.001.patch
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment: HDFS-14740.001.diff
[jira] [Updated] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Mo updated HDFS-14740:
--------------------------
    Attachment:     (was: HDFS-14740.001.diff)