[jira] [Updated] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command
[ https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14818:
Fix Version/s: 3.3.0
Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks [~PhiloHe] for the contribution. +1, LGTM. Committed to trunk.

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: native
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14818.000.patch, HDFS-14818.001.patch,
> HDFS-14818.002.patch, HDFS-14818.003.patch, HDFS-14818.004.patch,
> check_native_after_building_with_PMDK.png,
> check_native_after_building_with_PMDK_using_NAME_instead_of_REALPATH.png,
> check_native_after_building_without_PMDK.png
>
>
> Currently, the 'hadoop checknative' command supports checking native libs
> such as zlib, snappy, openssl and ISA-L. It's necessary to include the pmdk
> lib in the checking.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives
[ https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-13762:
Target Version/s: 3.3.0
Status: Open (was: Patch Available)

> Support non-volatile storage class memory(SCM) in HDFS cache directives
> -
>
> Key: HDFS-13762
> URL: https://issues.apache.org/jira/browse/HDFS-13762
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: caching, datanode
> Reporter: Sammi Chen
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch,
> HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch,
> HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch,
> HDFS-13762.008.patch, HDFS_Persistent_Memory_Cache_Perf_Results.pdf,
> SCMCacheDesign-2018-11-08.pdf, SCMCacheDesign-2019-07-12.pdf,
> SCMCacheDesign-2019-07-16.pdf, SCMCacheDesign-2019-3-26.pdf,
> SCMCacheTestPlan-2019-3-27.pdf, SCMCacheTestPlan.pdf
>
>
> Non-volatile storage class memory is a type of memory that can keep its data
> content across power failures and power cycles. A non-volatile storage class
> memory device usually has access speed close to a memory DIMM while costing
> less than memory, so today it is usually used as a supplement to memory to
> hold long-term persistent data, such as data in cache.
> Currently in HDFS, we have an OS page cache backed read-only cache and a
> RAMDISK based lazy-write cache. Non-volatile memory suits both these
> functions. This Jira aims to enable storage class memory first in the read
> cache. Although storage class memory has non-volatile characteristics, to
> keep the same behavior as the current read-only cache, we don't use its
> persistence characteristics currently.
[jira] [Comment Edited] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command
[ https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928209#comment-16928209 ]

Rakesh R edited comment on HDFS-14818 at 9/12/19 5:22 AM:
--
Thanks [~PhiloHe] for the patch. Overall it looks good to me; just a few comments.
# {{SupportState.PMDK_LIB_NOT_FOUND}} - it's unused now; can you remove it?
{code:java}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code:java}
case 1:
  msg = "The native code is built with PMDK support, but PMDK libs "
      + "are NOT found in execution environment or failed to be loaded.";
  break;
{code}
# Any reason to change 'NAME' to 'REALPATH'?

was (Author: rakeshr):
Thanks [~PhiloHe] for the patch. Overall it looks good to me; just a comment.
# {{SupportState.PMDK_LIB_NOT_FOUND}} - it's unused now; can you remove it?
{code}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code}
case 1:
  msg = "The native code is built with PMDK support, but PMDK libs "
      + "are NOT found in execution environment or failed to be loaded.";
  break;
{code}

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: native
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-14818.000.patch
>
>
> Currently, the 'hadoop checknative' command supports checking native libs
> such as zlib, snappy, openssl and ISA-L. It's necessary to include the pmdk
> lib in the checking.
[jira] [Commented] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command
[ https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928209#comment-16928209 ]

Rakesh R commented on HDFS-14818:
-
Thanks [~PhiloHe] for the patch. Overall it looks good to me; just a comment.
# {{SupportState.PMDK_LIB_NOT_FOUND}} - it's unused now; can you remove it?
{code}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code}
case 1:
  msg = "The native code is built with PMDK support, but PMDK libs "
      + "are NOT found in execution environment or failed to be loaded.";
  break;
{code}

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: native
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-14818.000.patch
>
>
> Currently, the 'hadoop checknative' command supports checking native libs
> such as zlib, snappy, openssl and ISA-L. It's necessary to include the pmdk
> lib in the checking.
[jira] [Commented] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918242#comment-16918242 ]

Rakesh R commented on HDFS-14740:
-
Thanks [~Rui Mo] for the contribution. Overall the idea looks good. I've added a few comments; please take care of them.
# Please remove the duplicate checks in the #restoreCache() method, as you are already doing the checks inside #createBlockPoolDir().
{code}
#createBlockPoolDir()
if (!cacheDir.exists() && !cacheDir.mkdir()) {
{code}
{code}
#restoreCache()
if (cacheDir.exists()) {
{code}
# {{pmemVolume/BlockPoolId/BlockPoolId-BlockId}}: {{BlockPoolId}} is duplicated; please remove it from the file name. This will avoid the {{cachedFile.getName().split("-");}} splitting logic and keep it simple.
# Can you explore the chances of using a hierarchical way of storing blocks, similar to the existing datanode data.dir? This is to avoid the chance of all blocks growing under one single blockPoolId directory. Assume a cache capacity in TBs and a large set of data blocks in cache under a block pool. Please refer to {{DatanodeUtil.idToBlockDir(finalizedDir, b.getBlockId());}}
# {{restoreCache()}} - How about moving the specific parsing/restore logic to the respective MappableBlockLoaders: PmemMappableBlockLoader#restoreCache() and NativePmemMappableBlockLoader#restoreCache()?
# {{dfs.datanode.cache.persistence.enabled}} - by default this can be true, as it allows users to get the maximum capability of the pmem device. Overall the feature is disabled: the default value of "dfs.datanode.cache.pmem.dirs" is empty, and caching will be DRAM based. So, once users enable pmem, they can utilize the potential of the device, and there is no compatibility concern.

> HDFS read cache persistence support
> -
>
> Key: HDFS-14740
> URL: https://issues.apache.org/jira/browse/HDFS-14740
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Feilong He
> Assignee: Rui Mo
> Priority: Major
> Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch,
> HDFS-14740.002.patch
>
>
> In HDFS-13762, persistent memory is enabled in HDFS centralized cache
> management. Even though persistent memory can persist cache data, to simplify
> the implementation, the previous cache data is cleaned up during DataNode
> restarts. We propose to improve the HDFS persistent memory (PM) cache by
> taking advantage of PM's data persistence characteristic, i.e., recovering
> the cache status when the DataNode restarts; thus, cache warm-up time can be
> saved for the user.
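The hierarchical layout suggested in the review (comment 3) can be sketched as follows. This is a minimal illustration of a two-level "subdir" scheme in the spirit of {{DatanodeUtil.idToBlockDir}}; the constants, class name, and paths here are illustrative assumptions, not the exact Hadoop source:

```java
import java.io.File;

public class BlockDirLayout {
    // Two bit-fields of the block id pick the two directory levels,
    // capping each directory at 32 children and making the block's
    // directory computable from its id alone (no scan needed).
    static final String SUBDIR_PREFIX = "subdir";

    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0x1F);
        int d2 = (int) ((blockId >> 8) & 0x1F);
        return new File(root,
            SUBDIR_PREFIX + d1 + File.separator + SUBDIR_PREFIX + d2);
    }

    public static void main(String[] args) {
        // blockId 0x12345: d1 = 1, d2 = 3
        File dir = idToBlockDir(new File("/pmem0/BP-1"), 0x12345L);
        System.out.println(dir.getPath());
    }
}
```

With such a scheme, a pmem volume holding millions of cached blocks never accumulates them all in one flat directory, which is the concern raised above for TB-scale caches.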
[jira] [Commented] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1
[ https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914096#comment-16914096 ]

Rakesh R commented on HDFS-14745:
-
[~PhiloHe], thanks for taking this ahead. Could you rename the patch to include the branch name so that the QA bot can apply and run it? For example, "{{HDFS-14745-branch-3.1.001.patch}}". This is a good feature to include in the {{branch-3.x}} lines (the 3.0, 3.1 and 3.2 branches), and another good thing is that there are few code conflicts. Will take them up one by one.

> Backport HDFS persistent memory read cache support to branch-3.1
>
> Key: HDFS-14745
> URL: https://issues.apache.org/jira/browse/HDFS-14745
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Labels: cache, datanode
> Fix For: 3.3.0
>
> Attachments: HDFS-14745.000.patch, HDFS-14745.001.patch,
> HDFS-14745.002.patch
>
>
> We are proposing to backport the patches for HDFS-13762, HDFS persistent
> memory read cache support, to branch-3.1.
[jira] [Assigned] (HDFS-14740) HDFS read cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R reassigned HDFS-14740:
---
Assignee: Rui Mo (was: Feilong He)

> HDFS read cache persistence support
> -
>
> Key: HDFS-14740
> URL: https://issues.apache.org/jira/browse/HDFS-14740
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Feilong He
> Assignee: Rui Mo
> Priority: Major
> Attachments: HDFS-14740.000.patch
>
>
> In HDFS-13762, persistent memory is enabled in HDFS centralized cache
> management. Even though persistent memory can persist cache data, to simplify
> the implementation, the previous cache data is cleaned up during DataNode
> restarts. We propose to improve the HDFS persistent memory (PM) cache by
> taking advantage of PM's data persistence characteristic, i.e., recovering
> the cache status when the DataNode restarts; thus, cache warm-up time can be
> saved for the user.
[jira] [Updated] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity
[ https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14700:
Issue Type: Sub-task (was: Bug)
Parent: HDFS-13762

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Any pmem cache left over from before should be cleaned up prior to setting
> the pmem cache capacity, because the usable space size is used to set the
> pmem cache capacity.
[jira] [Updated] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity
[ https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14700:
Resolution: Fixed
Fix Version/s: 3.3.0
Status: Resolved (was: Patch Available)

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Any pmem cache left over from before should be cleaned up prior to setting
> the pmem cache capacity, because the usable space size is used to set the
> pmem cache capacity.
[jira] [Commented] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity
[ https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903708#comment-16903708 ]

Rakesh R commented on HDFS-14700:
-
Committed the changes to trunk.

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Any pmem cache left over from before should be cleaned up prior to setting
> the pmem cache capacity, because the usable space size is used to set the
> pmem cache capacity.
[jira] [Commented] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity
[ https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903693#comment-16903693 ]

Rakesh R commented on HDFS-14700:
-
Thanks [~PhiloHe] for reporting this issue. +1, the patch looks good to me. I will commit it shortly.

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Minor
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Any pmem cache left over from before should be cleaned up prior to setting
> the pmem cache capacity, because the usable space size is used to set the
> pmem cache capacity.
[jira] [Updated] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity
[ https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14700:
Summary: Clean up pmem cache before setting pmem cache capacity (was: Clean up pmem cache left before prior to setting pmem cache capacity)

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Minor
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Any pmem cache left over from before should be cleaned up prior to setting
> the pmem cache capacity, because the usable space size is used to set the
> pmem cache capacity.
[jira] [Updated] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives
[ https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-13762:
Release Note:
Non-volatile storage class memory (SCM, also known as persistent memory) is supported in the HDFS cache. To enable the SCM cache, the user just needs to configure SCM volumes for the property “dfs.datanode.cache.pmem.dirs” in hdfs-site.xml; all HDFS cache directives remain unchanged. There are two implementations of the HDFS SCM cache: one is a pure Java implementation, and the other is a native, PMDK based implementation. The latter brings the user better performance in cache write and cache read. To enable the PMDK based implementation, the user should install the PMDK library by referring to the official site http://pmem.io/, then build Hadoop with PMDK support by referring to the "PMDK library build options" section in `BUILDING.txt` in the source code. If multiple SCM volumes are configured, a round-robin policy is used to select an available volume for caching a block. Consistent with the DRAM cache, the SCM cache has no cache eviction mechanism. When a DataNode receives a data read request from a client, if the corresponding block is cached in SCM, the DataNode will instantiate an InputStream with the block location path on SCM (pure Java implementation) or the cache address on SCM (PMDK based implementation). Once the InputStream is created, the DataNode will send the cached data to the client. Please refer to the "Centralized Cache Management" guide for more details.

(was: Non-volatile storage class memory (SCM, also known as persistent memory) is supported in HDFS cache. To enable SCM cache, user just needs to configure SCM volume for property “dfs.datanode.cache.pmem.dirs”. And all HDFS cache directives keep unchanged. There are two implementations for HDFS SCM Cache, one is pure java code implementation and the other is native PMDK based implementation. The latter implementation can bring user better performance gain in cache write and cache read. To enable PMDK based implementation, user should install PMDK library by referring to the official site http://pmem.io/. Then, build Hadoop with PMDK support by referring to "PMDK library build options" section in `BUILDING.txt` in the source code. If multiple SCM volumes are configured, a round-robin policy is used to select an available volume for caching a block. Consistent with DRAM cache, SCM cache also has no cache eviction mechanism. When DataNode receives a data read request from a client, if the corresponding block is cached into SCM, DataNode will instantiate an InputStream with the block location path on SCM (pure java implementation) or cache address on SCM (PMDK based implementation). Once the InputStream is created, DataNode will send the cache data to the client.)

> Support non-volatile storage class memory(SCM) in HDFS cache directives
> -
>
> Key: HDFS-13762
> URL: https://issues.apache.org/jira/browse/HDFS-13762
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: caching, datanode
> Reporter: Sammi Chen
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch,
> HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch,
> HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch,
> HDFS-13762.008.patch, SCMCacheDesign-2018-11-08.pdf,
> SCMCacheDesign-2019-07-12.pdf, SCMCacheDesign-2019-07-16.pdf,
> SCMCacheDesign-2019-3-26.pdf, SCMCacheTestPlan-2019-3-27.pdf,
> SCMCacheTestPlan.pdf, SCM_Cache_Perf_Results-v1.pdf
>
>
> Non-volatile storage class memory is a type of memory that can keep its data
> content across power failures and power cycles. A non-volatile storage class
> memory device usually has access speed close to a memory DIMM while costing
> less than memory, so today it is usually used as a supplement to memory to
> hold long-term persistent data, such as data in cache.
> Currently in HDFS, we have an OS page cache backed read-only cache and a
> RAMDISK based lazy-write cache. Non-volatile memory suits both these
> functions. This Jira aims to enable storage class memory first in the read
> cache. Although storage class memory has non-volatile characteristics, to
> keep the same behavior as the current read-only cache, we don't use its
> persistence characteristics currently.
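The round-robin volume selection mentioned in the release note can be sketched as follows. This is a minimal illustration of the policy, not the actual HDFS code; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of round-robin selection over the volumes configured in
// "dfs.datanode.cache.pmem.dirs": each cache request takes the next
// volume in strict rotation.
public class RoundRobinVolumePicker {
    private final List<String> volumes;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinVolumePicker(List<String> volumes) {
        if (volumes.isEmpty()) {
            throw new IllegalArgumentException("no pmem volumes configured");
        }
        this.volumes = volumes;
    }

    // Thread-safe: the atomic counter serializes the rotation even when
    // many blocks are cached concurrently.
    public String next() {
        long i = counter.getAndIncrement();
        return volumes.get((int) Long.remainderUnsigned(i, volumes.size()));
    }
}
```

For example, with two configured volumes the picker alternates between them; round-robin keeps the volumes roughly evenly filled without tracking per-volume usage.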
[jira] [Updated] (HDFS-14357) Update documentation for HDFS cache on SCM support
[ https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14357:
Resolution: Fixed
Status: Resolved (was: Patch Available)

> Update documentation for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch,
> HDFS-14357.002.patch, HDFS-14357.003.patch, HDFS-14357.004.patch
>
[jira] [Updated] (HDFS-14357) Update documentation for HDFS cache on SCM support
[ https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14357:
Summary: Update documentation for HDFS cache on SCM support (was: Update the relevant docs for HDFS cache on SCM support)

> Update documentation for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch,
> HDFS-14357.002.patch, HDFS-14357.003.patch, HDFS-14357.004.patch
>
[jira] [Commented] (HDFS-14357) Update the relevant docs for HDFS cache on SCM support
[ https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884905#comment-16884905 ]

Rakesh R commented on HDFS-14357:
-
Thanks [~PhiloHe] for the contribution. +1, the patch looks good to me; I will commit it shortly.

> Update the relevant docs for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch,
> HDFS-14357.002.patch, HDFS-14357.003.patch, HDFS-14357.004.patch
>
[jira] [Commented] (HDFS-14458) Report pmem stats to namenode
[ https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884897#comment-16884897 ]

Rakesh R commented on HDFS-14458:
-
Attached the patch with the {{WARN}} log message removed; the same has been committed to the trunk branch.

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458-committedpatch.patch, HDFS-14458.000.patch,
> HDFS-14458.001.patch, HDFS-14458.002.patch, HDFS-14458.003.patch,
> HDFS-14458.004.patch, HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to the NameNode: cache used
> and cache capacity.
[jira] [Updated] (HDFS-14458) Report pmem stats to namenode
[ https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14458:
Attachment: HDFS-14458-committedpatch.patch

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458-committedpatch.patch, HDFS-14458.000.patch,
> HDFS-14458.001.patch, HDFS-14458.002.patch, HDFS-14458.003.patch,
> HDFS-14458.004.patch, HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to the NameNode: cache used
> and cache capacity.
[jira] [Updated] (HDFS-14458) Report pmem stats to namenode
[ https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-14458:
Resolution: Fixed
Status: Resolved (was: Patch Available)

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch,
> HDFS-14458.002.patch, HDFS-14458.003.patch, HDFS-14458.004.patch,
> HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to the NameNode: cache used
> and cache capacity.
[jira] [Commented] (HDFS-14458) Report pmem stats to namenode
[ https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884893#comment-16884893 ]

Rakesh R commented on HDFS-14458:
-
Thanks [~PhiloHe] for the patch. I am thinking of not including the WARN message, since it may pollute the log with lots of messages, so I am removing it now. Anyway, there is already a log message that clearly conveys the disable part. Apart from that, your patch looks good to me; I will commit it shortly.

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch,
> HDFS-14458.002.patch, HDFS-14458.003.patch, HDFS-14458.004.patch,
> HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to the NameNode: cache used
> and cache capacity.
[jira] [Commented] (HDFS-14357) Update the relevant docs for HDFS cache on SCM support
[ https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883688#comment-16883688 ]

Rakesh R commented on HDFS-14357:
-
Thanks [~PhiloHe] for the patch. Please take care of the comments below:
bq. One depends on PMDK libs and the other doesn't. PMDK can bring user performance gain for cache write and cache read
Please rephrase it like: The default is based on a pure Java implementation; the other is a native implementation which leverages the PMDK library to improve the performance of cache write and cache read. To enable and use PMDK, use the following steps:
1. Build the PMDK library. Please refer to the official site "" for detailed information.
2. Build Hadoop with PMDK support. Please refer to `BUILDING.txt` to build Hadoop with PMDK support.
3. Copy the PMDK library. Make sure PMDK is available on the HDFS DataNodes. To verify that PMDK is correctly detected by Hadoop, run the hadoop {{checknative}} command.
bq. For multiply volumes
Typo: {{For multiply volumes}} --> {{For multiple volumes}}

> Update the relevant docs for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch,
> HDFS-14357.002.patch, HDFS-14357.003.patch
>
[jira] [Commented] (HDFS-14458) Report pmem stats to namenode
[ https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883663#comment-16883663 ]

Rakesh R commented on HDFS-14458:
-
Thanks [~PhiloHe] for the updates. Please take care of the comments below:
# Remove the unused method:
{code}
/**
 * Check if pmem cache is enabled.
 */
private boolean isPmemCacheEnabled() {
  return !cacheLoader.isTransientCache();
}
{code}
# Do we really need this exception handling? It can log repeatedly.
{code}
if (cacheCapacity == 0L) {
  throw new IOException("DRAM cache may be disabled. The cache capacity is 0.");
}
{code}
# We need to unify the {{CacheStats}} of 'DRAM' and 'PMem' utilization. I'm OK with doing this separately, along with LazyWriter PMem support.

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch,
> HDFS-14458.002.patch, HDFS-14458.003.patch, HDFS-14458.004.patch
>
>
> Currently, two important stats should be reported to the NameNode: cache used
> and cache capacity.
[jira] [Commented] (HDFS-14458) Report pmem stats to namenode
[ https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882301#comment-16882301 ]

Rakesh R commented on HDFS-14458:
-
Thanks [~PhiloHe] for taking this ahead. I've added a few comments:
# By default {{dfs.datanode.max.locked.memory}} is zero. Do you want to disable in-memory caching when the PMem cache is enabled? If yes, please add a log message to convey that. Could you try adding a unit test to automate this behavior?
{code}
this.memCacheStats = new MemoryCacheStats(0L);
{code}
# I'd prefer to avoid the {{if (isPmemCacheEnabled())}} checks inside FsDatasetCache. How about {{cacheLoader#initialize(this)}} returning {{memStats}}?
{code}
MemoryCacheStats stats = cacheLoader.initialize(this);
{code}
# I'd appreciate it if you could add a unit test for the results of the PMem stats. Thanks!

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch,
> HDFS-14458.002.patch
>
>
> Currently, two important stats should be reported to the NameNode: cache used
> and cache capacity.
[jira] [Created] (HDDS-1688) Deadlock in ratis client
Rakesh R created HDDS-1688: -- Summary: Deadlock in ratis client Key: HDDS-1688 URL: https://issues.apache.org/jira/browse/HDDS-1688 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.5.0 Reporter: Rakesh R Attachments: Freon_baseline_100Threads_64MB_Keysize_8Keys_10buckets.bin Ran Freon benchmark in a three node cluster with 100 writer threads. After some time the client got hanged due to deadlock issue. +Freon with the args:-+ --numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 --numOfThreads=100 3 BLOCKED threads. Attached whole threaddump. {code} Found one Java-level deadlock: = "grpc-default-executor-6": waiting for ownable synchronizer 0x00021546bd00, (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync), which is held by "ForkJoinPool.commonPool-worker-7" "ForkJoinPool.commonPool-worker-7": waiting to lock monitor 0x7f48fc99c448 (object 0x00021546be30, a org.apache.ratis.util.SlidingWindow$Client), which is held by "grpc-default-executor-6" {code} {code} ForkJoinPool.commonPool-worker-7 priority:5 - threadId:0x7f48d834b000 - nativeId:0x9ffb - nativeId (decimal):40955 - state:BLOCKED stackTrace: java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348) - waiting to lock <0x00021546be30> (a org.apache.ratis.util.SlidingWindow$Client) at org.apache.ratis.client.impl.OrderedAsync.resetSlidingWindow(OrderedAsync.java:122) at org.apache.ratis.client.impl.OrderedAsync$$Lambda$943/1670264164.accept(Unknown Source) at org.apache.ratis.client.impl.RaftClientImpl.lambda$handleIOException$6(RaftClientImpl.java:352) at org.apache.ratis.client.impl.RaftClientImpl$$Lambda$944/769363367.accept(Unknown Source) at java.util.Optional.ifPresent(Optional.java:159) at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:352) at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:235) at 
org.apache.ratis.client.impl.OrderedAsync$$Lambda$776/1213731951.apply(Unknown Source) at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:324) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.close(GrpcClientProtocolClient.java:313) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$400(GrpcClientProtocolClient.java:245) at org.apache.ratis.grpc.client.GrpcClientProtocolClient.lambda$close$1(GrpcClientProtocolClient.java:131) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$$Lambda$950/1948156329.accept(Unknown Source) at java.util.Optional.ifPresent(Optional.java:159) at org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:131) at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:73) at org.apache.ratis.util.PeerProxyMap$PeerAndProxy$$Lambda$948/427065222.run(Unknown Source) at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231) at org.apache.ratis.util.LifeCycle$$Lambda$949/1311526821.get(Unknown Source) at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251) at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229) at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70) - locked <0x0003e793ef48> (a org.apache.ratis.util.PeerProxyMap$PeerAndProxy) at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:126) - locked <0x000215453400> (a java.lang.Object) at 
org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:135) at org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47) at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:375) at org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:341) at org.apache.ratis.client.impl.UnorderedAsync.lambda$sendRequestWithRetry$4(UnorderedAsync.java:108) at org.apache.ratis.client.impl.UnorderedAsync$$Lambda$976/655038759.accept(Unknown Source) at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) at
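The dump above shows a classic lock-ordering cycle: one thread holds the {{SlidingWindow$Client}} monitor while waiting for the {{ReentrantReadWriteLock}}, and the other holds the lock while waiting for the monitor. The usual remedy is a fixed global acquisition order, sketched below with illustrative code (not Ratis internals).

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the deadlock-avoidance pattern: every thread acquires the
// read-write lock BEFORE the monitor, so the circular wait seen in the
// thread dump cannot form. Names are illustrative stand-ins.
public class LockOrderSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true);
    private final Object window = new Object(); // stands in for SlidingWindow$Client
    private int resets;

    void resetFirstSeqNum() {
        rwLock.writeLock().lock();        // outer lock: always taken first
        try {
            synchronized (window) {       // inner lock: always taken second
                resets++;
            }
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    /** Runs two contending threads; completes because the lock order is consistent. */
    public static int run() {
        LockOrderSketch s = new LockOrderSketch();
        Thread t1 = new Thread(s::resetFirstSeqNum);
        Thread t2 = new Thread(s::resetFirstSeqNum);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return s.resets;                  // both threads finished
    }

    public static void main(String[] args) {
        System.out.println(run());        // prints 2
    }
}
```

If one code path instead took the monitor first and the lock second, the exact cycle from the dump could reappear; consistent ordering (or tryLock with timeout) is the standard fix.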
[jira] [Created] (HDDS-1687) Datanode process shutdown due to OOME
Rakesh R created HDDS-1687: -- Summary: Datanode process shutdown due to OOME Key: HDDS-1687 URL: https://issues.apache.org/jira/browse/HDDS-1687 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.5.0 Reporter: Rakesh R Attachments: baseline test - datanode error logs.0.5.0.rar Ran Freon benchmark in a three node cluster and with more parallel writer threads, datanode daemon hits OOME and got shutdown. Used HDD as storage type in worker nodes. +Freon with the args:-+ --numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 --numOfThreads=100 *DN-2* : Process got killed during the test, due to OOME {code} 2019-06-13 00:48:11,976 ERROR org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Terminating with exit status 1: a0cb8914-b51c-41b1-b5d2-59313cf38c0b-SegmentedRaftLogWorker:Storage Directory /data/datab/ozone/metadir/ratis/cbf29739-cbd1-4b00-8a21-2db750004dc7 failed. java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:694) at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.(BufferedWriteChannel.java:44) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.(SegmentedRaftLogOutputStream.java:70) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:481) at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:234) at java.lang.Thread.run(Thread.java:748) {code} *DN3* : Process got killed during the test, due to OOME. I could see lots of NPE at the datanode logs. 
{code} 2019-06-13 00:44:44,581 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot 2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: follower responses installSnapshot Completed 2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 83232f1f-4469-4a4d-b369-c131c8432ae9: follower 07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, log's start index is 10062, need to notify follower to install snapshot 2019-06-13 00:44:44,587 ERROR org.apache.ratis.server.impl.LogAppender: org.apache.ratis.server.impl.LogAppender$AppenderDaemon@554415fe unexpected exception java.lang.NullPointerException: 83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: Previous TermIndex not found for firstIndex = 10062 at java.util.Objects.requireNonNull(Objects.java:290) at org.apache.ratis.server.impl.LogAppender.assertProtos(LogAppender.java:234) at org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:221) at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:169) at org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:113) at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:80) at java.lang.Thread.run(Thread.java:748) OOME log messages present in the *.out file. 
Exception in thread "org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$267/386355867@1d9c10b3" java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:717) at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.start(LogAppender.java:68) at org.apache.ratis.server.impl.LogAppender.startAppender(LogAppender.java:153) at java.util.ArrayList.forEach(ArrayList.java:1257) at org.apache.ratis.server.impl.LeaderState.addAndStartSenders(LeaderState.java:372) at org.apache.ratis.server.impl.LeaderState.restartSender(LeaderState.java:394) at org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:97) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1594) NullPointerException at the ratis client while running Freon benchmark
[ https://issues.apache.org/jira/browse/HDDS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDDS-1594: --- Attachment: NPE-logs.tar.gz > NullPointerException at the ratis client while running Freon benchmark > -- > > Key: HDDS-1594 > URL: https://issues.apache.org/jira/browse/HDDS-1594 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Rakesh R >Priority: Minor > Attachments: NPE-logs.tar.gz > > > Hits NPE during Freon benchmark test run. Below is the exception logged at > the client side output log message. > {code} > SEVERE: Exception while executing runnable > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@6c585536 > java.lang.NullPointerException > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:320) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245) > at > org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:269) > at > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434) > at > org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) > at > org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) > at > org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) > at > 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) > at > org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) > at > org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) > at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) > at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) > at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) > at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) > at > org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1594) NullPointerException at the ratis client while running Freon benchmark
Rakesh R created HDDS-1594: -- Summary: NullPointerException at the ratis client while running Freon benchmark Key: HDDS-1594 URL: https://issues.apache.org/jira/browse/HDDS-1594 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.4.0 Reporter: Rakesh R Hits NPE during Freon benchmark test run. Below is the exception logged at the client side output log message. {code} SEVERE: Exception while executing runnable org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@6c585536 java.lang.NullPointerException at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:320) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245) at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:269) at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434) at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) at org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) at org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache
[ https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14402: Labels: SCM (was: ) > Use FileChannel.transferTo() method for transferring block to SCM cache > --- > > Key: HDFS-14402 > URL: https://issues.apache.org/jira/browse/HDFS-14402 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Labels: SCM > Fix For: 3.3.0 > > Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, > HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, > Without-Cache-Improvement-Patch.png > > > We will consider using the transferTo API to improve SCM's cache performance.
[jira] [Updated] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache
[ https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14402: Resolution: Fixed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) I have committed the latest patch to trunk. Thanks [~PhiloHe] for the contribution! > Use FileChannel.transferTo() method for transferring block to SCM cache > --- > > Key: HDFS-14402 > URL: https://issues.apache.org/jira/browse/HDFS-14402 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, > HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, > Without-Cache-Improvement-Patch.png > > > We will consider using the transferTo API to improve SCM's cache performance.
[jira] [Commented] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache
[ https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848372#comment-16848372 ] Rakesh R commented on HDFS-14402: - Thank you [~PhiloHe] for the patch. The performance result looks promising and shows a clear time reduction. We could probably plan a test comparison between HDD and NVMe devices; that would be interesting and could be attached to the umbrella Jira as well. +1 LGTM, I will commit the latest patch shortly. > Use FileChannel.transferTo() method for transferring block to SCM cache > --- > > Key: HDFS-14402 > URL: https://issues.apache.org/jira/browse/HDFS-14402 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, > HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, > Without-Cache-Improvement-Patch.png > > > We will consider using the transferTo API to improve SCM's cache performance.
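For reference, the core of the {{FileChannel.transferTo()}}-based copy this Jira adopts can be sketched as follows. The file names and the {{demo()}} helper are illustrative, not the Hadoop implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of a transferTo()-based block copy. transferTo() lets the kernel
// move bytes between channels (zero-copy on many platforms); a single call
// may transfer fewer bytes than requested, so it is driven in a loop.
public class TransferToSketch {
    public static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }

    /** Self-contained demo: copies a tiny "replica" file into a "cache" file. */
    public static String demo() {
        try {
            Path src = Files.createTempFile("block", ".data");
            Path dst = Files.createTempFile("cache", ".data");
            Files.write(src, "replica-bytes".getBytes(StandardCharsets.UTF_8));
            copy(src, dst);
            return new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "replica-bytes"
    }
}
```

The loop matters: relying on a single {{transferTo()}} call is a common bug, since the contract only promises "up to" the requested count.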
[jira] [Commented] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache
[ https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847482#comment-16847482 ] Rakesh R commented on HDFS-14402: -
bq. Upon making checksum configurable, I am thinking maybe to user, the cache read performance is concerned mostly. The checksum is executed once when caching data to DRAM/Pmem. It may be tolerable to user with checksum operation for data verification. I think more discussions are required. I will open a separate Jira common to DRAM and Pmem cache.
Yes, it makes sense to me to move the discussion to a separate Jira. One use case I can foresee: if a user is concerned about the sanity of the data when they {{read-from-cache}} a block, they will do the checksum computation again at the read call. This again depends on the consistency level of the NVMe device, I think. In that case, the checksum we computed at the beginning can be skipped, without worrying about sanity during {{write-to-cache}}.
> Use FileChannel.transferTo() method for transferring block to SCM cache > --- > > Key: HDFS-14402 > URL: https://issues.apache.org/jira/browse/HDFS-14402 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, > HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, > Without-Cache-Improvement-Patch.png > > > We will consider using the transferTo API to improve SCM's cache performance.
[jira] [Updated] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache
[ https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14402: Summary: Use FileChannel.transferTo() method for transferring block to SCM cache (was: Improve the implementation for HDFS cache on SCM) > Use FileChannel.transferTo() method for transferring block to SCM cache > --- > > Key: HDFS-14402 > URL: https://issues.apache.org/jira/browse/HDFS-14402 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14402.000.patch, HFDS-14402.001.patch > > > We will consider using the transferTo API to improve SCM's cache performance.
[jira] [Commented] (HDFS-14402) Improve the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839033#comment-16839033 ] Rakesh R commented on HDFS-14402: - Thanks [~PhiloHe] for taking this jira ahead. Please go through the below review comments on the patch:
# Checksum verification uses {{blockChannel}} and reads the content at the replica device's throughput. Reading the block content again makes the replica device busy and will affect other clients' read/write operations, assuming the replica device is an HDD. Since the data is written to NVMe, you can read it back from NVMe. Also, this will act as a verification checkpoint to ensure there is no data loss after writing the block to the NVMe device, right?
{code}
verifyChecksum(length, metaIn, blockChannel, blockFileName);
{code}
# Can we make the checksum computation switchable on/off via configuration? That would improve performance and avoid keeping the device busy. We can introduce this on/off tuning parameter for all MappableBlockLoaders.
# Please change {{protected int fillBuffer}} to {{private}} visibility.
> Improve the implementation for HDFS cache on SCM > > > Key: HDFS-14402 > URL: https://issues.apache.org/jira/browse/HDFS-14402 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14402.000.patch, HFDS-14402.001.patch > > > We will consider using the transferTo API to improve SCM's cache performance.
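The first review comment above — checksum the NVMe copy instead of re-reading the replica — can be sketched like this. CRC32 stands in for HDFS's block checksum, and all names are illustrative, not the patch's actual code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Sketch: after copying a block into the pmem cache file, checksum the
// *cached* copy and compare it with the source block's checksum. This avoids
// a second read of the (possibly HDD-backed) replica and doubles as write
// verification of the cache file.
public class VerifyCachedCopySketch {
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /** Returns true when the cached copy matches the source block's checksum. */
    public static boolean verifyAfterCache(byte[] sourceBlock, Path cacheFile)
            throws IOException {
        long expected = crc(sourceBlock);               // checksum of the replica data
        byte[] cached = Files.readAllBytes(cacheFile);  // read back from the cache device
        return crc(cached) == expected;
    }

    /** Self-contained demo standing in for the cache-then-verify sequence. */
    public static boolean demo() {
        try {
            byte[] block = "block-contents".getBytes(StandardCharsets.UTF_8);
            Path cache = Files.createTempFile("pmem-cache", ".data");
            Files.write(cache, block);                  // stands in for the copy step
            return verifyAfterCache(block, cache);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints true
    }
}
```

A mismatch here would indicate data loss between the copy and the cache device, which is exactly the checkpoint the comment describes.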
[jira] [Updated] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14401: Resolution: Fixed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) I have committed the patch to trunk. Thanks [~umamaheswararao], [~anoop.hbase] for the reviews! Thanks [~PhiloHe] for the contribution! > Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, > HDFS-14401.008.patch, HDFS-14401.009.patch, HDFS-14401.010.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MappableBlockLoader interface; etc.
[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835285#comment-16835285 ] Rakesh R commented on HDFS-14401: - Thanks [~PhiloHe] for the continuous efforts on this. +1, the latest patch looks good to me. If there are no comments from others, I will move ahead and commit this today. Thanks! > Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, > HDFS-14401.008.patch, HDFS-14401.009.patch, HDFS-14401.010.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MappableBlockLoader interface; etc.
[jira] [Comment Edited] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834579#comment-16834579 ] Rakesh R edited comment on HDFS-14401 at 5/7/19 9:44 AM: - Apart from the following comments, the latest patch looks good to me. I will commit the patch after these comments are fixed, if there are no more comments from others.
# How about adding a function to the interface to keep the {{FsDatasetCache}} code simple:
{code:java}
// MappableBlockLoader.java
/**
 * Cleaning up the cache, can be used during shutdown.
 */
void cleanup() {
  // do nothing
}

// PmemMappableBlockLoader.java
@Override
void cleanup() {
  LOG.info("Clean up cache on persistent memory during shutdown.");
  pmemVolumeManager.cleanup();
}

/**
 * Clean up cache.
 */
void shutdown() {
  cacheLoader.cleanup();
}
{code}
# Why can't we simply do {{return new File(rawPmemDir, CACHE_DIR).getAbsolutePath();}} instead of {{rawPmemDir.endsWith("/") ? rawPmemDir + CACHE_DIR}}?
> Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, > HDFS-14401.008.patch, HDFS-14401.009.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MappableBlockLoader interface; etc.
[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834579#comment-16834579 ] Rakesh R commented on HDFS-14401: - Apart from the following comments, the latest patch looks good to me. I will commit the patch after these comments are fixed, if there are no more comments from others.
# How about adding a function to the interface to keep the {{FsDatasetCache}} code simple:
{code:java}
// MappableBlockLoader.java
/**
 * Cleaning up the cache, can be used during shutdown.
 */
void cleanup() {
  // do nothing
}

// PmemMappableBlockLoader.java
@Override
void cleanup() {
  LOG.info("Clean up cache on persistent memory during shutdown.");
  pmemVolumeManager.cleanup();
}

/**
 * Clean up cache.
 */
void shutdown() {
  cacheLoader.cleanup();
}
{code}
# Why can't we simply do {{return new File(rawPmemDir, CACHE_DIR).getAbsolutePath();}} instead of {{rawPmemDir.endsWith("/") ? rawPmemDir + CACHE_DIR}}?
> Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, > HDFS-14401.008.patch, HDFS-14401.009.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MappableBlockLoader interface; etc.
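A compilable version of the {{cleanup()}} suggestion above, using a default interface method so shutdown code never branches on the loader type. The names mirror the snippet in the comment, but this is an illustrative sketch, not the committed Hadoop code; {{cleanup()}} returns a String here only so the behavior is observable.

```java
// Sketch: a default no-op cleanup() in the loader interface, overridden only
// by the pmem loader. Shutdown code calls cleanup() unconditionally.
interface MappableBlockLoaderIface {
    /** Clean up the cache during shutdown. No-op by default (DRAM case). */
    default String cleanup() {
        return "no-op";
    }
}

class PmemMappableBlockLoaderSketch implements MappableBlockLoaderIface {
    @Override
    public String cleanup() {
        // Real code would delete cache files under the configured pmem volumes.
        return "pmem cache cleaned";
    }
}

public class CleanupSketch {
    /** Shutdown path: no isPmemCacheEnabled()-style branching needed. */
    static String shutdown(MappableBlockLoaderIface loader) {
        return loader.cleanup();
    }

    public static void main(String[] args) {
        System.out.println(shutdown(new MappableBlockLoaderIface() {}));   // no-op
        System.out.println(shutdown(new PmemMappableBlockLoaderSketch())); // pmem cache cleaned
    }
}
```

An abstract base class with a no-op method works equally well if the loader hierarchy is class-based rather than interface-based.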
[jira] [Comment Edited] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830376#comment-16830376 ] Rakesh R edited comment on HDFS-14401 at 4/30/19 3:09 PM: -- Overall the patch looks good and I think it is nearing completion. Could you please take care of the below comments:
# Rename the PmemVolumeManager variable '{{i}}' to '{{nextIndex}}'.
# How about resetting {{nextIndex}} to avoid it growing to infinity? You can refer to the idea below, or explicitly reset {{nextIndex = 0}} {{if (nextIndex == count)}}.
{code:java}
private byte nextIndex = 0;
..
..
while (k++ != count) {
  nextIndex = (byte) (nextIndex % count);
  byte index = nextIndex;
  nextIndex++;
  long availableBytes = usedBytesCounts.get(index).getAvailableBytes();
  if (availableBytes >= bytesCount) {
    return index;
  }
  if (availableBytes > maxAvailableSpace) {
    maxAvailableSpace = availableBytes;
  }
}
{code}
# Instead of {{memCacheStats.getCacheUsed()}}, it should be {{cacheLoader.getCacheUsed()}}, right?
{code:java}
LOG.debug("Caching of {} was aborted. We are now caching only {} "
    + "bytes in total.", key, cacheLoader.getCacheUsed());
{code}
# Please double-check whether there is any scenario where a {{blockKeyToVolume.put(key, index);}} entry is added and then {{usedBytesCounts.get(index).reserve(bytesCount);}} returns -1.
> Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch, HDFS-14401.006.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MappableBlockLoader interface; etc.
[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830376#comment-16830376 ] Rakesh R commented on HDFS-14401: - Overall the patch looks good and I think it's nearing completion. Could you please take care of the comments below: # Rename the PmemVolumeManager variable '{{i}}' to '{{nextIndex}}'. # How about resetting {{nextIndex}} to avoid it growing to infinity? You can refer to the idea below, or explicitly reset {{nextIndex=0}} {{if (nextIndex == count)}}. {code:java} private byte nextIndex = 0; .. .. while (k++ != count) { nextIndex = (byte) (nextIndex % count); byte index = nextIndex; nextIndex++; long availableBytes = usedBytesCounts.get(index).getAvailableBytes(); if (availableBytes >= bytesCount) { return index; } if (availableBytes > maxAvailableSpace) { maxAvailableSpace = availableBytes; } } {code} # Instead of {{memCacheStats.getCacheUsed()}}, it should be {{cacheLoader.getCacheUsed()}}, right? {code:java} LOG.debug("Caching of {} was aborted. We are now caching only {} " + "bytes in total.", key, cacheLoader.getCacheUsed()); {code} # Please double check whether there is any scenario where it adds a {{blockKeyToVolume.put(key, index);}} entry and then {{usedBytesCounts.get(index).reserve(bytesCount);}} returns -1. > Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch, HDFS-14401.006.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MapppableBlockLoader interface; etc. 
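The wraparound selection suggested in point 2 can be reduced to a standalone sketch. Volume state is simplified to an array of available byte counts; the {{UsedBytesCount}} objects, the max-available-space tracking, and any locking from the real PmemVolumeManager are omitted here.

```java
// Round-robin pmem volume selection with a modulo wraparound, so the
// rotating index never grows without bound. Simplified illustration of
// the loop quoted in the review comment above.
class VolumeSelector {
  private byte nextIndex = 0;

  /**
   * Returns the index of the next volume (round-robin order) that can
   * fit bytesCount, or -1 if no volume has enough free space.
   */
  int select(long[] availableBytes, long bytesCount) {
    int count = availableBytes.length;
    int k = 0;
    while (k++ != count) {
      nextIndex = (byte) (nextIndex % count); // wrap instead of growing forever
      byte index = nextIndex;
      nextIndex++;
      if (availableBytes[index] >= bytesCount) {
        return index;
      }
    }
    return -1; // every volume is too full
  }
}

public class RoundRobinSketch {
  public static void main(String[] args) {
    VolumeSelector s = new VolumeSelector();
    long[] avail = {10, 100, 100};
    System.out.println(s.select(avail, 50)); // skips full volume 0, picks 1
    System.out.println(s.select(avail, 50)); // round-robin continues at 2
  }
}
```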
[jira] [Comment Edited] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826642#comment-16826642 ] Rakesh R edited comment on HDFS-14401 at 4/26/19 5:12 AM: -- Thanks [~PhiloHe] for the good progress. Adding few comments, # Move log message to respective constructor, that will make the FsDatasetCache.java more cleaner. {code:java} PmemMappableBlockLoader(){ LOG.info("Initializing cache loader: PmemMappableBlockLoader"); } MemoryMappableBlockLoader(){ LOG.info("Initializing cache loader: MemoryMappableBlockLoader"); } {code} # How about using a {{MappableBlockLoaderFactory}} and move {{#createCacheLoader(DNConf)}} function into that. {code:java} MappableBlockLoader loader = MappableBlockLoaderFactory.getInstance().createCacheLoader(this.getDnConf()); {code} # Typo - '{{due to unsuccessfully mapping'}} -->to-> '{{due to unsuccessful mapping'}}. # Can we make synchronized functions {{long release}} and {{public String getCachePath}} # {{maxBytes = pmemDir.getTotalSpace();}}, IMHO, to use [File#getUsableSpace()|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#getUsableSpace()] function. # Remove unused var in PmemVolumeManager.java - {{// private final UsedBytesCount usedBytesCount;}} # Its good to use {} instead of string concatenation in log messages. Please take care all such occurrences in newly writing code. {code:java} LOG.info("Added persistent memory - " + volumes[n] + " with size=" + maxBytes); to LOG.info("Added persistent memory - {} with size={}", volumes[n], maxBytes); {code} was (Author: rakeshr): Thanks [~PhiloHe] for the good progress. Adding few comments, # Move log message to respective constructor, that will make the FsDatasetCache.java more cleaner. 
{code:java} PmemMappableBlockLoader(){ LOG.info("Initializing cache loader: PmemMappableBlockLoader"); } MemoryMappableBlockLoader(){ LOG.info("Initializing cache loader: MemoryMappableBlockLoader"); } {code} # How about using a {{MappableBlockLoaderFactory}} and move {{#createCacheLoader(DNConf)}} function into that. {code:java} MappableBlockLoader loader = MappableBlockLoaderFactory.getInstance().createCacheLoader(this.getDnConf()); {code} # Typo - '{{due to unsuccessfully mapping'}} -->to-> '{{due to unsuccessful mapping'}}. # Can we make synchronized functions {{long release}} and {{public String getCachePath}} # {{maxBytes = pmemDir.getTotalSpace();}}, IMHO, to use [File#getUsableSpace()|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#getUsableSpace()] function. # Remove unused var in PmemVolumeManager.java - {{// private final UsedBytesCount usedBytesCount;}} # Its good to use {} instead of string concatenation in log messages. Please take care all such occurrences in newly writing code. {code:java} LOG.info("Added persistent memory - " + volumes[n] + " with size=" + maxBytes); to LOG.info("Added persistent memory - {} with size={}", volumes[n], maxBytes); {code} > Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MapppableBlockLoader interface; etc. 
[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826642#comment-16826642 ] Rakesh R commented on HDFS-14401: - Thanks [~PhiloHe] for the good progress. Adding few comments, # Move log message to respective constructor, that will make the FsDatasetCache.java more cleaner. {code:java} PmemMappableBlockLoader(){ LOG.info("Initializing cache loader: PmemMappableBlockLoader"); } MemoryMappableBlockLoader(){ LOG.info("Initializing cache loader: MemoryMappableBlockLoader"); } {code} # How about using a {{MappableBlockLoaderFactory}} and move {{#createCacheLoader(DNConf)}} function into that. {code:java} MappableBlockLoader loader = MappableBlockLoaderFactory.getInstance().createCacheLoader(this.getDnConf()); {code} # Typo - '{{due to unsuccessfully mapping'}} -->to-> '{{due to unsuccessful mapping'}}. # Can we make synchronized functions {{long release}} and {{public String getCachePath}} # {{maxBytes = pmemDir.getTotalSpace();}}, IMHO, to use [File#getUsableSpace()|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#getUsableSpace()] function. # Remove unused var in PmemVolumeManager.java - {{// private final UsedBytesCount usedBytesCount;}} # Its good to use {} instead of string concatenation in log messages. Please take care all such occurrences in newly writing code. 
{code:java} LOG.info("Added persistent memory - " + volumes[n] + " with size=" + maxBytes); to LOG.info("Added persistent memory - {} with size={}", volumes[n], maxBytes); {code} > Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, > HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, > HDFS-14401.005.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MapppableBlockLoader interface; etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
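The factory idea from point 2 of the comment above can be sketched as follows. {{DNConf}} is reduced to a boolean pmem flag here; in Hadoop the factory would inspect the datanode configuration object, so treat the parameter and class bodies as assumptions.

```java
// Sketch of a MappableBlockLoaderFactory: construction logic for the
// DRAM vs. pmem loaders is pulled out of FsDatasetCache into one place.
// Loader classes are empty placeholders for illustration.
abstract class MappableBlockLoader { }
class MemoryMappableBlockLoader extends MappableBlockLoader { }
class PmemMappableBlockLoader extends MappableBlockLoader { }

final class MappableBlockLoaderFactory {
  private static final MappableBlockLoaderFactory INSTANCE =
      new MappableBlockLoaderFactory();

  private MappableBlockLoaderFactory() { }

  static MappableBlockLoaderFactory getInstance() {
    return INSTANCE;
  }

  /** Choose a loader from configuration (boolean stands in for DNConf). */
  MappableBlockLoader createCacheLoader(boolean pmemEnabled) {
    return pmemEnabled ? new PmemMappableBlockLoader()
                       : new MemoryMappableBlockLoader();
  }
}

public class FactorySketch {
  public static void main(String[] args) {
    MappableBlockLoader loader =
        MappableBlockLoaderFactory.getInstance().createCacheLoader(true);
    System.out.println(loader.getClass().getSimpleName());
  }
}
```

The caller then stays a single line, as in the comment: ask the factory for a loader and never mention a concrete implementation class.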
[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM
[ https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814292#comment-16814292 ] Rakesh R commented on HDFS-14401: - Volume management at the datanode uses the java File APIs to manage the mount paths. Similarly, this feature supports multiple {{pmem.dirs}}; to keep things simple, {{pmem}} can follow the same pattern. This will make the basic pmem configuration easy, and users can enable the feature with little effort. Later, if there is a need for a special shared deployment (the same mount path used by the HDFS cache and other apps), we can provide advanced configuration to handle the per-volume pmem management complexity. We can use java File APIs like, {code:java} java.io.File.getTotalSpace(); java.io.File.getFreeSpace(); java.io.File.getUsableSpace(); {code} [Hadoop code reference: DF.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DF.java#L83] > Refine the implementation for HDFS cache on SCM > --- > > Key: HDFS-14401 > URL: https://issues.apache.org/jira/browse/HDFS-14401 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14401.000.patch > > > In this Jira, we will refine the implementation for HDFS cache on SCM, such > as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume > selection impl; 3) Clean up MapppableBlockLoader interface; etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
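A minimal, self-contained illustration of the three {{java.io.File}} space APIs mentioned above, applied to the current directory. A pmem volume manager sizing its cache would typically prefer getUsableSpace() over getTotalSpace(), since usable space accounts for bytes already consumed on the partition.

```java
import java.io.File;

// Demonstrates the File space APIs on a real mount path (here "."),
// as a volume manager would query them per configured pmem.dir.
public class SpaceSketch {
  public static void main(String[] args) {
    File dir = new File(".");
    long total = dir.getTotalSpace();   // size of the partition
    long free = dir.getFreeSpace();     // unallocated bytes on the partition
    long usable = dir.getUsableSpace(); // bytes actually available to this JVM
    // Usable space can never exceed the partition size.
    System.out.println(usable <= total);
  }
}
```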
[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804220#comment-16804220 ] Rakesh R commented on HDFS-14355: - [~PhiloHe], HDFS-14393 sub-task has been resolved, please rebase your patch based on the interface changes. > Implement HDFS cache on SCM by using pure java mapped byte buffer > - > > Key: HDFS-14355 > URL: https://issues.apache.org/jira/browse/HDFS-14355 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, > HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, > HDFS-14355.005.patch, HDFS-14355.006.patch > > > This task is to implement the caching to persistent memory using pure > {{java.nio.MappedByteBuffer}}, which could be useful in case native support > isn't available or convenient in some environments or platforms. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14393: Resolution: Fixed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) I have committed to trunk. Thanks [~umamaheswararao] and [~PhiloHe] for the reviews! > Refactor FsDatasetCache for SCM cache implementation > > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, > HDFS-14393-003.patch > > > This jira sub-task is to make FsDatasetCache more cleaner to plugin DRAM and > PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804173#comment-16804173 ] Rakesh R commented on HDFS-14393: - Thank you [~umamaheswararao] and [~PhiloHe] for the feedback. I have updated the Jira title. I will commit this shortly. > Refactor FsDatasetCache for SCM cache implementation > > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, > HDFS-14393-003.patch > > > This jira sub-task is to make FsDatasetCache more cleaner to plugin DRAM and > PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14393: Summary: Refactor FsDatasetCache for SCM cache implementation (was: Move stats related methods to MappableBlockLoader) > Refactor FsDatasetCache for SCM cache implementation > > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, > HDFS-14393-003.patch > > > This jira sub-task is to move stats related methods to specific loader and > make FsDatasetCache more cleaner to plugin DRAM and PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14393: Description: This jira sub-task is to make FsDatasetCache more cleaner to plugin DRAM and PMem implementations. (was: This jira sub-task is to move stats related methods to specific loader and make FsDatasetCache more cleaner to plugin DRAM and PMem implementations.) > Refactor FsDatasetCache for SCM cache implementation > > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, > HDFS-14393-003.patch > > > This jira sub-task is to make FsDatasetCache more cleaner to plugin DRAM and > PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803830#comment-16803830 ] Rakesh R commented on HDFS-14393: - Thanks [~umamaheswararao] for the reviews. Thanks [~PhiloHe] for bringing the point - 'both lazy writer and read cache are sharing {{MemoryCacheStats}} statistics'. I have uploaded another patch addressing the same. > Move stats related methods to MappableBlockLoader > - > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, > HDFS-14393-003.patch > > > This jira sub-task is to move stats related methods to specific loader and > make FsDatasetCache more cleaner to plugin DRAM and PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14393: Attachment: HDFS-14393-003.patch > Move stats related methods to MappableBlockLoader > - > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, > HDFS-14393-003.patch > > > This jira sub-task is to move stats related methods to specific loader and > make FsDatasetCache more cleaner to plugin DRAM and PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803612#comment-16803612 ] Rakesh R edited comment on HDFS-14355 at 3/28/19 6:12 AM: -- Adding review comments, please take care. # How about adding an API to interface {{MappableBlockLoader#isTransientCache()}} to avoid checks specific to PMem. It can return specific flag value to differentiate NVMe/DRAM based cache. {code:java} public boolean isPmemCacheEnabled() { return mappableBlockLoader instanceof PmemMappableBlockLoader; } {code} # I'd like to avoid type casting. It won't work with another Pmem implementation, right? {code:java} public String getReplicaCachePath(String bpid, long blockId) { if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) { return null; } ExtendedBlockId key = new ExtendedBlockId(blockId, bpid); String cachePath = ((PmemMappableBlockLoader)mappableBlockLoader) .getPmemVolumeManager() .getCachedFilePath(key); return cachePath; } {code} # Below type casting can be replaced with HDFS-14393 interface. {code:java} /** * Get cache capacity of persistent memory. 
* TODO: advertise this metric to NameNode by FSDatasetMBean */ public long getPmemCacheCapacity() { if (isPmemCacheEnabled()) { return ((PmemMappableBlockLoader)mappableBlockLoader) .getPmemVolumeManager().getPmemCacheCapacity(); } return 0; } public long getPmemCacheUsed() { if (isPmemCacheEnabled()) { return ((PmemMappableBlockLoader)mappableBlockLoader) .getPmemVolumeManager().getPmemCacheUsed(); } return 0; } {code} # {{FsDatasetUtil#deleteMappedFile}} - try-catch is not required, can we do like, {code:java} public static void deleteMappedFile(String filePath) throws IOException { boolean result = Files.deleteIfExists(Paths.get(filePath)); if (!result) { throw new IOException("Failed to delete the mapped file: " + filePath); } } {code} # Why cant't we avoid {{LocalReplica}} changes and read directly from Util like below, {code:java} FsDatasetImpl#getBlockInputStreamWithCheckingPmemCache() . if (cachePath != null) { return FsDatasetUtil.getInputStreamAndSeek(new File(cachePath), seekOffset); } {code} # As the class {{PmemVolumeManager}} itself represents {{Pmem}} so its good to remove this extra keyword from the methods and entities from this class - PmemUsedBytesCount, getPmemCacheUsed, getPmemCacheCapacity etc.. # Please avoid unchecked conversion and we can do like, {code:java} PmemVolumeManager.java private final Map blockKeyToVolume = new ConcurrentHashMap<>(); Map getBlockKeyToVolume() { return blockKeyToVolume; } {code} # Add exception message in PmemVolumeManager#verifyIfValidPmemVolume {code:java} if (out == null) { throw new IOException(); } {code} # Here {{IOException}} clause is not required, please remove it. We can add it later, if needed. {code:java} MappableBlock.java void afterCache() throws IOException; FsDatasetCache.java try { mappableBlock.afterCache(); } catch (IOException e) { LOG.warn(e.getMessage()); return; } {code} # Can we include block id into the log message, that would improve debugging. 
{code:java} LOG.info("Successfully cache one replica into persistent memory: " + "[path=" + filePath + ", length=" + length + "]"); to LOG.info("Successfully cached one replica:{} into persistent memory" + ", [cached path={}, length={}]", key, filePath, length); {code} was (Author: rakeshr): Adding review comments, please take care. # How about adding an API to interface {{MappableBlockLoader#isTransientCache()}} to avoid checks specific to PMem. It can return specific flag value to differentiate NVMe/DRAM based cache. {code:java} public boolean isPmemCacheEnabled() { return mappableBlockLoader instanceof PmemMappableBlockLoader; } {code} # I'd like to avoid type casting. It won't work with another Pmem implementation, right? {code:java} public String getReplicaCachePath(String bpid, long blockId) { if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) { return null; } ExtendedBlockId key = new ExtendedBlockId(blockId, bpid); String cachePath = ((PmemMappableBlockLoader)mappableBlockLoader) .getPmemVolumeManager() .getCachedFilePath(key); return cachePath; } {code} # Below type casting can be replaced with HDFS-14393 interface. {code:java} /** * Get cache capacity of persistent memory. * TODO: advertise this metric to NameNode by FSDatasetMBean */ public long getPmemCacheCapacity() { if (isPmemCacheEnabled()) { return
[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803612#comment-16803612 ] Rakesh R commented on HDFS-14355: - Adding review comments, please take care. # How about adding an API to interface {{MappableBlockLoader#isTransientCache()}} to avoid checks specific to PMem. It can return specific flag value to differentiate NVMe/DRAM based cache. {code:java} public boolean isPmemCacheEnabled() { return mappableBlockLoader instanceof PmemMappableBlockLoader; } {code} # I'd like to avoid type casting. It won't work with another Pmem implementation, right? {code:java} public String getReplicaCachePath(String bpid, long blockId) { if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) { return null; } ExtendedBlockId key = new ExtendedBlockId(blockId, bpid); String cachePath = ((PmemMappableBlockLoader)mappableBlockLoader) .getPmemVolumeManager() .getCachedFilePath(key); return cachePath; } {code} # Below type casting can be replaced with HDFS-14393 interface. {code:java} /** * Get cache capacity of persistent memory. 
* TODO: advertise this metric to NameNode by FSDatasetMBean */ public long getPmemCacheCapacity() { if (isPmemCacheEnabled()) { return ((PmemMappableBlockLoader)mappableBlockLoader) .getPmemVolumeManager().getPmemCacheCapacity(); } return 0; } public long getPmemCacheUsed() { if (isPmemCacheEnabled()) { return ((PmemMappableBlockLoader)mappableBlockLoader) .getPmemVolumeManager().getPmemCacheUsed(); } return 0; } {code} # {{FsDatasetUtil#deleteMappedFile}} - try-catch is not required, can we do like, {code:java} public static void deleteMappedFile(String filePath) throws IOException { boolean result = Files.deleteIfExists(Paths.get(filePath)); if (!result) { throw new IOException("Failed to delete the mapped file: " + filePath); } } {code} # Why cant't we avoid {{LocalReplica}} changes and read directly from Util like below, {code:java} FsDatasetImpl#getBlockInputStreamWithCheckingPmemCache() . if (cachePath != null) { return FsDatasetUtil.getInputStreamAndSeek(new File(cachePath), seekOffset); } {code} # As the class {{PmemVolumeManager}} itself represents {{Pmem}} so its good to remove this extra keyword from the methods and entities from this class - PmemUsedBytesCount, getPmemCacheUsed, getPmemCacheCapacity etc.. # Please avoid unchecked conversion and we can do like, {code:java} PmemVolumeManager.java private final Map blockKeyToVolume = new ConcurrentHashMap<>(); Map getBlockKeyToVolume() { return blockKeyToVolume; } {code} # Add exception message in PmemVolumeManager#verifyIfValidPmemVolume {code:java} if (out == null) { throw new IOException(); } {code} # Here {{IOException}} clause is not required, please remove it. We can add it later, if needed. {code:java} MappableBlock.java void afterCache() throws IOException; FsDatasetCache.java try { mappableBlock.afterCache(); } catch (IOException e) { LOG.warn(e.getMessage()); return; } {code} # Can we include block id into the log message, that would improve debugging. 
{code:java} LOG.info("Successfully cache one replica into persistent memory: " + "[path=" + filePath + ", length=" + length + "]"); to LOG.info("Successfully cached one replica:{} into persistent memory" + ", [cached path={}, length={}]", key, filePath, length); {code} > Implement HDFS cache on SCM by using pure java mapped byte buffer > - > > Key: HDFS-14355 > URL: https://issues.apache.org/jira/browse/HDFS-14355 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, > HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, > HDFS-14355.005.patch > > > This task is to implement the caching to persistent memory using pure > {{java.nio.MappedByteBuffer}}, which could be useful in case native support > isn't available or convenient in some environments or platforms. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
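The simplified {{FsDatasetUtil#deleteMappedFile}} from point 4 of the review above can be sketched as a runnable example: {{Files.deleteIfExists}} replaces the hand-rolled try-catch, and a failed delete surfaces as an IOException. The class name is a placeholder, not the Hadoop class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the suggested deleteMappedFile: deleteIfExists returns false
// when the file was absent, which is turned into an IOException so the
// caller learns the cached replica file was already gone.
public class DeleteSketch {
  static void deleteMappedFile(String filePath) throws IOException {
    boolean result = Files.deleteIfExists(Paths.get(filePath));
    if (!result) {
      throw new IOException("Failed to delete the mapped file: " + filePath);
    }
  }

  public static void main(String[] args) throws IOException {
    // Use a temp file as a stand-in for a pmem-cached replica file.
    Path tmp = Files.createTempFile("mapped-block", ".cache");
    deleteMappedFile(tmp.toString()); // succeeds; file is removed
    System.out.println(Files.exists(tmp));
  }
}
```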
[jira] [Comment Edited] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803554#comment-16803554 ] Rakesh R edited comment on HDFS-14393 at 3/28/19 3:05 AM: -- Thanks [~umamaheswararao] for the review comments. I have attached another patch addressing the same. Note: I have moved TestFsDatasetCache to {{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl}} to avoid making {{CacheStats}} public. was (Author: rakeshr): Thanks [~umamaheswararao] for the review comments. I have attached another patch addressing the same. > Move stats related methods to MappableBlockLoader > - > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch > > > This jira sub-task is to move stats related methods to specific loader and > make FsDatasetCache more cleaner to plugin DRAM and PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14393: Attachment: HDFS-14393-002.patch > Move stats related methods to MappableBlockLoader > - > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch > > > This jira sub-task is to move stats related methods to specific loader and > make FsDatasetCache more cleaner to plugin DRAM and PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803554#comment-16803554 ] Rakesh R commented on HDFS-14393: - Thanks [~umamaheswararao] for the review comments. I have attached another patch addressing the same. > Move stats related methods to MappableBlockLoader > - > > Key: HDFS-14393 > URL: https://issues.apache.org/jira/browse/HDFS-14393 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch > > > This jira sub-task is to move stats related methods to specific loader and > make FsDatasetCache more cleaner to plugin DRAM and PMem implementations. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803307#comment-16803307 ] Rakesh R commented on HDFS-14355: - Thanks [~PhiloHe] for the updates. I have created HDFS-14393 to make {{FsDatasetCache}} cleaner and have it interact with the loader; this would avoid the {{PMem}}-specific type casting in the current patch. I will continue reviewing your patch. > Implement HDFS cache on SCM by using pure java mapped byte buffer > - > > Key: HDFS-14355 > URL: https://issues.apache.org/jira/browse/HDFS-14355 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, > HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, > HDFS-14355.005.patch > > > This task is to implement the caching to persistent memory using pure > {{java.nio.MappedByteBuffer}}, which could be useful in case native support > isn't available or convenient in some environments or platforms. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803306#comment-16803306 ] Rakesh R commented on HDFS-14393: - [~umamaheswararao], [~PhiloHe] I have attached the patch by moving the {{FsDatasetCache}} memory-related methods to the {{MemoryMappableBlockLoader}} implementation. Please review.
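The refactor described above can be pictured as a small sketch: each loader owns its own cache-space accounting, so {{FsDatasetCache}} only delegates and never needs DRAM- or pmem-specific fields. Class and method names below mirror the discussion but are illustrative stand-ins, not the actual Hadoop sources.

```java
import java.util.concurrent.atomic.AtomicLong;

// Abstract loader that owns its own stats, so the dataset cache only
// delegates and never needs DRAM- or pmem-specific fields.
abstract class MappableBlockLoader {
  // Reserve 'bytes'; return the new used total, or -1 if over capacity.
  abstract long reserve(long bytes);
  // Release 'bytes' and return the new used total.
  abstract long release(long bytes);
  abstract long getCacheUsed();
  abstract long getCacheCapacity();
}

// DRAM-backed implementation holding the stats that previously lived
// in the dataset cache itself.
class MemoryMappableBlockLoader extends MappableBlockLoader {
  private final long maxBytes;
  private final AtomicLong usedBytes = new AtomicLong(0);

  MemoryMappableBlockLoader(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  @Override
  long reserve(long bytes) {
    while (true) {
      long cur = usedBytes.get();
      long next = cur + bytes;
      if (next > maxBytes) {
        return -1; // not enough cache capacity left
      }
      if (usedBytes.compareAndSet(cur, next)) {
        return next;
      }
    }
  }

  @Override
  long release(long bytes) {
    return usedBytes.addAndGet(-bytes);
  }

  @Override
  long getCacheUsed() {
    return usedBytes.get();
  }

  @Override
  long getCacheCapacity() {
    return maxBytes;
  }
}
```

A pmem-backed subclass would plug in the same way, with its own capacity and used counters, which is what removes the type casting from the caller.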
[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14393: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader
[ https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14393: Attachment: HDFS-14393-001.patch
[jira] [Created] (HDFS-14393) Move stats related methods to MappableBlockLoader
Rakesh R created HDFS-14393: --- Summary: Move stats related methods to MappableBlockLoader Key: HDFS-14393 URL: https://issues.apache.org/jira/browse/HDFS-14393 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This jira sub-task is to move stats related methods to the specific loader and make FsDatasetCache cleaner to plug in DRAM and PMem implementations.
[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797940#comment-16797940 ] Rakesh R commented on HDFS-14355: - {quote} FileMappableBlockLoader: Actually this class implementing the block loading for pmem. So, should this name say PmemFileMappableBlockLoader/PmemMappableBlockLoader? HDFS-14356 impl may name that implementation class name as NativePmemMappableBlockLoader (that will be pmdk based impl) ? Does this make sense ? {quote} {{PmemMappableBlockLoader}} and {{NativePmemMappableBlockLoader}} look good to me.
[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794924#comment-16794924 ] Rakesh R commented on HDFS-14355: - {quote}This property specifies the cache capacity for both memory & pmem. We kept the same behavior upon the specified cache capacity for the pmem cache as that for the memory cache. {quote} Please look at my above comment#8. As we know, the existing code deals only with the OS page cache, but now pmem is being added as well, which requires special intelligence to manage the stats/overflows if we allow plugging in two entities together. A quick thought is to add a new configuration {{dfs.datanode.cache.pmem.capacity}}, and the reserve/release logic can be moved to the specific MappableBlockLoaders.
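The suggested split could look like the sketch below: pmem reads its capacity from its own key ({{dfs.datanode.cache.pmem.capacity}}, as proposed in the comment) instead of sharing the memory limit. {{java.util.Properties}} stands in for Hadoop's {{Configuration}} here, and the memory key name is only an assumption for illustration.

```java
import java.util.Properties;

// Each cache kind reads its capacity from its own key, so pmem
// overflow accounting never interferes with the memory limit.
class CacheCapacityConf {
  // Assumed key names for this sketch; the pmem key follows the comment.
  static final String MEMORY_CAPACITY_KEY = "dfs.datanode.max.locked.memory";
  static final String PMEM_CAPACITY_KEY = "dfs.datanode.cache.pmem.capacity";

  // Pick the capacity for a loader kind, defaulting to 0 (disabled).
  static long capacityFor(Properties conf, boolean isPmem) {
    String key = isPmem ? PMEM_CAPACITY_KEY : MEMORY_CAPACITY_KEY;
    return Long.parseLong(conf.getProperty(key, "0"));
  }
}
```

With this shape, each loader can reserve/release against its own limit and report its own stats.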
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794827#comment-16794827 ] Rakesh R edited comment on HDFS-14355 at 3/18/19 8:31 AM: -- Thanks [~PhiloHe] for the good progress. Adding second set of review comments, please go through it. # Close {{file = new RandomAccessFile(filePath, "rw");}} {code:java} IOUtils.closeQuietly(file); {code} # Looks like unused code, please remove it. {code:java} private FsDatasetImpl dataset; public MemoryMappableBlockLoader(FsDatasetImpl dataset) { this.dataset = dataset; } {code} # FileMappableBlockLoader#loadVolumes exception handling. I feel this is not required, please remove it. If you still need this for some purpose, then please add message arg to {{IOException("Failed to parse persistent memory location " + location, e)}} {code:java} } catch (IllegalArgumentException e) { LOG.error("Failed to parse persistent memory location " + location + " for " + e.getMessage()); throw new IOException(e); } {code} # Debuggability: FileMappableBlockLoader#verifyIfValidPmemVolume. Here, add exception message arg to {{throw new IOException(t);}} {code:java} throw new IOException( "Exception while writing data to persistent storage dir: " + pmemDir, t); {code} # Debuggability: FileMappableBlockLoader#load. Here, add blockFileName to the exception message. {code:java} if (out == null) { throw new IOException("Fail to map the block " + blockFileName + " to persistent storage."); } {code} # Debuggability: FileMappableBlockLoader#verifyChecksumAndMapBlock {code:java} throw new IOException( "checksum verification failed for the blockfile:" + blockFileName + ": premature EOF"); {code} # FileMappedBlock#afterCache. Suppressing exception may give wrong statistics, right? Assume, {{afterCache}} throws exception and not cached the file path. Here, the cached block won't be readable but unnecessarily consumes space. 
How about moving the {{mappableBlock.afterCache();}} call right after the {{mappableBlockLoader.load()}} function and adding a throws IOException clause to {{afterCache}}? {code:java} LOG.warn("Fail to find the replica file of PoolID = " + key.getBlockPoolId() + ", BlockID = " + key.getBlockId() + " for :" + e.getMessage()); {code} # FsDatasetCache.java : reserve() and release() OS page size math is not required in FileMappedBlock. Appreciate if you could avoid these calls. Also, can you re-visit the caching and un-caching logic (for example, datanode.getMetrics() updates etc.) present in this class. {code:java} CachingTask#run(){ long newUsedBytes = reserve(length); ... if (reservedBytes) { release(length); } UncachingTask#run() { ... long newUsedBytes = release(value.mappableBlock.getLength()); {code} # I have changed the jira status and triggered QA. Please fix the checkstyle warnings and test case failures. Also, can you uncomment the two {{Test//(timeout=12)}} occurrences in the test.
[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794827#comment-16794827 ] Rakesh R commented on HDFS-14355: - Thanks [~PhiloHe] for the good progress. Adding second set of review comments, please go through it. # Close {{file = new RandomAccessFile(filePath, "rw");}} {code:java} IOUtils.closeQuietly(file); {code} # Looks like unused code, please remove it. {code:java} private FsDatasetImpl dataset; public MemoryMappableBlockLoader(FsDatasetImpl dataset) { this.dataset = dataset; } {code} # FileMappableBlockLoader#loadVolumes exception handling. I feel this is not required, please remove it. If you still need this for some purpose, then please add message arg to {{IOException("Failed to parse persistent memory location " + location, e)}} {code:java} } catch (IllegalArgumentException e) { LOG.error("Failed to parse persistent memory location " + location + " for " + e.getMessage()); throw new IOException(e); } {code} # Debuggability: FileMappableBlockLoader#verifyIfValidPmemVolume. Here, add exception message arg to {{throw new IOException(t);}} {code:java} throw new IOException( "Exception while writing data to persistent storage dir: " + pmemDir, t); {code} # Debuggability: FileMappableBlockLoader#load. Here, add blockFileName to the exception message. {code:java} if (out == null) { throw new IOException("Fail to map the block " + blockFileName + " to persistent storage."); } {code} # Debuggability: FileMappableBlockLoader#verifyChecksumAndMapBlock {code:java} throw new IOException( "checksum verification failed for the blockfile:" + blockFileName + ": premature EOF"); {code} # FileMappedBlock#afterCache. Suppressing exception may give wrong statistics, right? Assume, {{afterCache}} throws exception and not cached the file path. Here, the cached block won't be readable but unnecessarily consumes space. 
How about moving the {{mappableBlock.afterCache();}} call right after the {{mappableBlockLoader.load()}} function and adding a throws IOException clause to {{afterCache}}? {code:java} LOG.warn("Fail to find the replica file of PoolID = " + key.getBlockPoolId() + ", BlockID = " + key.getBlockId() + " for :" + e.getMessage()); {code} # FsDatasetCache.java : reserve() and release() OS page size math is not required in FileMappedBlock. Appreciate if you could avoid these calls. Also, can you re-visit the caching and un-caching logic (for example, datanode.getMetrics() updates etc.) present in this class. {code:java} CachingTask#run(){ long newUsedBytes = reserve(length); ... if (reservedBytes) { release(length); } UncachingTask#run() { ... long newUsedBytes = release(value.mappableBlock.getLength()); {code} # I have changed the jira status and triggered QA. Please fix the checkstyle warnings and test case failures. Also, can you uncomment the two {{Test//(timeout=12)}} occurrences in the test.
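Review point 1 above (close the {{RandomAccessFile}} with {{IOUtils.closeQuietly}}) follows a common pattern worth spelling out: cleanup on the error path must never mask the original failure. The sketch below is a minimal stdlib-only stand-in for Hadoop's {{IOUtils.closeQuietly}}, not the actual Hadoop code.

```java
import java.io.Closeable;
import java.io.IOException;

// Minimal stand-in for IOUtils.closeQuietly: tolerates null and
// swallows close() failures so cleanup never hides the real error.
final class QuietClose {
  static void closeQuietly(Closeable c) {
    if (c == null) {
      return; // the file may never have been opened
    }
    try {
      c.close();
    } catch (IOException e) {
      // Ignore: best-effort cleanup on the error path.
    }
  }
}
```

Calling this from a finally block after {{new RandomAccessFile(filePath, "rw")}} guarantees the descriptor is released even when mapping or checksum verification throws.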
[jira] [Reopened] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R reopened HDFS-14355: -
[jira] [Resolved] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R resolved HDFS-14355. - Resolution: Unresolved
[jira] [Updated] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14355: Status: Patch Available (was: Reopened)
[jira] [Commented] (HDFS-14355) Implement SCM cache using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792605#comment-16792605 ] Rakesh R commented on HDFS-14355: - Thanks [~PhiloHe] for the incremental patch. Following are a few quick comments; I will continue reviewing the patch. # Please rename configs: {{dfs.datanode.cache.loader.impl.classname}} to => {{dfs.datanode.cache.loader.class}} {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME}} to => {{DFS_DATANODE_CACHE_LOADER_CLASS}} {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME_DEFAULT}} to => {{DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT}} # Replace the config reading logic like below. Also, this would help avoid if-else checks like {{if (cacheLoader.equals(MemoryMappableBlockLoader.class.getSimpleName()))}} to determine which class is configured by the user. {code:java} DFSConfigKeys.java public static final String DFS_DATANODE_CACHE_LOADER_CLASS = "dfs.datanode.cache.loader.class"; public static final Class DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT = MemoryMappableBlockLoader.class; You can use the following way to instantiate the cache loader. .. .. this.cacheLoader = getConf().getClass( DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS, DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT, MappableBlockLoader.class); {code} # Add the config name into the message {{"The persistent memory volumes are not configured!"}} to => {{"The persistent memory volume, " + DFSConfigKeys.DFS_DATANODE_CACHE_PMEM_DIR_KEY + " is not configured!"}} # Good to unmap the block {{mappedoutbuffer}} before deleting the file, like below, {code:java} FileMappableBlockLoader.java #verifyIfValidPmemVolume(){ ... ... if (file != null) { IOUtils.closeQuietly(file); NativeIO.POSIX.munmap(out); try { FsDatasetUtil.deleteMappedFile(testFilePath); } catch (IOException e) { LOG.warn("Failed to delete test file " + testFilePath + " from persistent memory", e); } {code} # FileMappableBlockLoader - Please remove the {{assert NativeIO.isAvailable();}} check, it's not needed, right? # Describe briefly the file path formation pattern {{'PmemDir/BlockPoolId-BlockId'}} either at class or function level javadocs. {code:java} FileMappableBlockLoader#load() filePath = getOneLocation() + "/" + key.getBlockPoolId() + "-" + key.getBlockId(); {code} # Add @VisibleForTesting to the {{public static void verifyIfValidPmemVolume(File pmemDir)}} function # Add annotation to the new classes FileMappedBlock, FileMappableBlockLoader. {code:java} @InterfaceAudience.Private @InterfaceStability.Unstable {code} # Comments on TestCacheWithFileMappableBlockLoader: ## Remove the MLOCK config, which is not required. {code:java} myConf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY); {code} ## Move the test TestCacheWithFileMappableBlockLoader.java class to {{package org.apache.hadoop.hdfs.server.datanode.fsdataset.impl}}. This will avoid making the class FsDatasetImpl public and in fact no changes to the FsDatasetImpl class are required.
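The class-valued config suggested in point 2 replaces string comparison with reflective instantiation. The sketch below shows the same pattern with plain JDK reflection as a stand-in for Hadoop's {{Configuration.getClass}} plus {{ReflectionUtils.newInstance}}; the factory name and signature are hypothetical.

```java
// Resolves a loader class from a (possibly null) configured class name,
// falling back to a default class; analogous to Configuration.getClass
// followed by ReflectionUtils.newInstance in Hadoop.
class LoaderFactory {
  static <T> T newLoader(String className, Class<? extends T> defaultClass,
      Class<T> xface) {
    try {
      // asSubclass enforces that a configured class really implements
      // the expected interface, failing fast on a bad config value.
      Class<? extends T> clazz = (className == null)
          ? defaultClass
          : Class.forName(className).asSubclass(xface);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate loader", e);
    }
  }
}
```

The key win over {{getSimpleName()}} string checks is that the type check and the instantiation happen in one place, so adding a new loader needs no if-else branch.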
[jira] [Comment Edited] (HDFS-14355) Implement SCM cache using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792605#comment-16792605 ] Rakesh R edited comment on HDFS-14355 at 3/14/19 12:11 PM: --- Thanks [~PhiloHe] for the incremental patch. Following are few quick comments, I will continue reviewing the patch. # Please rename configs: {{dfs.datanode.cache.loader.impl.classname}} to => {{dfs.datanode.cache.loader.class}} {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME}} to => {{DFS_DATANODE_CACHE_LOADER_CLASS}} {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME_DEFAULT}} to => {{DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT}} # Replace the config reading logic like below. Also, this would help avoiding if-else checks : {{if (cacheLoader.equals(MemoryMappableBlockLoader.class.getSimpleName()))}} to determine which class is configured by the user. {code} DFSConfigKeys.java public static final String DFS_DATANODE_CACHE_LOADER_CLASS = "dfs.datanode.cache.loader.class"; public static final Class DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT = MemoryMappableBlockLoader.class; You can use the following way to instantiate the cache loader. .. .. this.cacheLoader = getConf().getClass( DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS, DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT, MappableBlockLoader.class); {code} # Add config name into the message {{"The persistent memory volumes are not configured!"}} to => {{"The persistent memory volume, " + DFSConfigKeys.DFS_DATANODE_CACHE_PMEM_DIR_KEY + " is not configured!"}} # Good to Unmaps the block {{mappedoutbuffer}} before deleting the file like below, {code} FileMappableBlockLoader.java #verifyIfValidPmemVolume(){ ... ... 
if (file != null) { IOUtils.closeQuietly(file); NativeIO.POSIX.munmap(out); try { FsDatasetUtil.deleteMappedFile(testFilePath); } catch (IOException e) { LOG.warn("Failed to delete test file " + testFilePath + " from persistent memory", e); } {code} # FileMappableBlockLoader - Please remove the {{assert NativeIO.isAvailable();}} check, it's not needed, right? # Describe briefly the file path formation pattern {{'PmemDir/BlockPoolId-BlockId'}} either at class or function level javadocs. {code} FileMappableBlockLoader#load() filePath = getOneLocation() + "/" + key.getBlockPoolId() + "-" + key.getBlockId(); {code} # Add @VisibleForTesting to the {{public static void verifyIfValidPmemVolume(File pmemDir)}} function # Add annotation to the new classes FileMappedBlock, FileMappableBlockLoader. {code} @InterfaceAudience.Private @InterfaceStability.Unstable {code} # Comments on TestCacheWithFileMappableBlockLoader: ## Remove the MLOCK config, which is not required. {code} myConf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY); {code} ## Move the test TestCacheWithFileMappableBlockLoader.java class to {{package org.apache.hadoop.hdfs.server.datanode.fsdataset.impl}}. This will avoid making the class FsDatasetImpl public and in fact no changes to the FsDatasetImpl class are required.
[jira] [Updated] (HDFS-14355) Implement SCM cache using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14355: Summary: Implement SCM cache using pure java mapped byte buffer (was: Implement SCM cache by using pure java mapped byte buffer)
[jira] [Updated] (HDFS-14355) Implement SCM cache by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14355: Description: This task is to implement the caching to persistent memory using pure {{java.nio.MappedByteBuffer}}, which could be useful in case native support isn't available or convenient in some environments or platforms.
[jira] [Updated] (HDFS-14355) Implement SCM cache by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14355: Summary: Implement SCM cache by using pure java mapped byte buffer (was: Implement SCM cache by using mapped byte buffer without PMDK dependency)
[jira] [Commented] (HDFS-14354) Refactor MappableBlock to align with the implementation of SCM cache
[ https://issues.apache.org/jira/browse/HDFS-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789632#comment-16789632 ] Rakesh R commented on HDFS-14354: - Good work, [~PhiloHe]. I've submitted the patch to get a QA report. Please take care of the {{checkstyle}} warnings. Adding a few comments on the patch: # Please add javadoc to the {{MappableBlockLoader}} abstract class. # MappableBlockLoader.java - '{{mmap and mlock the block, and then verify its checksum'}} please change it to '{{mmap the block, and then verify its checksum'}} as mlock is very specific to memory. # FsDatasetCache.java - Please remove the unused method. We can add this later when you add unit test cases, and review the necessity of exposing it at that time. {code:java} +import com.google.common.annotations.VisibleForTesting; + + @VisibleForTesting + public MappableBlockLoader getMappableBlockLoader() { + return mappableBlockLoader; + } {code} # Any specific reason to remove the {{return;}} statement? If not, please keep the existing behavior. {code:java} } catch (ChecksumException e) { // Exception message is bogus since this wasn't caused by a file read LOG.warn("Failed to cache " + key + ": checksum verification failed."); - return; {code}
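Comment 4 above matters because the early {{return;}} is what keeps a block that fails checksum verification from falling through to the success-path bookkeeping. A tiny illustrative sketch of that control flow (names are hypothetical, not the Hadoop sources):

```java
// Sketch of why the early return should stay: a block that fails
// verification is logged and skipped, and must never reach the
// success-path bookkeeping below it.
class CachingOutcome {
  static String cacheBlock(boolean checksumOk) {
    if (!checksumOk) {
      // Mirrors the existing behavior the review asks to keep:
      // warn and bail out of the caching task.
      return "skipped: checksum verification failed";
    }
    // Success path: stats/metrics updates would happen here.
    return "cached";
  }
}
```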
[jira] [Updated] (HDFS-14354) Refactor MappableBlock to align with the implementation of SCM cache
[ https://issues.apache.org/jira/browse/HDFS-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-14354: Status: Patch Available (was: Open)
[jira] [Comment Edited] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives
[ https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760449#comment-16760449 ] Rakesh R edited comment on HDFS-13762 at 2/5/19 4:46 AM: - Thanks [~PhiloHe] for the continuous efforts! {quote}Yes, it's a limitation for all the volumes together. {quote} OK, it could be a concern when the volumes have different sizes. Agreed to take this up in a follow-up jira task. {quote}We maintains the map from block file to cache-file on pmem storage, so no need to search for it among the volumes {quote} Great! {quote}We don't have a evict logic here or throw a VolumeFullException, it just works like the memory-cache for compatible now. {quote} Please add this case to the system test plan. It would be good to test it and document/know the behavior. {quote}(iii), we will update the patch accordingly {quote} Please change the one below as well. {code:java} + Pmem.unmapBlock(region.getAddress(), region.getLength()); + boolean deled = false; {code} Adding a few more comments, please take care of them: 1). The MappableBlock interface looks good; please add javadocs for the functions below. Also, please reflect each class's specific responsibility (MappableBlock, PmemMappedBlock, MemoryMappedBlock) in its javadoc instead of reusing the same "Represents an HDFS block that is mmapped by the DataNode." {code:java} long getLength(); void afterCache(); {code} 2). Change the comment to PMDK instead of ISA-L! {code:java} + // Load Intel ISA-L + #ifdef UNIX {code} 3). Could you please briefly explain the difference between {{pmemDrain}} and {{pmemSync}}? Also, I would appreciate it if you could add javadoc so that the functionality will be visible to readers. 
{code:java} private static native boolean isPmemCheck(long address, long length); private static native PmemMappedRegion pmemCreateMapFile(String path, long length); private static native boolean pmemUnMap(long address, long length); private static native void pmemCopy(byte[] src, long dest, boolean isPmem, long length); private static native void pmemDrain(); private static native void pmemSync(long address, long length); {code} > Support non-volatile storage class memory(SCM) in HDFS cache directives > --- > > Key: HDFS-13762 > URL: https://issues.apache.org/jira/browse/HDFS-13762 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Sammi Chen >Assignee: Feilong He >Priority: Major > Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, > HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, > HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, > HDFS-13762.008.patch, SCMCacheDesign-2018-11-08.pdf,
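A sketch of the javadoc being asked for above, with stub bodies standing in for the real JNI bindings. The described semantics are an assumption based on libpmem's pmem_drain()/pmem_persist() pair; the actual patch may define them differently.

```java
// Javadoc sketch only; bodies are stubs in place of the real native methods.
public class PmemPrimitivesSketch {

    /**
     * Waits until every store previously issued to persistent memory (for
     * example via a non-temporal copy) has reached the persistence domain.
     * Does not flush any particular range by itself.
     */
    static void pmemDrain() {
        // declared as: private static native void pmemDrain(); in the real code
    }

    /**
     * Flushes the processor caches for the given [address, address + length)
     * range and then drains, so that exactly that range is guaranteed to be
     * durable on the persistent-memory device.
     */
    static void pmemSync(long address, long length) {
        // declared as: private static native void pmemSync(long, long); in the real code
    }

    public static void main(String[] args) {
        pmemDrain();
        pmemSync(0L, 4096L);
        System.out.println("stubs only; see libpmem for the real semantics");
    }
}
```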
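The look-up described in the quoted reply — a maintained map from block to its cache file on pmem, so the read path never scans the volumes — can be sketched as follows (names are illustrative, not the actual patch's API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Keeping a map from block id to its cache file makes the read-path
// look-up O(1), instead of searching through each configured pmem volume.
public class PmemCacheFileMap {
    private final ConcurrentMap<Long, String> blockToCacheFile =
        new ConcurrentHashMap<>();

    /** Record the cache file once a block has been cached on a pmem volume. */
    public void onCached(long blockId, String cacheFilePath) {
        blockToCacheFile.put(blockId, cacheFilePath);
    }

    /** O(1) look-up; returns null when the block is not cached. */
    public String cacheFileFor(long blockId) {
        return blockToCacheFile.get(blockId);
    }

    /** Forget the mapping when the block is uncached. */
    public void onUncached(long blockId) {
        blockToCacheFile.remove(blockId);
    }
}
```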
[jira] [Commented] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives
[ https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733867#comment-16733867 ] Rakesh R commented on HDFS-13762: - [~zhouwei], [~Sammi], Great proposal, thanks for your work. (i) A few clarifications about the multiple {{pmemVolumes}} configuration: # Is the MaxLockedMemory limit applicable to all the "pmemVolumes" together? # IIUC, you have mentioned a round-robin policy to choose a directory for each new DNA_CACHE command. I'd like to understand the look-up: will it maintain any indexing, or just search and, if it doesn't find the item in one volume, move on to the next volume? # Is there any automatic eviction logic available once a volume reaches its threshold, or will it throw a VolumeFullException to the users? (ii) Please add "{{dfs.datanode.cache.pmem.dirs}}" to the hdfs-default.xml config doc file; that would make the 'TestHdfsConfigFields' test happy. (iii) Typo: {{boolean deled = false; ==> boolean deleted = false;}} > Support non-volatile storage class memory(SCM) in HDFS cache directives > --- > > Key: HDFS-13762 > URL: https://issues.apache.org/jira/browse/HDFS-13762 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Sammi Chen >Assignee: Wei Zhou >Priority: Major > Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, > HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, > HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, > SCMCacheDesign-2018-11-08.pdf, SCMCacheTestPlan.pdf > > > Non-volatile storage class memory is a type of memory that can keep its data > content after a power failure or between power cycles. A non-volatile storage > class memory device usually has access speed close to a memory DIMM while > having lower cost than memory. So today it is usually used as a supplement to > memory to hold long-term persistent data, such as data in cache. 
> Currently in HDFS, we have an OS page cache backed read-only cache and a RAMDISK > based lazy-write cache. Non-volatile memory suits both these functions. > This Jira aims to enable storage class memory first in the read cache. Although > storage class memory has non-volatile characteristics, to keep the same > behavior as the current read-only cache, we don't use its persistence > characteristics currently. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
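The round-robin directory choice discussed in the comment above can be sketched as follows (a minimal illustration with hypothetical names, not the actual patch's implementation): each new cache command takes the next configured pmem directory in turn.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal round-robin choice over the configured pmem directories.
public class RoundRobinVolumePicker {
    private final List<String> volumes;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinVolumePicker(List<String> volumes) {
        this.volumes = volumes;
    }

    /** Pick the next volume in round-robin order; safe under concurrency. */
    public String nextVolume() {
        int i = (int) (counter.getAndIncrement() % volumes.size());
        return volumes.get(i);
    }

    public static void main(String[] args) {
        RoundRobinVolumePicker p = new RoundRobinVolumePicker(
            Arrays.asList("/mnt/pmem0", "/mnt/pmem1"));
        System.out.println(p.nextVolume()); // /mnt/pmem0
        System.out.println(p.nextVolume()); // /mnt/pmem1
        System.out.println(p.nextVolume()); // /mnt/pmem0
    }
}
```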
[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623449#comment-16623449 ] Rakesh R commented on HDFS-10285: - I've updated the release notes and resolved this jira. Thank you very much to all the contributors for your time, efforts and useful discussions in making this feature happen! > Storage Policy Satisfier in HDFS > > > Key: HDFS-10285 > URL: https://issues.apache.org/jira/browse/HDFS-10285 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS SPS Test Report-31July2018-v1.pdf, > HDFS-10285-consolidated-merge-patch-00.patch, > HDFS-10285-consolidated-merge-patch-01.patch, > HDFS-10285-consolidated-merge-patch-02.patch, > HDFS-10285-consolidated-merge-patch-03.patch, > HDFS-10285-consolidated-merge-patch-04.patch, > HDFS-10285-consolidated-merge-patch-05.patch, > HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, > Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, > Storage-Policy-Satisfier-in-HDFS-May10.pdf, > Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf > > > Heterogeneous storage in HDFS introduced the concept of storage policy. These > policies can be set on a directory/file to specify the user's preference for > where to store the physical blocks. When a user sets the storage policy before > writing data, the blocks can take advantage of the storage policy preference > and be stored accordingly. > If a user sets the storage policy after writing and completing the file, then > the blocks would already have been written with the default storage policy > (nothing but DISK). The user has to run the ‘Mover tool’ explicitly, specifying > all such file names as a list. 
In some distributed system scenarios (ex: HBase) it > would be difficult to collect all the files and run the tool, as different > nodes can write files separately and files can have different paths. > Another scenario is when a user renames a file from a directory with one > storage policy (inherited from the parent directory) to a directory with > another storage policy; the inherited storage policy is not copied from the > source, so the storage policy of the destination file/dir's parent takes > effect. This rename operation is just a metadata change in the Namenode. The > physical blocks still remain with the source storage policy. > So, tracking all such business-logic-based file names from distributed > nodes (ex: region servers) and running the Mover tool could be difficult for > admins. > Here the proposal is to provide an API from the Namenode itself to trigger > storage policy satisfaction. A daemon thread inside the Namenode should track > such calls and send movement commands to the DNs. > Will post the detailed design thoughts document soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10285: Resolution: Fixed Release Note: StoragePolicySatisfier(SPS) allows users to track and satisfy the storage policy requirement of a given file/directory in HDFS. Users can specify a file/directory path by invoking the “hdfs storagepolicies -satisfyStoragePolicy -path ” command or via the HdfsAdmin#satisfyStoragePolicy(path) API. For blocks which have storage policy mismatches, it moves the replicas to a different storage type in order to fulfill the storage policy requirement. Since the API calls go to the NN for tracking the invoked satisfier paths (iNodes), administrators need to enable the ‘dfs.storage.policy.satisfier.mode’ config at the NN to allow these operations. It can be enabled by setting ‘dfs.storage.policy.satisfier.mode’ to ‘external’ in hdfs-site.xml. The config can be disabled dynamically without restarting the Namenode. SPS should be started outside the Namenode using "hdfs --daemon start sps". If an administrator wants to run the Mover tool explicitly, they should make sure to disable SPS first and then run Mover. See the "Storage Policy Satisfier (SPS)" section in the Archival Storage guide for detailed usage. 
Status: Resolved (was: Patch Available) > Storage Policy Satisfier in HDFS > > > Key: HDFS-10285 > URL: https://issues.apache.org/jira/browse/HDFS-10285 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS SPS Test Report-31July2018-v1.pdf, > HDFS-10285-consolidated-merge-patch-00.patch, > HDFS-10285-consolidated-merge-patch-01.patch, > HDFS-10285-consolidated-merge-patch-02.patch, > HDFS-10285-consolidated-merge-patch-03.patch, > HDFS-10285-consolidated-merge-patch-04.patch, > HDFS-10285-consolidated-merge-patch-05.patch, > HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, > Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, > Storage-Policy-Satisfier-in-HDFS-May10.pdf, > Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf > > > Heterogeneous storage in HDFS introduced the concept of storage policy. These > policies can be set on directory/file to specify the user preference, where > to store the physical block. When user set the storage policy before writing > data, then the blocks could take advantage of storage policy preferences and > stores physical block accordingly. > If user set the storage policy after writing and completing the file, then > the blocks would have been written with default storage policy (nothing but > DISK). User has to run the ‘Mover tool’ explicitly by specifying all such > file names as a list. In some distributed system scenarios (ex: HBase) it > would be difficult to collect all the files and run the tool as different > nodes can write files separately and file can have different paths. > Another scenarios is, when user rename the files from one effected storage > policy file (inherited policy from parent directory) to another storage > policy effected directory, it will not copy inherited storage policy from > source. 
So it will take effect from destination file/dir parent storage > policy. This rename operation is just a metadata change in Namenode. The > physical blocks still remain with source storage policy. > So, Tracking all such business logic based file names could be difficult for > admins from distributed nodes(ex: region servers) and running the Mover tool. > Here the proposal is to provide an API from Namenode itself for trigger the > storage policy satisfaction. A Daemon thread inside Namenode should track > such calls and process to DN as movement commands. > Will post the detailed design thoughts document soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10285: Fix Version/s: HDFS-10285 > Storage Policy Satisfier in HDFS > > > Key: HDFS-10285 > URL: https://issues.apache.org/jira/browse/HDFS-10285 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS SPS Test Report-31July2018-v1.pdf, > HDFS-10285-consolidated-merge-patch-00.patch, > HDFS-10285-consolidated-merge-patch-01.patch, > HDFS-10285-consolidated-merge-patch-02.patch, > HDFS-10285-consolidated-merge-patch-03.patch, > HDFS-10285-consolidated-merge-patch-04.patch, > HDFS-10285-consolidated-merge-patch-05.patch, > HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, > Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, > Storage-Policy-Satisfier-in-HDFS-May10.pdf, > Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf > > > Heterogeneous storage in HDFS introduced the concept of storage policy. These > policies can be set on directory/file to specify the user preference, where > to store the physical block. When user set the storage policy before writing > data, then the blocks could take advantage of storage policy preferences and > stores physical block accordingly. > If user set the storage policy after writing and completing the file, then > the blocks would have been written with default storage policy (nothing but > DISK). User has to run the ‘Mover tool’ explicitly by specifying all such > file names as a list. In some distributed system scenarios (ex: HBase) it > would be difficult to collect all the files and run the tool as different > nodes can write files separately and file can have different paths. 
> Another scenarios is, when user rename the files from one effected storage > policy file (inherited policy from parent directory) to another storage > policy effected directory, it will not copy inherited storage policy from > source. So it will take effect from destination file/dir parent storage > policy. This rename operation is just a metadata change in Namenode. The > physical blocks still remain with source storage policy. > So, Tracking all such business logic based file names could be difficult for > admins from distributed nodes(ex: region servers) and running the Mover tool. > Here the proposal is to provide an API from Namenode itself for trigger the > storage policy satisfaction. A Daemon thread inside Namenode should track > such calls and process to DN as movement commands. > Will post the detailed design thoughts document soon. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12995) [SPS] : Merge work for HDFS-10285 branch
[ https://issues.apache.org/jira/browse/HDFS-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-12995: Fix Version/s: 3.2.0 HDFS-10285 > [SPS] : Merge work for HDFS-10285 branch > > > Key: HDFS-12995 > URL: https://issues.apache.org/jira/browse/HDFS-12995 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-10285-consolidated-merge-patch-01.patch > > > This Jira is to run the aggregated HDFS-10285 branch patch against trunk and > check for any Jenkins issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13808) [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function
[ https://issues.apache.org/jira/browse/HDFS-13808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13808: Fix Version/s: 3.2.0 HDFS-10285 > [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and > #getFileInfo() function > --- > > Key: HDFS-13808 > URL: https://issues.apache.org/jira/browse/HDFS-13808 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Minor > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-13808-HDFS-10285-00.patch, > HDFS-13808-HDFS-10285-01.patch, HDFS-13808-HDFS-10285-02.patch, > HDFS-13808-HDFS-10285-03.patch, HDFS-13808-HDFS-10285-04.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10885) [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier is on
[ https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10885: Fix Version/s: 3.2.0 > [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier > is on > -- > > Key: HDFS-10885 > URL: https://issues.apache.org/jira/browse/HDFS-10885 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Wei Zhou >Assignee: Wei Zhou >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-10800-HDFS-10885-00.patch, > HDFS-10800-HDFS-10885-01.patch, HDFS-10800-HDFS-10885-02.patch, > HDFS-10885-HDFS-10285-10.patch, HDFS-10885-HDFS-10285-11.patch, > HDFS-10885-HDFS-10285.03.patch, HDFS-10885-HDFS-10285.04.patch, > HDFS-10885-HDFS-10285.05.patch, HDFS-10885-HDFS-10285.06.patch, > HDFS-10885-HDFS-10285.07.patch, HDFS-10885-HDFS-10285.08.patch, > HDFS-10885-HDFS-10285.09.patch > > > These two cannot run at the same time, to avoid conflicts and fighting with > each other. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11670) [SPS]: Add CLI command for satisfy storage policy operations
[ https://issues.apache.org/jira/browse/HDFS-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11670: Fix Version/s: 3.2.0 > [SPS]: Add CLI command for satisfy storage policy operations > > > Key: HDFS-11670 > URL: https://issues.apache.org/jira/browse/HDFS-11670 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11670-HDFS-10285.001.patch, > HDFS-11670-HDFS-10285.002.patch, HDFS-11670-HDFS-10285.003.patch, > HDFS-11670-HDFS-10285.004.patch, HDFS-11670-HDFS-10285.005.patch > > > This jira is to discuss and implement a set of satisfy storage policy > sub-commands. The following is the list of sub-commands: > # Schedule blocks to move based on the file/directory policy: > {code}hdfs storagepolicies -satisfyStoragePolicy -path {code} > # It's good to have one command to check whether SPS is enabled or not. Based > on this, the user can take the decision to run the Mover: > {code} > hdfs storagepolicies -isSPSRunning > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work
[ https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10794: Fix Version/s: 3.2.0 > [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the > block storage movement work > > > Key: HDFS-10794 > URL: https://issues.apache.org/jira/browse/HDFS-10794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Affects Versions: HDFS-10285 >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-10794-00.patch, HDFS-10794-HDFS-10285.00.patch, > HDFS-10794-HDFS-10285.01.patch, HDFS-10794-HDFS-10285.02.patch, > HDFS-10794-HDFS-10285.03.patch > > > The idea of this jira is to implement a mechanism to move the blocks to the > given target in order to satisfy the block storage policy. The Datanode > receives {{blocktomove}} details via the heartbeat response from the NN. More > specifically, it's a datanode-side extension to handle the block storage > movement commands. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11029) [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs
[ https://issues.apache.org/jira/browse/HDFS-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11029: Fix Version/s: 3.2.0 > [SPS]:Provide retry mechanism for the blocks which were failed while moving > its storage at DNs > -- > > Key: HDFS-11029 > URL: https://issues.apache.org/jira/browse/HDFS-11029 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11029-HDFS-10285-00.patch, > HDFS-11029-HDFS-10285-01.patch, HDFS-11029-HDFS-10285-02.patch > > > When the DN co-ordinator finds that some of the blocks associated with a > trackedID could not be moved to their storages due to some errors, a retry may > work in some cases; for example, if the target node has no space, then > retrying by finding another target can work. > So, based on the movement result flag (SUCCESS/FAILURE) from the DN > co-ordinator, the NN would retry by scanning the blocks again. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
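The retry idea described in this issue can be sketched as follows (hypothetical names, not the actual NN/DN code): a FAILURE result re-queues the tracked id so the NN can rescan its blocks and pick another target.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal model of retrying failed block movements by re-queueing the trackId.
public class MovementRetryTracker {
    public enum Result { SUCCESS, FAILURE }

    private final Queue<Long> pendingRescan = new ArrayDeque<>();

    /** Called with the per-trackId movement result reported by the DN. */
    public void onMovementResult(long trackId, Result result) {
        if (result == Result.FAILURE) {
            pendingRescan.add(trackId); // retry by scanning the blocks again
        }
    }

    /** Next trackId the NN should rescan, or null when nothing is pending. */
    public Long nextToRescan() {
        return pendingRescan.poll();
    }
}
```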
[jira] [Updated] (HDFS-11243) [SPS]: Add a protocol command from NN to DN for dropping the SPS work and queues
[ https://issues.apache.org/jira/browse/HDFS-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11243: Fix Version/s: 3.2.0 > [SPS]: Add a protocol command from NN to DN for dropping the SPS work and > queues > - > > Key: HDFS-11243 > URL: https://issues.apache.org/jira/browse/HDFS-11243 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11243-HDFS-10285-00.patch, > HDFS-11243-HDFS-10285-01.patch, HDFS-11243-HDFS-10285-02.patch > > > This JIRA is for adding a protocol command from the Namenode to the Datanode > for dropping SPS work, and also for dropping in-progress queues. > The use case is: when an admin deactivates SPS at the NN, the NN should > internally issue a command to the DNs to drop their in-progress queues as > well. This command can be packed via the heartbeat. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11762) [SPS] : Empty files should be ignored in StoragePolicySatisfier.
[ https://issues.apache.org/jira/browse/HDFS-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11762: Fix Version/s: 3.2.0 > [SPS] : Empty files should be ignored in StoragePolicySatisfier. > - > > Key: HDFS-11762 > URL: https://issues.apache.org/jira/browse/HDFS-11762 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11762-HDFS-10285.001.patch, > HDFS-11762-HDFS-10285.002.patch, HDFS-11762-HDFS-10285.003.patch, > HDFS-11762-HDFS-10285.004.patch > > > A file which has zero blocks should be ignored by SPS. Currently it throws an > NPE in the StoragePolicySatisfier thread. > {noformat} > 2017-05-06 23:29:04,735 [StoragePolicySatisfier] ERROR > namenode.StoragePolicySatisfier (StoragePolicySatisfier.java:run(278)) - > StoragePolicySatisfier thread received runtime exception. Stopping Storage > policy satisfier work > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.analyseBlocksStorageMovementsAndAssignToDN(StoragePolicySatisfier.java:292) > at > org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.run(StoragePolicySatisfier.java:233) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
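The guard this fix calls for can be sketched as follows (illustrative only, not the actual StoragePolicySatisfier code): a file with zero blocks is skipped before block analysis, so the satisfier thread never dereferences a missing block list.

```java
import java.util.Collections;
import java.util.List;

// Skip files with no blocks before analysing block storage movements.
public class EmptyFileGuard {
    /** True when the file has no blocks and SPS should ignore it. */
    public static boolean shouldIgnore(List<String> blocks) {
        return blocks == null || blocks.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(shouldIgnore(Collections.emptyList())); // true
        System.out.println(shouldIgnore(List.of("blk_1")));        // false
    }
}
```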
[jira] [Updated] (HDFS-12982) [SPS]: Reduce the locking and cleanup the Namesystem access
[ https://issues.apache.org/jira/browse/HDFS-12982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-12982: Fix Version/s: 3.2.0 > [SPS]: Reduce the locking and cleanup the Namesystem access > --- > > Key: HDFS-12982 > URL: https://issues.apache.org/jira/browse/HDFS-12982 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-12982-HDFS-10285-00.patch, > HDFS-12982-HDFS-10285-01.patch, HDFS-12982-HDFS-10285-02.patch, > HDFS-12982-HDFS-10285-03.patch > > > This task is to optimize the NS lock usage in SPS and clean up the Namesystem > access via the {{Context}} interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12214) [SPS]: Fix review comments of StoragePolicySatisfier feature
[ https://issues.apache.org/jira/browse/HDFS-12214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-12214: Fix Version/s: 3.2.0 > [SPS]: Fix review comments of StoragePolicySatisfier feature > > > Key: HDFS-12214 > URL: https://issues.apache.org/jira/browse/HDFS-12214 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-12214-HDFS-10285-00.patch, > HDFS-12214-HDFS-10285-01.patch, HDFS-12214-HDFS-10285-02.patch, > HDFS-12214-HDFS-10285-03.patch, HDFS-12214-HDFS-10285-04.patch, > HDFS-12214-HDFS-10285-05.patch, HDFS-12214-HDFS-10285-06.patch, > HDFS-12214-HDFS-10285-07.patch, HDFS-12214-HDFS-10285-08.patch > > > This sub-task is to address [~andrew.wang]'s review comments. Please refer to > the [review > comment|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16103734=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16103734] > in the HDFS-10285 umbrella jira. > # Rename configuration property 'dfs.storage.policy.satisfier.activate' to > 'dfs.storage.policy.satisfier.enabled' > # Disable SPS feature by default. > # Rather than using the acronym (which a user might not know), maybe rename > "-isSpsRunning" to "-isSatisfierRunning"
[jira] [Updated] (HDFS-13033) [SPS]: Implement a mechanism to do file block movements for external SPS
[ https://issues.apache.org/jira/browse/HDFS-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13033: Fix Version/s: 3.2.0 > [SPS]: Implement a mechanism to do file block movements for external SPS > > > Key: HDFS-13033 > URL: https://issues.apache.org/jira/browse/HDFS-13033 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-13033-HDFS-10285-00.patch, > HDFS-13033-HDFS-10285-01.patch, HDFS-13033-HDFS-10285-02.patch, > HDFS-13033-HDFS-10285-03.patch, HDFS-13033-HDFS-10285-04.patch > > > The HDFS-12911 modularization is introducing the {{BlockMoveTaskHandler}} interface > for moving the file blocks. That will help us to plug in different block move mechanisms, if needed. > For internal SPS, we have simple block movement tasks to target DN descriptors. > For external SPS, we should have a mechanism to send {{replaceBlock}} to the > target node and a listener to track the block movement completion. > This is the task to implement the {{ExternalSPSBlockMoveTaskHandler}} plugin > for external SPS.
[jira] [Updated] (HDFS-11186) [SPS]: Daemon thread of SPS should start only in Active NN
[ https://issues.apache.org/jira/browse/HDFS-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11186: Fix Version/s: 3.2.0 > [SPS]: Daemon thread of SPS should start only in Active NN > -- > > Key: HDFS-11186 > URL: https://issues.apache.org/jira/browse/HDFS-11186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Wei Zhou >Assignee: Wei Zhou >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11186-HDFS-10285.00.patch, > HDFS-11186-HDFS-10285.01.patch, HDFS-11186-HDFS-10285.02.patch, > HDFS-11186-HDFS-10285.03.patch, HDFS-11186-HDFS-10285.04.patch > > > As discussed in [HDFS-10885 > |https://issues.apache.org/jira/browse/HDFS-10885], we need to ensure that > SPS is started only in the Active NN. This JIRA is opened for discussion and > tracking.
[jira] [Updated] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls
[ https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13166: Fix Version/s: 3.2.0 > [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly > getLiveDatanodeStorageReport() calls > - > > Key: HDFS-13166 > URL: https://issues.apache.org/jira/browse/HDFS-13166 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-13166-HDFS-10285-00.patch, > HDFS-13166-HDFS-10285-01.patch, HDFS-13166-HDFS-10285-02.patch, > HDFS-13166-HDFS-10285-03.patch > > > Presently, {{#getLiveDatanodeStorageReport()}} is fetched for every file and > the computation is repeated each time. This Jira sub-task is to discuss and implement a cache > mechanism which in turn reduces the number of function calls. Also, we could > define a configurable refresh interval and periodically refresh the DN cache > by fetching the latest {{#getLiveDatanodeStorageReport}} on this interval. > The following comments are taken from HDFS-10285, > [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472] > Comment-7) > {quote}Adding getDatanodeStorageReport is concerning. > getDatanodeListForReport is already a very bad method that should be avoided > for anything but jmx – even then it’s a concern. I eliminated calls to it > years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn > lock for an excessive length of time. Beyond that, the response is going to > be pretty large and tagging all the storage reports is not going to be cheap. > verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem > lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its > storageMap? > Appears to be calling getLiveDatanodeStorageReport for every file. As > mentioned earlier, this is NOT cheap.
The SPS should be able to operate on a > fuzzy/cached state of the world. Then it gets another datanode report to > determine the number of live nodes to decide if it should sleep before > processing the next path. The number of nodes from the prior cached view of > the world should suffice. > {quote}
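The "fuzzy/cached state of the world" suggested above is essentially a time-expiring cache in front of the expensive RPC. A minimal sketch of that pattern, with illustrative names (`DnCache`, a `Supplier` standing in for the real `getLiveDatanodeStorageReport()` call) rather than the actual Hadoop classes:

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the DN-report cache HDFS-13166 proposes: serve a
// cached live-datanode view and refetch only after a configurable refresh
// interval, instead of calling the costly report RPC once per file.
class DnCache {
    private final long refreshIntervalMs;
    private final Supplier<List<String>> fetcher; // stands in for the RPC
    private List<String> cached;
    private long lastFetchMs;
    private int rpcCalls = 0;

    DnCache(long refreshIntervalMs, Supplier<List<String>> fetcher) {
        this.refreshIntervalMs = refreshIntervalMs;
        this.fetcher = fetcher;
    }

    // Refetch only when the cached view is older than the refresh interval.
    synchronized List<String> getLiveDatanodes(long nowMs) {
        if (cached == null || nowMs - lastFetchMs >= refreshIntervalMs) {
            cached = fetcher.get();
            lastFetchMs = nowMs;
            rpcCalls++;
        }
        return cached;
    }

    synchronized int rpcCallCount() { return rpcCalls; }
}
```

With a refresh interval of, say, a few minutes, thousands of per-file lookups collapse into one report fetch per interval; the SPS tolerates the slightly stale view because target verification still happens at scheduling time.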
[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable
[ https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11289: Fix Version/s: 3.2.0 > [SPS]: Make SPS movement monitor timeouts configurable > -- > > Key: HDFS-11289 > URL: https://issues.apache.org/jira/browse/HDFS-11289 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11289-HDFS-10285-00.patch, > HDFS-11289-HDFS-10285-01.patch > > > Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA for > making them configurable. > {code} > // TODO: below selfRetryTimeout and checkTimeout can be configurable later > // Now, the default values of selfRetryTimeout and checkTimeout are 30mins > // and 5mins respectively > this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems( > 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded); > {code}
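The change this issue describes is the standard Hadoop pattern of replacing hardcoded values with configuration lookups that fall back to the old constants as defaults. A minimal sketch, assuming illustrative key names and a plain `Map` standing in for Hadoop's `Configuration`:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HDFS-11289 change: read the monitor timeouts from
// configuration, keeping the old hardcoded values (5 min check timeout,
// 30 min self-retry timeout) as defaults. Key names here are illustrative.
class SpsMonitorConf {
    static final String CHECK_TIMEOUT_KEY = "dfs.storage.policy.satisfier.recheck.timeout.millis";
    static final String SELF_RETRY_TIMEOUT_KEY = "dfs.storage.policy.satisfier.self.retry.timeout.millis";
    static final long DEFAULT_CHECK_TIMEOUT_MS = 5 * 60 * 1000L;        // 5 mins
    static final long DEFAULT_SELF_RETRY_TIMEOUT_MS = 30 * 60 * 1000L;  // 30 mins

    // Stands in for Hadoop's Configuration#getLong(key, defaultValue).
    static long getLong(Map<String, String> conf, String key, long defaultValue) {
        String v = conf.get(key);
        return v == null ? defaultValue : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CHECK_TIMEOUT_KEY, "60000"); // an operator override
        long checkTimeout = getLong(conf, CHECK_TIMEOUT_KEY, DEFAULT_CHECK_TIMEOUT_MS);
        long selfRetryTimeout = getLong(conf, SELF_RETRY_TIMEOUT_KEY, DEFAULT_SELF_RETRY_TIMEOUT_MS);
        System.out.println(checkTimeout + " " + selfRetryTimeout);
    }
}
```

The monitor constructor then takes `checkTimeout` and `selfRetryTimeout` instead of the literal `5 * 60 * 1000` and `30 * 60 * 1000`.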
[jira] [Updated] (HDFS-12556) [SPS] : Block movement analysis should be done in read lock.
[ https://issues.apache.org/jira/browse/HDFS-12556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-12556: Fix Version/s: 3.2.0 > [SPS] : Block movement analysis should be done in read lock. > > > Key: HDFS-12556 > URL: https://issues.apache.org/jira/browse/HDFS-12556 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-12556-HDFS-10285-01.patch, > HDFS-12556-HDFS-10285-02.patch, HDFS-12556-HDFS-10285-03.patch > > > {noformat} > 2017-09-27 15:58:32,852 [StoragePolicySatisfier] ERROR > namenode.StoragePolicySatisfier > (StoragePolicySatisfier.java:handleException(308)) - StoragePolicySatisfier > thread received runtime exception. Stopping Storage policy satisfier work > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getStorages(BlockManager.java:4130) > at > org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.analyseBlocksStorageMovementsAndAssignToDN(StoragePolicySatisfier.java:362) > at > org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.run(StoragePolicySatisfier.java:236) > at java.lang.Thread.run(Thread.java:745) > {noformat}
[jira] [Updated] (HDFS-11572) [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file
[ https://issues.apache.org/jira/browse/HDFS-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11572: Fix Version/s: 3.2.0 > [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file > > > Key: HDFS-11572 > URL: https://issues.apache.org/jira/browse/HDFS-11572 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11572-HDFS-10285-00.patch, > HDFS-11572-HDFS-10285-01.patch > > > When a user requests storage policy satisfaction on a file that is already > well satisfied, SPS will just scan, confirm that no blocks need > to be moved, and leave that element. In this case, we are not cleaning the > Xattrs. This is the JIRA to make sure we clean the Xattrs in this situation.
[jira] [Updated] (HDFS-13057) [SPS]: Revisit configurations to make SPS service modes internal/external/none
[ https://issues.apache.org/jira/browse/HDFS-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13057: Fix Version/s: 3.2.0 > [SPS]: Revisit configurations to make SPS service modes internal/external/none > -- > > Key: HDFS-13057 > URL: https://issues.apache.org/jira/browse/HDFS-13057 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Blocker > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-13057-HDFS-10285-00.patch, > HDFS-13057-HDFS-10285-01.patch, HDFS-13057-HDFS-10285-02.patch > > > This task is to revisit the configurations to make SPS service modes - > {{internal/external/none}} > - {{internal}} : represents SPS service will be running with NN > - {{external}}: represents SPS service will be running outside NN > - {{none}}: represents the SPS service is completely disabled and zero cost > to the system. > Proposed configuration {{dfs.storage.policy.satisfier.mode}} item in > hdfs-site.xml file and value will be string. The mode can be changed via > {{reconfig}} command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
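Based on the proposal above, the setting in hdfs-site.xml would look something like the following sketch (the key `dfs.storage.policy.satisfier.mode` is named in the issue itself; the chosen value `external` is just one of the three proposed modes):

```xml
<property>
  <name>dfs.storage.policy.satisfier.mode</name>
  <!-- One of: internal (runs inside NN), external (runs outside NN),
       none (feature fully disabled, zero cost). -->
  <value>external</value>
</property>
```

Since the mode is reconfigurable, switching between values should not require a NameNode restart.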
[jira] [Updated] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN
[ https://issues.apache.org/jira/browse/HDFS-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10884: Fix Version/s: 3.2.0 > [SPS]: Add block movement tracker to track the completion of block movement > future tasks at DN > -- > > Key: HDFS-10884 > URL: https://issues.apache.org/jira/browse/HDFS-10884 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: HDFS-10285 >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-10884-HDFS-10285-00.patch, > HDFS-10884-HDFS-10285-01.patch, HDFS-10884-HDFS-10285-02.patch, > HDFS-10884-HDFS-10285-03.patch, HDFS-10884-HDFS-10285-04.patch, > HDFS-10884-HDFS-10285-05.patch > > > Presently the > [StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147] > function acts as a blocking call. The idea of this jira is to implement a > mechanism to track these movements asynchronously, which would allow new > movements to be accepted while previous ones are still in progress.
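The async tracking described above can be sketched with a thread pool and futures: submitting a move returns immediately, and a tracker collects completions separately. Names here (`BlockMovementTracker`, `submitMove`) are illustrative, not the actual Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch of the HDFS-10884 idea: run each block move as a
// future so the DN worker can accept new move tasks instead of blocking
// on each one, and track completion separately.
class BlockMovementTracker {
    private final ExecutorService moverPool = Executors.newFixedThreadPool(4);
    private final List<CompletableFuture<String>> pending = new ArrayList<>();

    // Submitting returns immediately; the move itself runs on the pool.
    synchronized void submitMove(String blockId, Runnable move) {
        pending.add(CompletableFuture.supplyAsync(() -> {
            move.run();       // the actual replica transfer would happen here
            return blockId;   // completion carries the finished block's id
        }, moverPool));
    }

    // Drain all outstanding moves and report which blocks finished.
    synchronized List<String> awaitAll() {
        List<String> done = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            done.add(f.join());
        }
        pending.clear();
        moverPool.shutdown();
        return done;
    }
}
```

In the real worker the completion results would feed the movement-status reports sent back to the coordinator, rather than being collected in a list.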
[jira] [Updated] (HDFS-13075) [SPS]: Provide External Context implementation.
[ https://issues.apache.org/jira/browse/HDFS-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-13075: Fix Version/s: 3.2.0 > [SPS]: Provide External Context implementation. > --- > > Key: HDFS-13075 > URL: https://issues.apache.org/jira/browse/HDFS-13075 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-10285 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-13075-HDFS-10285-0.patch, > HDFS-13075-HDFS-10285-01.patch, HDFS-13075-HDFS-10285-02.patch, > HDFS-13075-HDFS-10285-1.patch > > > This JIRA is to provide the initial implementation of the External Context. > With HDFS-12995, we will further improve the retry mechanism etc.
[jira] [Updated] (HDFS-11338) [SPS]: Fix timeout issue in unit tests caused by longer NN down time
[ https://issues.apache.org/jira/browse/HDFS-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11338: Fix Version/s: 3.2.0 > [SPS]: Fix timeout issue in unit tests caused by longer NN down time > - > > Key: HDFS-11338 > URL: https://issues.apache.org/jira/browse/HDFS-11338 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Wei Zhou >Assignee: Rakesh R >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11338-HDFS-10285-02.patch, > HDFS-11338-HDFS-10285-03.patch, HDFS-11338-HDFS-10285-04.patch, > HDFS-11338-HDFS-10285-05.patch, HDFS-11338-HDFS-10285.00.patch, > HDFS-11338-HDFS-10285.01.patch > > > As discussed in HDFS-11186, it takes longer to stop the NN: > {code} > try { > storagePolicySatisfierThread.join(3000); > } catch (InterruptedException ie) { > } > {code} > So, it takes a longer time to finish some tests, and this leads to the timeout > failures.
[jira] [Updated] (HDFS-11965) [SPS]: Should give chance to satisfy the low redundant blocks before removing the xattr
[ https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-11965: Fix Version/s: 3.2.0 > [SPS]: Should give chance to satisfy the low redundant blocks before removing > the xattr > --- > > Key: HDFS-11965 > URL: https://issues.apache.org/jira/browse/HDFS-11965 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: HDFS-10285 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Fix For: HDFS-10285, 3.2.0 > > Attachments: HDFS-11965-HDFS-10285.001.patch, > HDFS-11965-HDFS-10285.002.patch, HDFS-11965-HDFS-10285.003.patch, > HDFS-11965-HDFS-10285.004.patch, HDFS-11965-HDFS-10285.005.patch, > HDFS-11965-HDFS-10285.006.patch, HDFS-11965-HDFS-10285.007.patch, > HDFS-11965-HDFS-10285.008.patch > > > The test case is failing because all the required replicas are not moved to the > expected storage. This happens because of a delay in datanode registration > after a cluster restart. > Scenario : > 1. Start a cluster with 3 DataNodes. > 2. Create a file and set its storage policy to WARM. > 3. Restart the cluster. > 4. Now the Namenode and two DataNodes start first and get registered with the > NameNode. (one datanode is not yet registered) > 5. SPS schedules block movement based on the available DataNodes (it will move > one replica to ARCHIVE based on the policy). > 6. The block movement also succeeds and the Xattr is removed from the file because this > condition is true {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}}. > {code} > if (itemInfo != null > && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) { > blockStorageMovementNeeded > .add(storageMovementAttemptedResult.getTrackId()); > > .. > } else { > > .. > this.sps.postBlkStorageMovementCleanup( > storageMovementAttemptedResult.getTrackId()); > } > {code} > 7. Now the third DN registers with the namenode and reports one more DISK > replica. Now the Namenode has two DISK and one ARCHIVE replica.
> In the test case we have a condition to check the number of DISK replicas: > {code} DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1, > timeout, fs);{code} > This condition never becomes true and the test case times out. >