[jira] [Updated] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command

2019-09-22 Thread Rakesh R (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14818:

Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~PhiloHe] for the contribution. +1 LGTM.

Committed to trunk.

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: native
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14818.000.patch, HDFS-14818.001.patch, 
> HDFS-14818.002.patch, HDFS-14818.003.patch, HDFS-14818.004.patch, 
> check_native_after_building_with_PMDK.png, 
> check_native_after_building_with_PMDK_using_NAME_instead_of_REALPATH.png, 
> check_native_after_building_without_PMDK.png
>
>
> Currently, the 'hadoop checknative' command supports checking native libs such 
> as zlib, snappy, openssl, ISA-L, etc. It's necessary to include the pmdk lib in 
> this check.
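
For illustration, once this support is in place, a {{hadoop checknative}} run 
might report PMDK alongside the other libs. The output below is hand-written 
and approximate (paths, versions, and the exact PMDK line wording are 
assumptions, not captured from a real build):
{code}
$ hadoop checknative
Native library checking:
hadoop:  true /opt/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /lib64/libsnappy.so.1
openssl: true /lib64/libcrypto.so
ISA-L:   true /lib64/libisal.so.2
PMDK:    true /usr/lib64/libpmem.so.1
{code}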



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives

2019-09-15 Thread Rakesh R (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13762:

Target Version/s: 3.3.0
  Status: Open  (was: Patch Available)

> Support non-volatile storage class memory(SCM) in HDFS cache directives
> ---
>
> Key: HDFS-13762
> URL: https://issues.apache.org/jira/browse/HDFS-13762
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: caching, datanode
>Reporter: Sammi Chen
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, 
> HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, 
> HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, 
> HDFS-13762.008.patch, HDFS_Persistent_Memory_Cache_Perf_Results.pdf, 
> SCMCacheDesign-2018-11-08.pdf, SCMCacheDesign-2019-07-12.pdf, 
> SCMCacheDesign-2019-07-16.pdf, SCMCacheDesign-2019-3-26.pdf, 
> SCMCacheTestPlan-2019-3-27.pdf, SCMCacheTestPlan.pdf
>
>
> Non-volatile storage class memory is a type of memory that can keep its data 
> content after a power failure or across power cycles. A non-volatile storage 
> class memory device usually has access speed close to that of a memory DIMM, 
> at a lower cost than memory. So today it is usually used as a supplement to 
> memory to hold long-term persistent data, such as data in a cache. 
> Currently in HDFS, we have the OS page cache backed read-only cache and the 
> RAMDISK based lazy-write cache. Non-volatile memory suits both of these 
> functions. This Jira aims to enable storage class memory first in the read 
> cache. Although storage class memory has non-volatile characteristics, to keep 
> the same behavior as the current read-only cache, we don't use its persistence 
> characteristics for now.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command

2019-09-11 Thread Rakesh R (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928209#comment-16928209
 ] 

Rakesh R edited comment on HDFS-14818 at 9/12/19 5:22 AM:
--

Thanks [~PhiloHe] for the patch. Overall looks good to me, just a few comments.
 # {{SupportState.PMDK_LIB_NOT_FOUND}} - it's unused now; can you remove it?
{code:java}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code:java}
 case 1:
 msg = "The native code is built with PMDK support, but PMDK libs " +
 "are NOT found in execution environment or failed to be loaded.";
 break;
{code}
# Any reason to change 'NAME' to 'REALPATH'?


was (Author: rakeshr):
Thanks [~PhiloHe] for the patch. Overall looks good to me, just a comment.

# {{SupportState.PMDK_LIB_NOT_FOUND}} - it's unused now; can you remove it?
 {code}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code}
 case 1:
 msg = "The native code is built with PMDK support, but PMDK libs " +
 "are NOT found in execution environment or failed to be loaded.";
 break;
{code}

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: native
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14818.000.patch
>
>
> Currently, the 'hadoop checknative' command supports checking native libs such 
> as zlib, snappy, openssl, ISA-L, etc. It's necessary to include the pmdk lib in 
> this check.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14818) Check native pmdk lib by 'hadoop checknative' command

2019-09-11 Thread Rakesh R (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928209#comment-16928209
 ] 

Rakesh R commented on HDFS-14818:
-

Thanks [~PhiloHe] for the patch. Overall looks good to me, just a comment.

# {{SupportState.PMDK_LIB_NOT_FOUND}} - it's unused now; can you remove it?
 {code}
SupportState.PMDK_LIB_NOT_FOUND(1),
{code}
{code}
 case 1:
 msg = "The native code is built with PMDK support, but PMDK libs " +
 "are NOT found in execution environment or failed to be loaded.";
 break;
{code}

> Check native pmdk lib by 'hadoop checknative' command
> -
>
> Key: HDFS-14818
> URL: https://issues.apache.org/jira/browse/HDFS-14818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: native
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14818.000.patch
>
>
> Currently, the 'hadoop checknative' command supports checking native libs such 
> as zlib, snappy, openssl, ISA-L, etc. It's necessary to include the pmdk lib in 
> this check.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14740) HDFS read cache persistence support

2019-08-28 Thread Rakesh R (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918242#comment-16918242
 ] 

Rakesh R commented on HDFS-14740:
-

Thanks [~Rui Mo] for the contribution. Overall the idea looks good. Added a few 
comments; please take care of them.

# Please remove the duplicate checks in the #restoreCache() method, as you are 
already doing the checks inside #createBlockPoolDir().
{code}
#createBlockPoolDir()

if (!cacheDir.exists() && !cacheDir.mkdir()) {
{code}
{code}
#restoreCache()
if (cacheDir.exists()) {
{code}
# {{pmemVolume/BlockPoolId/BlockPoolId-BlockId}}.
{{BlockPoolId}} is duplicated; please remove it from the file name. This will 
avoid the {{cachedFile.getName().split("-");}} splitting logic and keep it 
simple.
# Can you explore using a hierarchical way of storing blocks, similar to the 
existing datanode data.dir? This is to avoid the chance of too many blocks 
growing under one single blockPoolId directory; assume a cache capacity in TBs 
and a large set of data blocks cached under a blockPool. Please refer to 
{{DatanodeUtil.idToBlockDir(finalizedDir, b.getBlockId());}} and the sketch 
after these comments.
# {{restoreCache()}} - How about moving the specific parsing/restore logic to 
the respective MappableBlockLoaders: PmemMappableBlockLoader#restoreCache() and 
NativePmemMappableBlockLoader#restoreCache()?
# {{dfs.datanode.cache.persistence.enabled}} - by default this can be true, as 
that allows getting the maximum capabilities of the pmem device. Overall the 
feature is off by default: the default value of "dfs.datanode.cache.pmem.dirs" 
is empty, so caching stays DRAM based. Once the user enables pmem, they can 
utilize the full potential of the device, with no compatibility concern.
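
For comment 3 above, a rough sketch of the two-level layout that 
{{DatanodeUtil.idToBlockDir}}-style hashing produces (the "subdir" prefix and 
paths here are illustrative, not the actual datanode constants):
{code:java}
import java.io.File;

public class BlockDirLayout {
  // Two-level scheme keyed off bits of the block id: 32 x 32 leaf
  // directories, so no single directory accumulates unbounded blocks.
  static File idToBlockDir(File root, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    return new File(root, "subdir" + d1 + File.separator + "subdir" + d2);
  }

  public static void main(String[] args) {
    // e.g. a pmem cache root per block pool, then hashed subdirs under it
    File bpRoot = new File("/mnt/pmem0/BP-1234-127.0.0.1-1"); // hypothetical path
    System.out.println(idToBlockDir(bpRoot, 1073741827L));
  }
}
{code}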

> HDFS read cache persistence support
> ---
>
> Key: HDFS-14740
> URL: https://issues.apache.org/jira/browse/HDFS-14740
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Feilong He
>Assignee: Rui Mo
>Priority: Major
> Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, 
> HDFS-14740.002.patch
>
>
> In HDFS-13762, persistent memory is enabled in HDFS centralized cache 
> management. Even though persistent memory can persist cache data, for 
> simplicity of implementation, the previous cache data is cleaned up when the 
> DataNode restarts. We propose to improve the HDFS persistent memory (PM) cache 
> by taking advantage of PM's data persistence characteristic, i.e., recovering 
> the cache status when the DataNode restarts; thus, cache warm-up time can be 
> saved for the user.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14745) Backport HDFS persistent memory read cache support to branch-3.1

2019-08-23 Thread Rakesh R (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914096#comment-16914096
 ] 

Rakesh R commented on HDFS-14745:
-

[~PhiloHe] thanks for taking this ahead.

Could you rename the patch to include the branch name, so that QA will be happy 
to apply and run it?

For example, "{{HDFS-14745-branch-3.1.001.patch}}".  This is a good feature to 
include in {{branch-3.x}} (the 3.0, 3.1, and 3.2 branches), and another 
good thing is that there are few code conflicts. Will take them up one by one.

> Backport HDFS persistent memory read cache support to branch-3.1
> 
>
> Key: HDFS-14745
> URL: https://issues.apache.org/jira/browse/HDFS-14745
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: cache, datanode
> Fix For: 3.3.0
>
> Attachments: HDFS-14745.000.patch, HDFS-14745.001.patch, 
> HDFS-14745.002.patch
>
>
> We are proposing to backport the patches for HDFS-13762, HDFS persistent 
> memory read cache support, to branch-3.1.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14740) HDFS read cache persistence support

2019-08-16 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reassigned HDFS-14740:
---

Assignee: Rui Mo  (was: Feilong He)

> HDFS read cache persistence support
> ---
>
> Key: HDFS-14740
> URL: https://issues.apache.org/jira/browse/HDFS-14740
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Feilong He
>Assignee: Rui Mo
>Priority: Major
> Attachments: HDFS-14740.000.patch
>
>
> In HDFS-13762, persistent memory is enabled in HDFS centralized cache 
> management. Even though persistent memory can persist cache data, for 
> simplicity of implementation, the previous cache data is cleaned up when the 
> DataNode restarts. We propose to improve the HDFS persistent memory (PM) cache 
> by taking advantage of PM's data persistence characteristic, i.e., recovering 
> the cache status when the DataNode restarts; thus, cache warm-up time can be 
> saved for the user.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity

2019-08-09 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14700:

Issue Type: Sub-task  (was: Bug)
Parent: HDFS-13762

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Cleaning up any pmem cache left over from before should happen prior to setting 
> the pmem cache capacity, because the usable space size is used to set the pmem 
> cache capacity.
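
The ordering matters because the capacity is derived from the volume's usable 
space, so stale cache files from a previous run would shrink it. A minimal 
sketch of the intended order ({{FileUtils}} from commons-io and the helper name 
are assumptions for illustration, not the patch's actual code):
{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class PmemVolumeSetup {
  // Hypothetical helper: clean first, then size the cache from usable space.
  static long setUpVolume(File pmemDir) throws IOException {
    // 1. Remove any cache files left behind by a previous DataNode instance.
    FileUtils.cleanDirectory(pmemDir);
    // 2. Only then derive the cache capacity from what is actually usable.
    return pmemDir.getUsableSpace();
  }
}
{code}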



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity

2019-08-09 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14700:

   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Cleaning up any pmem cache left over from before should happen prior to setting 
> the pmem cache capacity, because the usable space size is used to set the pmem 
> cache capacity.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity

2019-08-09 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903708#comment-16903708
 ] 

Rakesh R commented on HDFS-14700:
-

Committed the changes to trunk.

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Cleaning up any pmem cache left over from before should happen prior to setting 
> the pmem cache capacity, because the usable space size is used to set the pmem 
> cache capacity.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity

2019-08-09 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903693#comment-16903693
 ] 

Rakesh R commented on HDFS-14700:
-

Thanks [~PhiloHe] for reporting this issue. +1 patch looks good to me. I will 
commit shortly.

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Cleaning up any pmem cache left over from before should happen prior to setting 
> the pmem cache capacity, because the usable space size is used to set the pmem 
> cache capacity.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14700) Clean up pmem cache before setting pmem cache capacity

2019-08-09 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14700:

Summary: Clean up pmem cache before setting pmem cache capacity  (was: 
Clean up pmem cache left before prior to setting pmem cache capacity)

> Clean up pmem cache before setting pmem cache capacity
> --
>
> Key: HDFS-14700
> URL: https://issues.apache.org/jira/browse/HDFS-14700
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
> Attachments: HDFS-14700.000.patch, HDFS-14700.001.patch
>
>
> Cleaning up any pmem cache left over from before should happen prior to setting 
> the pmem cache capacity, because the usable space size is used to set the pmem 
> cache capacity.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives

2019-07-28 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13762:

Release Note: Non-volatile storage class memory (SCM, also known as 
persistent memory) is supported in the HDFS cache. To enable the SCM cache, the 
user just needs to configure an SCM volume for the property 
“dfs.datanode.cache.pmem.dirs” in hdfs-site.xml; all HDFS cache directives 
remain unchanged. There are two implementations of the HDFS SCM cache: one is a 
pure Java implementation and the other is a native, PMDK based implementation. 
The latter can bring users better performance in cache write and cache read. To 
enable the PMDK based implementation, the user should install the PMDK library 
by referring to the official site http://pmem.io/, then build Hadoop with PMDK 
support by referring to the "PMDK library build options" section in 
`BUILDING.txt` in the source code. If multiple SCM volumes are configured, a 
round-robin policy is used to select an available volume for caching a block. 
Consistent with the DRAM cache, the SCM cache has no cache eviction mechanism. 
When a DataNode receives a data read request from a client, if the 
corresponding block is cached in SCM, the DataNode will instantiate an 
InputStream with either the block location path on SCM (pure Java 
implementation) or the cache address on SCM (PMDK based implementation). Once 
the InputStream is created, the DataNode will send the cached data to the 
client. Please refer to the "Centralized Cache Management" guide for more 
details.   
(was: Non-volatile storage class memory (SCM, also known as persistent memory) 
is supported in HDFS cache. To enable SCM cache, user just needs to configure 
SCM volume for property “dfs.datanode.cache.pmem.dirs”. And all HDFS cache 
directives keep unchanged. There are two implementations for HDFS SCM Cache, 
one is pure java code implementation and the other is native PMDK based 
implementation. The latter implementation can bring user better performance 
gain in cache write and cache read. To enable PMDK based implementation, user 
should install PMDK library by referring to the official site http://pmem.io/. 
Then, build Hadoop with PMDK support by referring to "PMDK library build 
options" section in `BUILDING.txt` in the source code. If multiple SCM volumes 
are configured, a round-robin policy is used to select an available volume for 
caching a block. Consistent with DRAM cache, SCM cache also has no cache 
eviction mechanism. When DataNode receives a data read request from a client, 
if the corresponding block is cached into SCM, DataNode will instantiate an 
InputStream with the block location path on SCM (pure java implementation) or 
cache address on SCM (PMDK based implementation). Once the InputStream is 
created, DataNode will send the cache data to the client.)
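
As an illustration of the configuration and the round-robin volume selection 
described in this note (class and variable names are invented for the example; 
only the property name comes from the note above):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.hadoop.conf.Configuration;

public class PmemCacheConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Multiple SCM volumes, comma separated.
    conf.set("dfs.datanode.cache.pmem.dirs", "/mnt/pmem0,/mnt/pmem1");

    String[] volumes = conf.getTrimmedStrings("dfs.datanode.cache.pmem.dirs");
    AtomicInteger next = new AtomicInteger(0);
    // Round-robin pick of a volume for the next cached block.
    String chosen = volumes[next.getAndIncrement() % volumes.length];
    System.out.println("cache block on " + chosen);
  }
}
{code}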

> Support non-volatile storage class memory(SCM) in HDFS cache directives
> ---
>
> Key: HDFS-13762
> URL: https://issues.apache.org/jira/browse/HDFS-13762
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, datanode
>Reporter: Sammi Chen
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, 
> HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, 
> HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, 
> HDFS-13762.008.patch, SCMCacheDesign-2018-11-08.pdf, 
> SCMCacheDesign-2019-07-12.pdf, SCMCacheDesign-2019-07-16.pdf, 
> SCMCacheDesign-2019-3-26.pdf, SCMCacheTestPlan-2019-3-27.pdf, 
> SCMCacheTestPlan.pdf, SCM_Cache_Perf_Results-v1.pdf
>
>
> Non-volatile storage class memory is a type of memory that can keep its data 
> content after a power failure or across power cycles. A non-volatile storage 
> class memory device usually has access speed close to that of a memory DIMM, 
> at a lower cost than memory. So today it is usually used as a supplement to 
> memory to hold long-term persistent data, such as data in a cache. 
> Currently in HDFS, we have the OS page cache backed read-only cache and the 
> RAMDISK based lazy-write cache. Non-volatile memory suits both of these 
> functions. This Jira aims to enable storage class memory first in the read 
> cache. Although storage class memory has non-volatile characteristics, to keep 
> the same behavior as the current read-only cache, we don't use its persistence 
> characteristics for now.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14357) Update documentation for HDFS cache on SCM support

2019-07-15 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14357:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Update documentation for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch, 
> HDFS-14357.002.patch, HDFS-14357.003.patch, HDFS-14357.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14357) Update documentation for HDFS cache on SCM support

2019-07-15 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14357:

Summary: Update documentation for HDFS cache on SCM support  (was: Update 
the relevant docs for HDFS cache on SCM support)

> Update documentation for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch, 
> HDFS-14357.002.patch, HDFS-14357.003.patch, HDFS-14357.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14357) Update the relevant docs for HDFS cache on SCM support

2019-07-15 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884905#comment-16884905
 ] 

Rakesh R commented on HDFS-14357:
-

Thanks [~PhiloHe] for the contribution.

+1 patch looks good to me, I will commit this shortly.

> Update the relevant docs for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch, 
> HDFS-14357.002.patch, HDFS-14357.003.patch, HDFS-14357.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14458) Report pmem stats to namenode

2019-07-15 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884897#comment-16884897
 ] 

Rakesh R commented on HDFS-14458:
-

Attached the patch with the {{WARN}} log message removed; the same has been 
committed to the trunk branch.

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458-committedpatch.patch, HDFS-14458.000.patch, 
> HDFS-14458.001.patch, HDFS-14458.002.patch, HDFS-14458.003.patch, 
> HDFS-14458.004.patch, HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to NameNode: cache used and 
> cache capacity. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14458) Report pmem stats to namenode

2019-07-15 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14458:

Attachment: HDFS-14458-committedpatch.patch

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458-committedpatch.patch, HDFS-14458.000.patch, 
> HDFS-14458.001.patch, HDFS-14458.002.patch, HDFS-14458.003.patch, 
> HDFS-14458.004.patch, HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to NameNode: cache used and 
> cache capacity. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14458) Report pmem stats to namenode

2019-07-15 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14458:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch, 
> HDFS-14458.002.patch, HDFS-14458.003.patch, HDFS-14458.004.patch, 
> HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to NameNode: cache used and 
> cache capacity. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14458) Report pmem stats to namenode

2019-07-15 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884893#comment-16884893
 ] 

Rakesh R commented on HDFS-14458:
-

Thanks [~PhiloHe] for the patch. I am thinking of not including the WARN 
message, as it may pollute the log with lots of messages, so I am removing it 
now. Anyway, there is already a log message clearly conveying the disable part.

Apart from that, your patch looks good to me; I will commit it shortly.

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch, 
> HDFS-14458.002.patch, HDFS-14458.003.patch, HDFS-14458.004.patch, 
> HDFS-14458.005.patch
>
>
> Currently, two important stats should be reported to NameNode: cache used and 
> cache capacity. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14357) Update the relevant docs for HDFS cache on SCM support

2019-07-12 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883688#comment-16883688
 ] 

Rakesh R commented on HDFS-14357:
-

Thanks [~PhiloHe] for the patch.

Please take care of the below comment:
bq. One depends on PMDK libs and the other doesn't. PMDK can bring user 
performance gain for cache write and cache read
Please rephrase it like this:
The default is based on a pure Java implementation, and the other is a native 
implementation which leverages the PMDK library to improve the performance of 
cache write and cache read. 

To enable and use PMDK, use the following steps:
1. Build the PMDK library. Please refer to the official site "" for detailed 
information.
2. Build Hadoop with PMDK support. Please refer to `BUILDING.txt` to build 
Hadoop with PMDK support.
3. Copy the PMDK library. Make sure PMDK is available on the HDFS DataNodes.

To verify that PMDK is correctly detected by Hadoop, run the hadoop 
{{checknative}} command.

bq. For multiply volumes
Typo: {{For multiply volumes}} should be {{For multiple volumes}}.

> Update the relevant docs for HDFS cache on SCM support
> --
>
> Key: HDFS-14357
> URL: https://issues.apache.org/jira/browse/HDFS-14357
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14357.000.patch, HDFS-14357.001.patch, 
> HDFS-14357.002.patch, HDFS-14357.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14458) Report pmem stats to namenode

2019-07-12 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883663#comment-16883663
 ] 

Rakesh R commented on HDFS-14458:
-

Thanks [~PhiloHe] for the updates. Please take care of the below comments:
# Remove unused method
{code}
  /**
   * Check if pmem cache is enabled.
   */
  private boolean isPmemCacheEnabled() {
return !cacheLoader.isTransientCache();
  }
{code}
# Do we really need this exception handling? This can log repeatedly (see the 
sketch after these comments).
{code}
  if (cacheCapacity == 0L) {
throw new IOException("DRAM cache may be disabled. The cache capacity 
is 0.");
  }
{code}
# We need to unify the {{CacheStats}} for 'DRAM' and 'PMem' utilization. I'm OK 
to do this separately along with the LazyWriter PMem support.
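
For comment 2 above, a plain-Java sketch of one way to keep the signal without 
the repeated noise, i.e. warning only once (an illustration, not the patch 
itself):
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OneTimeWarn {
  private static final Logger LOG = LoggerFactory.getLogger(OneTimeWarn.class);
  private final AtomicBoolean warned = new AtomicBoolean(false);

  void checkCapacity(long cacheCapacity) {
    // Warn once instead of throwing on every report cycle.
    if (cacheCapacity == 0L && warned.compareAndSet(false, true)) {
      LOG.warn("DRAM cache may be disabled: the cache capacity is 0.");
    }
  }
}
{code}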

> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch, 
> HDFS-14458.002.patch, HDFS-14458.003.patch, HDFS-14458.004.patch
>
>
> Currently, two important stats should be reported to NameNode: cache used and 
> cache capacity. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14458) Report pmem stats to namenode

2019-07-10 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882301#comment-16882301
 ] 

Rakesh R commented on HDFS-14458:
-

Thanks [~PhiloHe] for taking this ahead. Added a few comments:
# By default {{dfs.datanode.max.locked.memory}} is zero. Do you want to disable 
in-memory caching if the PMem cache is enabled? If yes, please add a log 
message to convey that. Could you try adding a unit test to automate this 
behavior?
{code}
this.memCacheStats = new MemoryCacheStats(0L);
{code}
# I'd prefer to avoid {{if (isPmemCacheEnabled())}} checks inside 
FsDatasetCache. How about having {{cacheLoader#initialize(this)}} return 
{{memStats}} (see the sketch after these comments)?
{code}
MemoryCacheStats stats = cacheLoader.initialize(this);
{code}
# I'd appreciate it if you could add a unit test for the PMem stats results. 
Thanks!
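
For comment 2 above, a sketch of the suggested shape, where the loader hands 
back the stats so {{FsDatasetCache}} needs no per-type branching (all classes 
below are simplified stand-ins for illustration, not the real datanode classes):
{code:java}
// Simplified stand-in for the stats holder referenced above.
class MemoryCacheStats {
  private final long capacity;
  MemoryCacheStats(long capacity) { this.capacity = capacity; }
  long getCacheCapacity() { return capacity; }
}

abstract class MappableBlockLoader {
  /** Set up the loader and hand back the stats it will account against. */
  abstract MemoryCacheStats initialize();
}

class PmemMappableBlockLoader extends MappableBlockLoader {
  @Override
  MemoryCacheStats initialize() {
    long pmemCapacity = 0L; // would be derived from the configured pmem volumes
    return new MemoryCacheStats(pmemCapacity);
  }
}
{code}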


> Report pmem stats to namenode
> -
>
> Key: HDFS-14458
> URL: https://issues.apache.org/jira/browse/HDFS-14458
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14458.000.patch, HDFS-14458.001.patch, 
> HDFS-14458.002.patch
>
>
> Currently, two important stats should be reported to NameNode: cache used and 
> cache capacity. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1688) Deadlock in ratis client

2019-06-14 Thread Rakesh R (JIRA)
Rakesh R created HDDS-1688:
--

 Summary: Deadlock in ratis client
 Key: HDDS-1688
 URL: https://issues.apache.org/jira/browse/HDDS-1688
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rakesh R
 Attachments: Freon_baseline_100Threads_64MB_Keysize_8Keys_10buckets.bin

Ran the Freon benchmark in a three-node cluster with 100 writer threads. After 
some time the client hung due to a deadlock.

+Freon with the args:-+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 
--numOfThreads=100

3 BLOCKED threads. Attached the whole thread dump.

{code}
Found one Java-level deadlock:
=
"grpc-default-executor-6":
  waiting for ownable synchronizer 0x00021546bd00, (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync),
  which is held by "ForkJoinPool.commonPool-worker-7"
"ForkJoinPool.commonPool-worker-7":
  waiting to lock monitor 0x7f48fc99c448 (object 0x00021546be30, a 
org.apache.ratis.util.SlidingWindow$Client),
  which is held by "grpc-default-executor-6"
{code}
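
Before the full dump, a minimal self-contained illustration of the lock-ordering 
pattern shown above (simplified stand-ins, not Ratis code): one thread takes a 
monitor then wants the write lock, while the other takes the write lock then 
wants the monitor.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDeadlockDemo {
  static final Object monitor = new Object();          // plays SlidingWindow$Client
  static final ReentrantReadWriteLock rw =
      new ReentrantReadWriteLock(true);                // plays the fair RW lock

  public static void main(String[] args) {
    new Thread(() -> {                                 // like "grpc-default-executor-6"
      synchronized (monitor) {
        sleep(100);
        rw.writeLock().lock();                         // blocks: other thread holds rw
        rw.writeLock().unlock();
      }
    }).start();
    new Thread(() -> {                                 // like the ForkJoinPool worker
      rw.writeLock().lock();
      sleep(100);
      synchronized (monitor) { }                       // blocks: other thread holds monitor
      rw.writeLock().unlock();
    }).start();
  }

  static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
{code}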

{code}
ForkJoinPool.commonPool-worker-7
priority:5 - threadId:0x7f48d834b000 - nativeId:0x9ffb - nativeId 
(decimal):40955 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.ratis.util.SlidingWindow$Client.resetFirstSeqNum(SlidingWindow.java:348)
- waiting to lock <0x00021546be30> (a 
org.apache.ratis.util.SlidingWindow$Client)
at 
org.apache.ratis.client.impl.OrderedAsync.resetSlidingWindow(OrderedAsync.java:122)
at 
org.apache.ratis.client.impl.OrderedAsync$$Lambda$943/1670264164.accept(Unknown 
Source)
at 
org.apache.ratis.client.impl.RaftClientImpl.lambda$handleIOException$6(RaftClientImpl.java:352)
at 
org.apache.ratis.client.impl.RaftClientImpl$$Lambda$944/769363367.accept(Unknown
 Source)
at java.util.Optional.ifPresent(Optional.java:159)
at 
org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:352)
at 
org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:235)
at 
org.apache.ratis.client.impl.OrderedAsync$$Lambda$776/1213731951.apply(Unknown 
Source)
at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at 
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:324)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.close(GrpcClientProtocolClient.java:313)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$400(GrpcClientProtocolClient.java:245)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient.lambda$close$1(GrpcClientProtocolClient.java:131)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$$Lambda$950/1948156329.accept(Unknown
 Source)
at java.util.Optional.ifPresent(Optional.java:159)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:131)
at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:73)
at 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy$$Lambda$948/427065222.run(Unknown
 Source)
at 
org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
at org.apache.ratis.util.LifeCycle$$Lambda$949/1311526821.get(Unknown Source)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
at org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
- locked <0x0003e793ef48> (a 
org.apache.ratis.util.PeerProxyMap$PeerAndProxy)
at org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:126)
- locked <0x000215453400> (a java.lang.Object)
at org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:135)
at 
org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
at 
org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:375)
at 
org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:341)
at 
org.apache.ratis.client.impl.UnorderedAsync.lambda$sendRequestWithRetry$4(UnorderedAsync.java:108)
at 
org.apache.ratis.client.impl.UnorderedAsync$$Lambda$976/655038759.accept(Unknown
 Source)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 

[jira] [Created] (HDDS-1687) Datanode process shutdown due to OOME

2019-06-14 Thread Rakesh R (JIRA)
Rakesh R created HDDS-1687:
--

 Summary: Datanode process shutdown due to OOME
 Key: HDDS-1687
 URL: https://issues.apache.org/jira/browse/HDDS-1687
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Rakesh R
 Attachments: baseline test - datanode error logs.0.5.0.rar

Ran the Freon benchmark in a three-node cluster with many parallel writer 
threads; the datanode daemon hit an OOME and shut down. Used HDD as the storage 
type on the worker nodes.

+Freon with the args:-+
--numOfBuckets=10 --numOfKeys=8 --keySize=67108864 --numOfVolumes=100 
--numOfThreads=100


*DN-2*: Process got killed during the test due to an OOME.
{code}
2019-06-13 00:48:11,976 ERROR 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: Terminating 
with exit status 1: 
a0cb8914-b51c-41b1-b5d2-59313cf38c0b-SegmentedRaftLogWorker:Storage Directory 
/data/datab/ozone/metadir/ratis/cbf29739-cbd1-4b00-8a21-2db750004dc7 failed.
java.lang.OutOfMemoryError: Direct buffer memory
   at java.nio.Bits.reserveMemory(Bits.java:694)
   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
   at 
org.apache.ratis.server.raftlog.segmented.BufferedWriteChannel.<init>(BufferedWriteChannel.java:44)
   at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogOutputStream.<init>(SegmentedRaftLogOutputStream.java:70)
   at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker$StartLogSegment.execute(SegmentedRaftLogWorker.java:481)
   at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker.run(SegmentedRaftLogWorker.java:234)
   at java.lang.Thread.run(Thread.java:748)
{code}

*DN3*: Process got killed during the test due to an OOME. I could see lots of 
NPEs in the datanode logs.
{code}
2019-06-13 00:44:44,581 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
83232f1f-4469-4a4d-b369-c131c8432ae9: follower 
07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, 
log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: 
follower responses installSnapshot Completed
2019-06-13 00:44:44,582 INFO org.apache.ratis.grpc.server.GrpcLogAppender: 
83232f1f-4469-4a4d-b369-c131c8432ae9: follower 
07ace812-3883-47d3-ac95-3d55de5fab5c:10.243.61.192:9858's next index is 0, 
log's start index is 10062, need to notify follower to install snapshot
2019-06-13 00:44:44,587 ERROR org.apache.ratis.server.impl.LogAppender: 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon@554415fe unexpected 
exception
java.lang.NullPointerException: 
83232f1f-4469-4a4d-b369-c131c8432ae9->07ace812-3883-47d3-ac95-3d55de5fab5c: 
Previous TermIndex not found for firstIndex = 10062
   at java.util.Objects.requireNonNull(Objects.java:290)
   at 
org.apache.ratis.server.impl.LogAppender.assertProtos(LogAppender.java:234)
   at 
org.apache.ratis.server.impl.LogAppender.createRequest(LogAppender.java:221)
   at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:169)
   at 
org.apache.ratis.grpc.server.GrpcLogAppender.runAppenderImpl(GrpcLogAppender.java:113)
   at 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:80)
   at java.lang.Thread.run(Thread.java:748)

OOME log messages present in the *.out file.

Exception in thread 
"org.apache.ratis.server.impl.LogAppender$AppenderDaemon$$Lambda$267/386355867@1d9c10b3"
 java.lang.OutOfMemoryError: unable to create new native thread
   at java.lang.Thread.start0(Native Method)
   at java.lang.Thread.start(Thread.java:717)
   at 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon.start(LogAppender.java:68)
   at 
org.apache.ratis.server.impl.LogAppender.startAppender(LogAppender.java:153)
   at java.util.ArrayList.forEach(ArrayList.java:1257)
   at 
org.apache.ratis.server.impl.LeaderState.addAndStartSenders(LeaderState.java:372)
   at 
org.apache.ratis.server.impl.LeaderState.restartSender(LeaderState.java:394)
   at 
org.apache.ratis.server.impl.LogAppender$AppenderDaemon.run(LogAppender.java:97)
   at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1594) NullPointerException at the ratis client while running Freon benchmark

2019-05-27 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDDS-1594:
---
Attachment: NPE-logs.tar.gz

> NullPointerException at the ratis client while running Freon benchmark
> --
>
> Key: HDDS-1594
> URL: https://issues.apache.org/jira/browse/HDDS-1594
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Rakesh R
>Priority: Minor
> Attachments: NPE-logs.tar.gz
>
>
> Hits an NPE during the Freon benchmark test run. Below is the exception logged 
> in the client-side output.
> {code}
> SEVERE: Exception while executing runnable 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@6c585536
> java.lang.NullPointerException
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:320)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245)
> at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:269)
> at 
> org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
> at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
> at 
> org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
> at 
> org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1594) NullPointerException at the ratis client while running Freon benchmark

2019-05-27 Thread Rakesh R (JIRA)
Rakesh R created HDDS-1594:
--

 Summary: NullPointerException at the ratis client while running 
Freon benchmark
 Key: HDDS-1594
 URL: https://issues.apache.org/jira/browse/HDDS-1594
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Rakesh R


Hits an NPE during the Freon benchmark test run. Below is the exception logged 
in the client-side output.

{code}
SEVERE: Exception while executing runnable 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed@6c585536
java.lang.NullPointerException
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.completeReplyExceptionally(GrpcClientProtocolClient.java:320)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$000(GrpcClientProtocolClient.java:245)
at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:269)
at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache

2019-05-26 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14402:

Labels: SCM  (was: )

> Use FileChannel.transferTo() method for transferring block to SCM cache
> ---
>
> Key: HDFS-14402
> URL: https://issues.apache.org/jira/browse/HDFS-14402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: SCM
> Fix For: 3.3.0
>
> Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, 
> HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, 
> Without-Cache-Improvement-Patch.png
>
>
> We will consider using the transferTo API to improve SCM's cache performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache

2019-05-26 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14402:

   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

I have committed the latest patch to trunk.

Thanks [~PhiloHe] for the contribution!

> Use FileChannel.transferTo() method for transferring block to SCM cache
> ---
>
> Key: HDFS-14402
> URL: https://issues.apache.org/jira/browse/HDFS-14402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, 
> HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, 
> Without-Cache-Improvement-Patch.png
>
>
> We will consider using the transferTo API to improve SCM's cache performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache

2019-05-26 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848372#comment-16848372
 ] 

Rakesh R commented on HDFS-14402:
-

Thank you [~PhiloHe] for the patch. The performance result looks promising and 
shows a clear time reduction. We could probably plan a test comparison between 
HDD and NVMe devices; that would be interesting and could be attached to the 
umbrella Jira as well.

+1 LGTM, I will commit the latest patch shortly.

> Use FileChannel.transferTo() method for transferring block to SCM cache
> ---
>
> Key: HDFS-14402
> URL: https://issues.apache.org/jira/browse/HDFS-14402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, 
> HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, 
> Without-Cache-Improvement-Patch.png
>
>
> We will consider using the transferTo API to improve SCM's cache performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache

2019-05-24 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847482#comment-16847482
 ] 

Rakesh R commented on HDFS-14402:
-

bq. Upon making the checksum configurable, I am thinking that maybe, to the 
user, the cache read performance is the main concern. The checksum is executed 
once when caching data to DRAM/Pmem. The checksum operation for data 
verification may be tolerable to the user. I think more discussion is required. 
I will open a separate Jira common to the DRAM and Pmem caches.
Yes, it makes sense to me to move the discussion to a separate Jira. One use 
case I can foresee: if some user has concerns about the sanity of the data when 
they {{read-from-cache}} a block, they will do the checksum computation again 
at read time. This again depends on the NVMe device's consistency level, I 
think. In that case the checksum we did at the beginning can be skipped, 
without worrying about sanity during {{write-to-cache}}.

> Use FileChannel.transferTo() method for transferring block to SCM cache
> ---
>
> Key: HDFS-14402
> URL: https://issues.apache.org/jira/browse/HDFS-14402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14402.000.patch, HDFS-14402.001.patch, 
> HDFS-14402.002.patch, With-Cache-Improvement-Patch.png, 
> Without-Cache-Improvement-Patch.png
>
>
> We will consider to use transferTo API to improve SCM's cach performace.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14402) Use FileChannel.transferTo() method for transferring block to SCM cache

2019-05-15 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14402:

Summary: Use FileChannel.transferTo() method for transferring block to SCM 
cache  (was: Improve the implementation for HDFS cache on SCM)

> Use FileChannel.transferTo() method for transferring block to SCM cache
> ---
>
> Key: HDFS-14402
> URL: https://issues.apache.org/jira/browse/HDFS-14402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14402.000.patch, HFDS-14402.001.patch
>
>
> We will consider using the transferTo API to improve SCM's cache performance.






[jira] [Commented] (HDFS-14402) Improve the implementation for HDFS cache on SCM

2019-05-13 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839033#comment-16839033
 ] 

Rakesh R commented on HDFS-14402:
-

Thanks [~PhiloHe] for taking this Jira ahead. Please go through the review 
comments on the patch below:
# Checksum verification uses {{blockChannel}} and reads content at the replica 
device's throughput. Re-reading the block content makes the replica device busy 
and will affect other clients' read/write operations, assuming the replica 
device is an HDD. Since the data is written to NVMe, you can read it back from 
NVMe instead. This will also act as a verification checkpoint to ensure there 
is no data loss after writing the block to the NVMe device, right? (See the 
sketch after this list.)
{code}
verifyChecksum(length, metaIn, blockChannel, blockFileName);
{code}
# Can we make the checksum computation switchable on/off via configuration? 
That could improve performance and avoid keeping the device busy. We can 
introduce this on/off tuning parameter for all MappableBlockLoaders.
# Please change {{protected int fillBuffer}} to {{private}} visibility.
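
To make comment #1 concrete, here is a standalone sketch of verifying the 
freshly written NVMe copy instead of re-reading the replica device; the path 
and the checksum are illustrative only, not HDFS's CRC code.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class VerifyFromPmemSketch {
  public static void main(String[] args) throws IOException {
    Path cacheFile = Paths.get("/mnt/pmem0/cache/blk_1073741825"); // illustrative
    // Re-read the cache copy, not the HDD replica, so the busy replica disk
    // is left alone and the NVMe copy itself is validated after the write.
    try (FileChannel cached =
             FileChannel.open(cacheFile, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(8192);
      long checksum = 0;
      while (cached.read(buf) > 0) {
        buf.flip();
        while (buf.hasRemaining()) {
          checksum = 31 * checksum + buf.get(); // toy checksum for the sketch
        }
        buf.clear();
      }
      System.out.println("verified NVMe copy, toy checksum=" + checksum);
    }
  }
}
{code}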

> Improve the implementation for HDFS cache on SCM
> 
>
> Key: HDFS-14402
> URL: https://issues.apache.org/jira/browse/HDFS-14402
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14402.000.patch, HFDS-14402.001.patch
>
>
> We will consider using the transferTo API to improve SCM's cache performance.






[jira] [Updated] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-05-08 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14401:

   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

I have committed the patch to trunk.

Thanks [~umamaheswararao], [~anoop.hbase] for the reviews!

Thanks [~PhiloHe] for the contribution!

> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, 
> HDFS-14401.008.patch, HDFS-14401.009.patch, HDFS-14401.010.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-05-07 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835285#comment-16835285
 ] 

Rakesh R commented on HDFS-14401:
-

Thanks [~PhiloHe] for the continuous efforts on this. +1, the latest patch 
looks good to me.

If there are no comments from others, I will move ahead and commit this today. 
Thanks!

> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, 
> HDFS-14401.008.patch, HDFS-14401.009.patch, HDFS-14401.010.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Comment Edited] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-05-07 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834579#comment-16834579
 ] 

Rakesh R edited comment on HDFS-14401 at 5/7/19 9:44 AM:
-

Apart from the following comments, the latest patch looks good to me. I will 
commit the patch after these comments are fixed, if there are no more comments 
from others.
# How about adding a function to the interface to make the {{FsDatasetCache}} 
code simpler?
{code:java}
MappableBlockLoader.java
  /**
   * Cleaning up the cache, can be used during shutdown.
   */
  void cleanup() {
    // do nothing
  }

PmemMappableBlockLoader.java
  @Override
  void cleanup() {
    LOG.info("Clean up cache on persistent memory during shutdown.");
    pmemVolumeManager.cleanup();
  }


  /**
   * Clean up cache.
   */
  void shutdown() {
    cacheLoader.cleanup();
  }
{code}
# Why can't we simply do {{return new File(rawPmemDir, 
CACHE_DIR).getAbsolutePath();}} instead of the {{rawPmemDir.endsWith("/") ? 
rawPmemDir + CACHE_DIR}} ternary? (See the sketch below.)
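
A tiny runnable illustration of why the ternary is unnecessary; the directory 
and the CACHE_DIR value are illustrative.
{code:java}
import java.io.File;

public class CachePathSketch {
  static final String CACHE_DIR = "hdfs_pmem_cache"; // illustrative value

  public static void main(String[] args) {
    // File(parent, child) inserts the path separator itself, so the
    // trailing-"/" special case is not needed.
    System.out.println(new File("/mnt/pmem0", CACHE_DIR).getAbsolutePath());
    System.out.println(new File("/mnt/pmem0/", CACHE_DIR).getAbsolutePath());
    // Both lines print /mnt/pmem0/hdfs_pmem_cache on Linux.
  }
}
{code}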



> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, 
> HDFS-14401.008.patch, HDFS-14401.009.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-05-07 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834579#comment-16834579
 ] 

Rakesh R commented on HDFS-14401:
-

Apart from the following comments, the latest patch looks good to me. I will 
commit the patch after these comments are fixed, if there are no more comments 
from others.
 # How about adding a function to the interface to make the {{FsDatasetCache}} 
code simpler?
{code:java}
MappableBlockLoader.java
  /**
   * Cleaning up the cache, can be used during shutdown.
   */
  void cleanup() {
    // do nothing
  }

PmemMappableBlockLoader.java
  @Override
  void cleanup() {
    LOG.info("Clean up cache on persistent memory during shutdown.");
    pmemVolumeManager.cleanup();
  }


  /**
   * Clean up cache.
   */
  void shutdown() {
    cacheLoader.cleanup();
  }
{code}

 # Why can't we simply do {{return new File(rawPmemDir, 
CACHE_DIR).getAbsolutePath();}} instead of the {{rawPmemDir.endsWith("/") ? 
rawPmemDir + CACHE_DIR}} ternary?

> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch, HDFS-14401.006.patch, HDFS-14401.007.patch, 
> HDFS-14401.008.patch, HDFS-14401.009.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Comment Edited] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-04-30 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830376#comment-16830376
 ] 

Rakesh R edited comment on HDFS-14401 at 4/30/19 3:09 PM:
--

Overall the patch looks good and I think it's nearing completion. Could you 
please take care of the comments below:
# Rename the PmemVolumeManager variable '{{i}}' to '{{nextIndex}}'.
# How about resetting {{nextIndex}} so it does not grow indefinitely? You can 
refer to the idea below, or explicitly reset {{nextIndex=0}} {{if (nextIndex 
== count)}}.
{code:java}
private byte nextIndex = 0;
..
..
while (k++ != count) {
  nextIndex = (byte) (nextIndex % count);
  byte index = nextIndex;
  nextIndex++;
  long availableBytes = usedBytesCounts.get(index).getAvailableBytes();
  if (availableBytes >= bytesCount) {
    return index;
  }
  if (availableBytes > maxAvailableSpace) {
    maxAvailableSpace = availableBytes;
  }
}
{code}
# Instead of {{memCacheStats.getCacheUsed()}}, it should be 
{{cacheLoader.getCacheUsed()}}, right?
{code:java}
  LOG.debug("Caching of {} was aborted.  We are now caching only {} "
  + "bytes in total.", key, cacheLoader.getCacheUsed());
{code}
# Please double-check whether there is any scenario where it adds a 
{{blockKeyToVolume.put(key, index);}} entry and then 
{{usedBytesCounts.get(index).reserve(bytesCount);}} returns -1. (See the 
sketch after this list.)
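
On comment #4, a standalone sketch of one safe ordering, with simplified 
stand-in types rather than the real patch classes: reserve space first and 
record the key-to-volume mapping only on success, so a -1 from reserve() can 
never leave a stale map entry behind.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ReserveOrderSketch {
  static class UsedBytesCount {
    private final long max;
    private final AtomicLong used = new AtomicLong(0);
    UsedBytesCount(long max) { this.max = max; }

    // Returns the new used value, or -1 if the volume would overflow.
    long reserve(long bytes) {
      while (true) {
        long cur = used.get();
        long next = cur + bytes;
        if (next > max) {
          return -1;
        }
        if (used.compareAndSet(cur, next)) {
          return next;
        }
      }
    }
  }

  static final Map<String, Byte> blockKeyToVolume = new ConcurrentHashMap<>();

  static boolean cache(String key, byte volumeIndex, UsedBytesCount counter,
      long bytes) {
    if (counter.reserve(bytes) == -1) {
      return false; // nothing was recorded, so nothing to roll back
    }
    blockKeyToVolume.put(key, volumeIndex);
    return true;
  }

  public static void main(String[] args) {
    UsedBytesCount c = new UsedBytesCount(1024);
    System.out.println(cache("bp-1:blk_1", (byte) 0, c, 512));  // true
    System.out.println(cache("bp-1:blk_2", (byte) 0, c, 1024)); // false
  }
}
{code}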



> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch, HDFS-14401.006.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-04-30 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830376#comment-16830376
 ] 

Rakesh R commented on HDFS-14401:
-

Overall the patch looks good and I think it's nearing completion. Could you 
please take care of the comments below:
 # Rename the PmemVolumeManager variable '{{i}}' to '{{nextIndex}}'.
 # How about resetting {{nextIndex}} so it does not grow indefinitely? You can 
refer to the idea below, or explicitly reset {{nextIndex=0}} {{if (nextIndex 
== count)}}.
{code:java}
private byte nextIndex = 0;
..
..
while (k++ != count) {
  nextIndex = (byte) (nextIndex % count);
  byte index = nextIndex;
  nextIndex++;
  long availableBytes = usedBytesCounts.get(index).getAvailableBytes();
  if (availableBytes >= bytesCount) {
    return index;
  }
  if (availableBytes > maxAvailableSpace) {
    maxAvailableSpace = availableBytes;
  }
}
{code}

 # Instead of {{memCacheStats.getCacheUsed()}}, it should be 
{{cacheLoader.getCacheUsed()}}, right?
{code:java}
  LOG.debug("Caching of {} was aborted.  We are now caching only {} "
  + "bytes in total.", key, cacheLoader.getCacheUsed());
{code}

 # Please double-check whether there is any scenario where it adds a 
{{blockKeyToVolume.put(key, index);}} entry and then 
{{usedBytesCounts.get(index).reserve(bytesCount);}} returns -1.

> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch, HDFS-14401.006.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Comment Edited] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-04-25 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826642#comment-16826642
 ] 

Rakesh R edited comment on HDFS-14401 at 4/26/19 5:12 AM:
--

Thanks [~PhiloHe] for the good progress. Adding a few comments:
# Move the log message to the respective constructor; that will make 
FsDatasetCache.java cleaner.
{code:java}
PmemMappableBlockLoader() {
  LOG.info("Initializing cache loader: PmemMappableBlockLoader");
}

MemoryMappableBlockLoader() {
  LOG.info("Initializing cache loader: MemoryMappableBlockLoader");
}
{code}
# How about using a {{MappableBlockLoaderFactory}} and moving the 
{{#createCacheLoader(DNConf)}} function into it? (See the sketch after this list.)
{code:java}
   MappableBlockLoader loader = 
MappableBlockLoaderFactory.getInstance().createCacheLoader(this.getDnConf());
{code}
# Typo: '{{due to unsuccessfully mapping}}' should be '{{due to unsuccessful 
mapping}}'.
# Can we make the functions {{long release}} and {{public String 
getCachePath}} synchronized?
# For {{maxBytes = pmemDir.getTotalSpace();}}, IMHO it is better to use the 
[File#getUsableSpace()|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#getUsableSpace()]
 function.
# Remove the unused variable in PmemVolumeManager.java - {{// private final 
UsedBytesCount usedBytesCount;}}
# It's good to use {} placeholders instead of string concatenation in log 
messages. Please take care of all such occurrences in the newly written code.
{code:java}
   LOG.info("Added persistent memory - " + volumes[n] +
  " with size=" + maxBytes);

   to

LOG.info("Added persistent memory - {} with size={}",
volumes[n], maxBytes);
{code}
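
For comment #2, a standalone sketch of the factory idea; DNConf and the loader 
classes are minimal stand-ins here, not the real Hadoop types.
{code:java}
public class MappableBlockLoaderFactorySketch {
  interface MappableBlockLoader { }
  static class MemoryMappableBlockLoader implements MappableBlockLoader { }
  static class PmemMappableBlockLoader implements MappableBlockLoader { }

  static class DnConf {
    final String[] pmemDirs;
    DnConf(String... pmemDirs) { this.pmemDirs = pmemDirs; }
  }

  private static final MappableBlockLoaderFactorySketch INSTANCE =
      new MappableBlockLoaderFactorySketch();

  static MappableBlockLoaderFactorySketch getInstance() { return INSTANCE; }

  // Centralizing the choice means FsDatasetCache never needs to know which
  // concrete loader it is running with.
  MappableBlockLoader createCacheLoader(DnConf conf) {
    if (conf.pmemDirs != null && conf.pmemDirs.length > 0) {
      return new PmemMappableBlockLoader();
    }
    return new MemoryMappableBlockLoader();
  }

  public static void main(String[] args) {
    MappableBlockLoader loader =
        getInstance().createCacheLoader(new DnConf("/mnt/pmem0"));
    System.out.println(loader.getClass().getSimpleName());
  }
}
{code}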



> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-04-25 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826642#comment-16826642
 ] 

Rakesh R commented on HDFS-14401:
-

Thanks [~PhiloHe] for the good progress. Adding a few comments:
 # Move the log message to the respective constructor; that will make 
FsDatasetCache.java cleaner.
{code:java}
PmemMappableBlockLoader() {
  LOG.info("Initializing cache loader: PmemMappableBlockLoader");
}

MemoryMappableBlockLoader() {
  LOG.info("Initializing cache loader: MemoryMappableBlockLoader");
}
{code}

 # How about using a {{MappableBlockLoaderFactory}} and moving the 
{{#createCacheLoader(DNConf)}} function into it?
{code:java}
   MappableBlockLoader loader = 
MappableBlockLoaderFactory.getInstance().createCacheLoader(this.getDnConf());
{code}

 # Typo: '{{due to unsuccessfully mapping}}' should be '{{due to unsuccessful 
mapping}}'.
 # Can we make the functions {{long release}} and {{public String 
getCachePath}} synchronized?
 # For {{maxBytes = pmemDir.getTotalSpace();}}, IMHO it is better to use the 
[File#getUsableSpace()|https://docs.oracle.com/javase/7/docs/api/java/io/File.html#getUsableSpace()]
 function.
 # Remove the unused variable in PmemVolumeManager.java - {{// private final 
UsedBytesCount usedBytesCount;}}
 # It's good to use {} placeholders instead of string concatenation in log 
messages. Please take care of all such occurrences in the newly written code.
{code:java}
   LOG.info("Added persistent memory - " + volumes[n] +
  " with size=" + maxBytes);

   to

LOG.info("Added persistent memory - {} with size={}",
volumes[n], maxBytes);
{code}

> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch, HDFS-14401.001.patch, 
> HDFS-14401.002.patch, HDFS-14401.003.patch, HDFS-14401.004.patch, 
> HDFS-14401.005.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-04-10 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814292#comment-16814292
 ] 

Rakesh R commented on HDFS-14401:
-

Volume management at the datanode uses the Java File APIs to manage the mount 
paths. Similarly, this feature supports multiple {{pmem.dirs}}; to keep things 
simple, {{pmem}} can follow the same pattern. This will make the basic pmem 
configuration easy, and users can enable the feature with less effort. Later, 
if there is a need for a special shared deployment (the same mount path used by 
the HDFS cache and other apps), we can provide advanced configuration to handle 
the per-volume pmem management complexity.

The Java file APIs below can be used:
{code:java}
java.io.File.getTotalSpace();
java.io.File.getFreeSpace();
java.io.File.getUsableSpace();
{code}
[Hadoop code reference: 
DF.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DF.java#L83]
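
A small runnable sketch of probing configured pmem mount points with those 
APIs; the directories are illustrative.
{code:java}
import java.io.File;

public class PmemVolumeSpaceSketch {
  public static void main(String[] args) {
    // In the datanode these would come from the configured pmem.dirs.
    for (String dir : new String[] {"/mnt/pmem0", "/mnt/pmem1"}) {
      File volume = new File(dir);
      System.out.printf("%s total=%d free=%d usable=%d%n",
          dir,
          volume.getTotalSpace(),   // size of the partition
          volume.getFreeSpace(),    // unallocated bytes
          volume.getUsableSpace()); // bytes actually writable by this JVM
    }
  }
}
{code}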

> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up the MappableBlockLoader interface; etc.






[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-28 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804220#comment-16804220
 ] 

Rakesh R commented on HDFS-14355:
-

[~PhiloHe], HDFS-14393 sub-task has been resolved, please rebase your patch 
based on the interface changes.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, 
> HDFS-14355.005.patch, HDFS-14355.006.patch
>
>
> This task is to implement caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.






[jira] [Updated] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation

2019-03-28 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14393:

   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

I have committed this to trunk. Thanks [~umamaheswararao] and [~PhiloHe] for 
the reviews!

> Refactor FsDatasetCache for SCM cache implementation
> 
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, 
> HDFS-14393-003.patch
>
>
> This jira sub-task is to make FsDatasetCache cleaner to plug in DRAM and 
> PMem implementations.






[jira] [Commented] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation

2019-03-28 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804173#comment-16804173
 ] 

Rakesh R commented on HDFS-14393:
-

Thank you [~umamaheswararao] and [~PhiloHe] for the feedback. I have updated 
the Jira title.

I will commit this shortly.

> Refactor FsDatasetCache for SCM cache implementation
> 
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, 
> HDFS-14393-003.patch
>
>
> This jira sub-task is to make FsDatasetCache cleaner to plug in DRAM and 
> PMem implementations.






[jira] [Updated] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation

2019-03-28 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14393:

Summary: Refactor FsDatasetCache for SCM cache implementation  (was: Move 
stats related methods to MappableBlockLoader)

> Refactor FsDatasetCache for SCM cache implementation
> 
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, 
> HDFS-14393-003.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Updated] (HDFS-14393) Refactor FsDatasetCache for SCM cache implementation

2019-03-28 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14393:

Description: This jira sub-task is to make FsDatasetCache cleaner to plug in 
DRAM and PMem implementations.  (was: This jira sub-task is to move 
stats-related methods to the specific loader and make FsDatasetCache cleaner 
to plug in DRAM and PMem implementations.)

> Refactor FsDatasetCache for SCM cache implementation
> 
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, 
> HDFS-14393-003.patch
>
>
> This jira sub-task is to make FsDatasetCache cleaner to plug in DRAM and 
> PMem implementations.






[jira] [Commented] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-28 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803830#comment-16803830
 ] 

Rakesh R commented on HDFS-14393:
-

Thanks [~umamaheswararao] for the reviews.

Thanks [~PhiloHe] for bringing up the point that 'both the lazy writer and the 
read cache are sharing {{MemoryCacheStats}} statistics'. I have uploaded 
another patch addressing this.

> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, 
> HDFS-14393-003.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-28 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14393:

Attachment: HDFS-14393-003.patch

> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch, 
> HDFS-14393-003.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-28 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803612#comment-16803612
 ] 

Rakesh R edited comment on HDFS-14355 at 3/28/19 6:12 AM:
--

Adding review comments, please take care of them.

# How about adding an API {{MappableBlockLoader#isTransientCache()}} to the 
interface to avoid PMem-specific checks? It can return a flag value to 
differentiate the NVMe- and DRAM-based caches. (See the sketch after this list.)
{code:java}
public boolean isPmemCacheEnabled() {
  return mappableBlockLoader instanceof PmemMappableBlockLoader;
}
{code}
# I'd like to avoid type casting. It won't work with another Pmem 
implementation, right?
{code:java}
  public String getReplicaCachePath(String bpid, long blockId) {
if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) {
  return null;
}
ExtendedBlockId key = new ExtendedBlockId(blockId, bpid);
String cachePath = ((PmemMappableBlockLoader)mappableBlockLoader)
.getPmemVolumeManager()
.getCachedFilePath(key);
return cachePath;
  }
{code}
# Below type casting can be replaced with HDFS-14393 interface.
{code:java}
  /**
   * Get cache capacity of persistent memory.
   * TODO: advertise this metric to NameNode by FSDatasetMBean
   */
  public long getPmemCacheCapacity() {
if (isPmemCacheEnabled()) {
  return ((PmemMappableBlockLoader)mappableBlockLoader)
  .getPmemVolumeManager().getPmemCacheCapacity();
}
return 0;
  }
  
public long getPmemCacheUsed() {
if (isPmemCacheEnabled()) {
  return ((PmemMappableBlockLoader)mappableBlockLoader)
  .getPmemVolumeManager().getPmemCacheUsed();
}
return 0;
  }
{code}
# {{FsDatasetUtil#deleteMappedFile}} - the try-catch is not required; can we do 
something like the following?
{code:java}
  public static void deleteMappedFile(String filePath) throws IOException {
    boolean result = Files.deleteIfExists(Paths.get(filePath));
    if (!result) {
      throw new IOException("Failed to delete the mapped file: " + filePath);
    }
  }
{code}
# Why can't we avoid the {{LocalReplica}} changes and read directly from the 
Util as below?
{code:java}
FsDatasetImpl#getBlockInputStreamWithCheckingPmemCache()
.
if (cachePath != null) {
  return FsDatasetUtil.getInputStreamAndSeek(new File(cachePath),
  seekOffset);
}
{code}
# Since the class {{PmemVolumeManager}} itself represents {{Pmem}}, it's good 
to remove this extra keyword from the methods and entities in this class: 
PmemUsedBytesCount, getPmemCacheUsed, getPmemCacheCapacity, etc.
# Please avoid the unchecked conversion; we can do it like this:
{code:java}
PmemVolumeManager.java

  private final Map<ExtendedBlockId, Byte> blockKeyToVolume =
      new ConcurrentHashMap<>();

  Map<ExtendedBlockId, Byte> getBlockKeyToVolume() {
    return blockKeyToVolume;
  }
{code}
# Add an exception message in PmemVolumeManager#verifyIfValidPmemVolume:
{code:java}
  if (out == null) {
throw new IOException();
  }
{code}
# Here the {{IOException}} clause is not required; please remove it. We can 
add it later if needed.
{code:java}
MappableBlock.java
void afterCache() throws IOException;

FsDatasetCache.java
try {
  mappableBlock.afterCache();
} catch (IOException e) {
  LOG.warn(e.getMessage());
  return;
}
{code}
# Can we include the block id in the log message? That would improve debugging.
{code:java}
  LOG.info("Successfully cache one replica into persistent memory: " +
  "[path=" + filePath + ", length=" + length + "]");

to

  LOG.info("Successfully cached one replica:{} into persistent memory"
  + ", [cached path={}, length={}]", key, filePath, length);
{code}
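
A standalone sketch of comment #1, with simplified stand-in types: the loader 
reports whether its cache is transient (DRAM), so FsDatasetCache needs no 
instanceof checks or casts.
{code:java}
public class TransientCacheSketch {
  interface MappableBlockLoader {
    boolean isTransientCache();
  }

  static class MemoryMappableBlockLoader implements MappableBlockLoader {
    @Override public boolean isTransientCache() { return true; }  // DRAM
  }

  static class PmemMappableBlockLoader implements MappableBlockLoader {
    @Override public boolean isTransientCache() { return false; } // NVMe
  }

  static class FsDatasetCache {
    private final MappableBlockLoader loader;
    FsDatasetCache(MappableBlockLoader loader) { this.loader = loader; }

    String getReplicaCachePath(String bpid, long blockId) {
      if (loader.isTransientCache()) {
        return null; // a DRAM cache has no file path to hand out
      }
      return "/mnt/pmem0/cache/" + bpid + "/blk_" + blockId; // illustrative
    }
  }

  public static void main(String[] args) {
    FsDatasetCache cache = new FsDatasetCache(new PmemMappableBlockLoader());
    System.out.println(cache.getReplicaCachePath("BP-1", 1073741825L));
  }
}
{code}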


[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-28 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803612#comment-16803612
 ] 

Rakesh R commented on HDFS-14355:
-

Adding review comments, please take care of them.
 # How about adding an API {{MappableBlockLoader#isTransientCache()}} to the 
interface to avoid PMem-specific checks? It can return a flag value to 
differentiate the NVMe- and DRAM-based caches.
{code:java}
public boolean isPmemCacheEnabled() {
  return mappableBlockLoader instanceof PmemMappableBlockLoader;
}
{code}

 # I'd like to avoid type casting. It won't work with another Pmem 
implementation, right?
{code:java}
  public String getReplicaCachePath(String bpid, long blockId) {
if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) {
  return null;
}
ExtendedBlockId key = new ExtendedBlockId(blockId, bpid);
String cachePath = ((PmemMappableBlockLoader)mappableBlockLoader)
.getPmemVolumeManager()
.getCachedFilePath(key);
return cachePath;
  }
{code}

 # Below type casting can be replaced with HDFS-14393 interface.
{code:java}
  /**
   * Get cache capacity of persistent memory.
   * TODO: advertise this metric to NameNode by FSDatasetMBean
   */
  public long getPmemCacheCapacity() {
if (isPmemCacheEnabled()) {
  return ((PmemMappableBlockLoader)mappableBlockLoader)
  .getPmemVolumeManager().getPmemCacheCapacity();
}
return 0;
  }
  
public long getPmemCacheUsed() {
if (isPmemCacheEnabled()) {
  return ((PmemMappableBlockLoader)mappableBlockLoader)
  .getPmemVolumeManager().getPmemCacheUsed();
}
return 0;
  }
{code}

 # {{FsDatasetUtil#deleteMappedFile}} - the try-catch is not required; can we 
do something like the following?
{code:java}
  public static void deleteMappedFile(String filePath) throws IOException {
    boolean result = Files.deleteIfExists(Paths.get(filePath));
    if (!result) {
      throw new IOException("Failed to delete the mapped file: " + filePath);
    }
  }
{code}

 # Why can't we avoid the {{LocalReplica}} changes and read directly from the 
Util as below?
{code:java}
FsDatasetImpl#getBlockInputStreamWithCheckingPmemCache()
.
if (cachePath != null) {
  return FsDatasetUtil.getInputStreamAndSeek(new File(cachePath),
  seekOffset);
}
{code}

 # Since the class {{PmemVolumeManager}} itself represents {{Pmem}}, it's good 
to remove this extra keyword from the methods and entities in this class: 
PmemUsedBytesCount, getPmemCacheUsed, getPmemCacheCapacity, etc.
 # Please avoid the unchecked conversion; we can do it like this:
{code:java}
PmemVolumeManager.java

  private final Map<ExtendedBlockId, Byte> blockKeyToVolume =
      new ConcurrentHashMap<>();

  Map<ExtendedBlockId, Byte> getBlockKeyToVolume() {
    return blockKeyToVolume;
  }
{code}

 # Add an exception message in PmemVolumeManager#verifyIfValidPmemVolume:
{code:java}
  if (out == null) {
throw new IOException();
  }
{code}

 # Here the {{IOException}} clause is not required; please remove it. We can 
add it later if needed.
{code:java}
MappableBlock.java
void afterCache() throws IOException;

FsDatasetCache.java
try {
  mappableBlock.afterCache();
} catch (IOException e) {
  LOG.warn(e.getMessage());
  return;
}
{code}

 # Can we include the block id in the log message? That would improve debugging.
{code:java}
  LOG.info("Successfully cache one replica into persistent memory: " +
  "[path=" + filePath + ", length=" + length + "]");

to

  LOG.info("Successfully cached one replica:{} into persistent memory"
  + ", [cached path={}, length={}]", key, filePath, length);
{code}

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, 
> HDFS-14355.005.patch
>
>
> This task is to implement caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.






[jira] [Comment Edited] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803554#comment-16803554
 ] 

Rakesh R edited comment on HDFS-14393 at 3/28/19 3:05 AM:
--

Thanks [~umamaheswararao] for the review comments. I have attached another 
patch addressing them.

 Note: I have moved TestFsDatasetCache to 
{{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl}} to avoid making 
{{CacheStats}} public.



> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch
>
>
> This jira sub-task is to move stats related methods to specific loader and 
> make FsDatasetCache more cleaner to plugin DRAM and PMem implementations.






[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14393:

Attachment: HDFS-14393-002.patch

> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Commented] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803554#comment-16803554
 ] 

Rakesh R commented on HDFS-14393:
-

Thanks [~umamaheswararao] for the review comments. I have attached another 
patch addressing them.

> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch, HDFS-14393-002.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-27 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803307#comment-16803307
 ] 

Rakesh R commented on HDFS-14355:
-

Thanks [~PhiloHe] for the updates. I have created HDFS-14393 to make 
{{FsDatasetCache}} cleaner in how it interacts with the loader; this would 
avoid the PMem-specific type casting in the current patch. I will continue 
reviewing your patch.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, 
> HDFS-14355.005.patch
>
>
> This task is to implement caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.






[jira] [Commented] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803306#comment-16803306
 ] 

Rakesh R commented on HDFS-14393:
-

[~umamaheswararao], [~PhiloHe] I have attached a patch that moves the 
{{FsDatasetCache}} memory-related methods into the 
{{MemoryMappableBlockLoader}} implementation. Please review.

> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14393:

Status: Patch Available  (was: Open)

> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Updated] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14393:

Attachment: HDFS-14393-001.patch

> Move stats related methods to MappableBlockLoader
> -
>
> Key: HDFS-14393
> URL: https://issues.apache.org/jira/browse/HDFS-14393
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-14393-001.patch
>
>
> This jira sub-task is to move stats-related methods to the specific loader and 
> make FsDatasetCache cleaner to plug in DRAM and PMem implementations.






[jira] [Created] (HDFS-14393) Move stats related methods to MappableBlockLoader

2019-03-27 Thread Rakesh R (JIRA)
Rakesh R created HDFS-14393:
---

 Summary: Move stats related methods to MappableBlockLoader
 Key: HDFS-14393
 URL: https://issues.apache.org/jira/browse/HDFS-14393
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This jira sub-task is to move stats-related methods to the specific loader and 
make FsDatasetCache cleaner to plug in DRAM and PMem implementations.
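
A standalone sketch of the intended shape, using simplified stand-in types; 
synchronization is omitted for brevity.
{code:java}
public class LoaderStatsSketch {
  // Stats live on the loader, so FsDatasetCache simply delegates instead of
  // special-casing DRAM versus pmem.
  abstract static class MappableBlockLoader {
    abstract long getCacheUsed();
    abstract long getCacheCapacity();
    abstract long reserve(long bytes);
    abstract long release(long bytes);
  }

  static class MemoryMappableBlockLoader extends MappableBlockLoader {
    private final long capacity;
    private long used;
    MemoryMappableBlockLoader(long capacity) { this.capacity = capacity; }
    @Override long getCacheUsed() { return used; }
    @Override long getCacheCapacity() { return capacity; }
    @Override long reserve(long bytes) {
      if (used + bytes > capacity) {
        return -1; // would overflow the DRAM budget
      }
      used += bytes;
      return used;
    }
    @Override long release(long bytes) {
      used -= bytes;
      return used;
    }
  }

  public static void main(String[] args) {
    MappableBlockLoader loader = new MemoryMappableBlockLoader(4096);
    loader.reserve(1024);
    System.out.println(loader.getCacheUsed() + "/" + loader.getCacheCapacity());
  }
}
{code}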






[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-21 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797940#comment-16797940
 ] 

Rakesh R commented on HDFS-14355:
-

{quote} FileMappableBlockLoader: Actually this class implements the block 
loading for pmem. So, should this name be 
PmemFileMappableBlockLoader/PmemMappableBlockLoader?
The HDFS-14356 impl may name its implementation class 
NativePmemMappableBlockLoader (that will be the pmdk-based impl)? Does this 
make sense?
{quote}
{{PmemMappableBlockLoader}} and {{NativePmemMappableBlockLoader}} look good to 
me.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.






[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-18 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794924#comment-16794924
 ] 

Rakesh R commented on HDFS-14355:
-

{quote}This property specifies the cache capacity for both memory & pmem. We 
kept the same behavior for the specified pmem cache capacity as for the memory 
cache.
{quote}
Please look at my comment #8 above. As we know, the existing code deals only 
with the OS page cache; now we are adding pmem as well, which requires special 
intelligence to manage the stats/overflows if we allow two entities to be 
plugged in together. A quick thought: add a new configuration 
{{dfs.datanode.cache.pmem.capacity}}, and the reserve/release logic can be 
moved to the specific MappableBlockLoaders. (See the sketch below.)
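
A standalone sketch of keeping the two budgets separate; the pmem key mirrors 
the proposal above and is not a committed configuration name, and 
{{java.util.Properties}} stands in for Hadoop's Configuration.
{code:java}
import java.util.Properties;

public class PmemCapacitySketch {
  static final String DRAM_KEY = "dfs.datanode.max.locked.memory";
  static final String PMEM_KEY = "dfs.datanode.cache.pmem.capacity"; // proposed

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(DRAM_KEY, String.valueOf(2L << 30));  // 2 GiB DRAM cache
    conf.setProperty(PMEM_KEY, String.valueOf(64L << 30)); // 64 GiB pmem cache

    long dramCapacity = Long.parseLong(conf.getProperty(DRAM_KEY, "0"));
    long pmemCapacity = Long.parseLong(conf.getProperty(PMEM_KEY, "0"));
    // Each MappableBlockLoader would reserve/release against its own budget,
    // so pmem usage can never overflow into the DRAM accounting.
    System.out.println("dram=" + dramCapacity + " pmem=" + pmemCapacity);
  }
}
{code}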

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.






[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-18 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794827#comment-16794827
 ] 

Rakesh R edited comment on HDFS-14355 at 3/18/19 8:31 AM:
--

Thanks [~PhiloHe] for the good progress. Adding a second set of review 
comments; please go through them.
# Close {{file = new RandomAccessFile(filePath, "rw");}}
{code:java}

IOUtils.closeQuietly(file);
{code}
# Looks like unused code, please remove it.
{code:java}
  private FsDatasetImpl dataset;

  public MemoryMappableBlockLoader(FsDatasetImpl dataset) {
this.dataset = dataset;
  }
{code}
# FileMappableBlockLoader#loadVolumes exception handling. I feel this is not 
required; please remove it. If you still need it for some purpose, then please 
add a message arg to {{IOException("Failed to parse persistent memory 
location " + location, e)}}
{code:java}
  } catch (IllegalArgumentException e) {
LOG.error("Failed to parse persistent memory location " + location +
" for " + e.getMessage());
throw new IOException(e);
  }
{code}
# Debuggability: FileMappableBlockLoader#verifyIfValidPmemVolume. Here, add an 
exception message arg to {{throw new IOException(t);}}
{code:java}
  throw new IOException(
  "Exception while writing data to persistent storage dir: " + pmemDir,
  t);
{code}
# Debuggability: FileMappableBlockLoader#load. Here, add blockFileName to the 
exception message.
{code:java}
  if (out == null) {
throw new IOException("Fail to map the block " + blockFileName
+ " to persistent storage.");
  }
{code}
# Debuggability: FileMappableBlockLoader#verifyChecksumAndMapBlock
{code:java}
  throw new IOException(
  "checksum verification failed for the blockfile:" + blockFileName
  + ":  premature EOF");
{code}
# FileMappedBlock#afterCache. Suppressing the exception may give wrong 
statistics, right? Assume {{afterCache}} throws an exception and the file path 
is not cached. Here, the cached block won't be readable but unnecessarily 
consumes space. How about moving the {{mappableBlock.afterCache();}} call right 
after the {{mappableBlockLoader.load()}} function and adding a throws 
IOException clause to {{afterCache}}? (See the sketch after this list.)
{code:java}
  LOG.warn("Fail to find the replica file of PoolID = " +
  key.getBlockPoolId() + ", BlockID = " + key.getBlockId() +
  " for :" + e.getMessage());
{code}
# FsDatasetCache.java: the reserve() and release() OS page size math is not 
required in FileMappedBlock. I would appreciate it if you could avoid these 
calls. Also, can you revisit the caching and un-caching logic (for example, 
the datanode.getMetrics() updates, etc.) present in this class?
{code:java}
CachingTask#run(){

long newUsedBytes = reserve(length);
...
if (reservedBytes) {
   release(length);
}

UncachingTask#run() {
...
long newUsedBytes = release(value.mappableBlock.getLength());
{code}
# I have changed the Jira status and triggered QA. Please fix the checkstyle 
warnings and test case failures. 
Also, can you uncomment the two {{Test//(timeout=12)}} occurrences in the test?
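
For comment #7, a standalone sketch of the suggested ordering with stand-in 
types: {{afterCache()}} declares IOException and runs immediately after 
{{load()}}, so a failure rolls the block back instead of silently leaving an 
unreadable cached copy that consumes space.
{code:java}
import java.io.IOException;

public class AfterCacheOrderSketch {
  interface MappableBlock {
    void afterCache() throws IOException;
    void close(); // unmap / delete the cache file
  }

  static MappableBlock load(String blockFileName) {
    return new MappableBlock() {
      @Override public void afterCache() throws IOException {
        // e.g. record the cached file path; this step may fail
      }
      @Override public void close() { }
    };
  }

  static boolean cache(String blockFileName) {
    MappableBlock block = load(blockFileName);
    try {
      block.afterCache(); // fail fast, before the block is reported as cached
    } catch (IOException e) {
      block.close(); // roll back so stats and space stay consistent
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(cache("blk_1073741825"));
  }
}
{code}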



[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-18 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794827#comment-16794827
 ] 

Rakesh R commented on HDFS-14355:
-

Thanks [~PhiloHe] for the good progress. Adding a second set of review
comments, please go through them.
 # Close {{file = new RandomAccessFile(filePath, "rw");}}
{code:java}

IOUtils.closeQuietly(file);
{code}

 # Looks like unused code, please remove it.
{code:java}
  private FsDatasetImpl dataset;

  public MemoryMappableBlockLoader(FsDatasetImpl dataset) {
this.dataset = dataset;
  }
{code}

 # FileMappableBlockLoader#loadVolumes exception handling. I feel this is not
required, please remove it. If you still need it for some purpose, then please
add a message arg: {{IOException("Failed to parse persistent memory
location " + location, e)}}
{code:java}
  } catch (IllegalArgumentException e) {
LOG.error("Failed to parse persistent memory location " + location +
" for " + e.getMessage());
throw new IOException(e);
  }
{code}

 # Debuggability: FileMappableBlockLoader#verifyIfValidPmemVolume. Here, add an
exception message arg to {{throw new IOException(t);}}:
{code:java}
  throw new IOException(
  "Exception while writing data to persistent storage dir: " + pmemDir,
  t);
{code}

 # Debuggability: FileMappableBlockLoader#load. Here, add blockFileName to the 
exception message.
{code:java}
  if (out == null) {
throw new IOException("Fail to map the block " + blockFileName
+ " to persistent storage.");
  }
{code}

 # Debuggability: FileMappableBlockLoader#verifyChecksumAndMapBlock
{code:java}
  throw new IOException(
  "checksum verification failed for the blockfile:" + blockFileName
  + ":  premature EOF");
{code}

 # FileMappedBlock#afterCache. Suppressing the exception may give wrong
statistics, right? Assume {{afterCache}} throws an exception and the file path
is not recorded: the cached block won't be readable but still consumes space.
How about moving the {{mappableBlock.afterCache();}} call right after the
{{mappableBlockLoader.load()}} call and adding a throws IOException clause to
{{afterCache}}?
{code:java}
  LOG.warn("Fail to find the replica file of PoolID = " +
  key.getBlockPoolId() + ", BlockID = " + key.getBlockId() +
  " for :" + e.getMessage());
{code}

 # FsDatasetCache.java: the reserve() and release() OS page size math is not
required for FileMappedBlock. I'd appreciate it if you could avoid these calls.
Also, can you revisit the caching and un-caching logic (for example, the
datanode.getMetrics() updates, etc.) present in this class?
{code:java}
CachingTask#run(){

long newUsedBytes = reserve(length);
...
if (reservedBytes) {
   release(length);
}

UncachingTask#run() {
...
long newUsedBytes = release(value.mappableBlock.getLength());
{code}

 # I have changed the jira status and triggered QA. Please fix the checkstyle
warnings and test case failures. Also, can you uncomment the two
{{Test//(timeout=12)}} occurrences in the test?

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-17 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reopened HDFS-14355:
-

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-17 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R resolved HDFS-14355.
-
Resolution: Unresolved

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-17 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14355:

Status: Patch Available  (was: Reopened)

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14355) Implement SCM cache using pure java mapped byte buffer

2019-03-14 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792605#comment-16792605
 ] 

Rakesh R commented on HDFS-14355:
-

Thanks [~PhiloHe] for the incremental patch. Following are a few quick
comments; I will continue reviewing the patch.
 # Please rename configs: {{dfs.datanode.cache.loader.impl.classname}} to => 
{{dfs.datanode.cache.loader.class}}
 {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME}} to => 
{{DFS_DATANODE_CACHE_LOADER_CLASS}}
 {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME_DEFAULT}} to => 
{{DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT}}
 # Replace the config reading logic as below. This would also help avoid
if-else checks like {{if
(cacheLoader.equals(MemoryMappableBlockLoader.class.getSimpleName()))}} to
determine which class the user configured. (See the instantiation sketch after
this list.)
{code:java}
DFSConfigKeys.java
  public static final String DFS_DATANODE_CACHE_LOADER_CLASS = 
"dfs.datanode.cache.loader.class";
  public static final Class<? extends MappableBlockLoader>
      DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT = MemoryMappableBlockLoader.class;

  // You can use the following way to instantiate the cache loader:
  ..
  ..
this.cacheLoader = getConf().getClass(
DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS,
DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT, 
MappableBlockLoader.class);
{code}

 # Add the config name into the message: {{"The persistent memory volumes are
not configured!"}} to => {{"The persistent memory volume, " +
DFSConfigKeys.DFS_DATANODE_CACHE_PMEM_DIR_KEY + " is not configured!"}}
 # It would be good to unmap the block {{mappedoutbuffer}} before deleting the
file, as below:
{code:java}
FileMappableBlockLoader.java

#verifyIfValidPmemVolume(){
   ...
   ...
if (file != null) {
  IOUtils.closeQuietly(file);
  NativeIO.POSIX.munmap(out);
try {
  FsDatasetUtil.deleteMappedFile(testFilePath);
} catch (IOException e) {
  LOG.warn("Failed to delete test file " + testFilePath +
  " from persistent memory", e);
}
{code}

 # FileMappableBlockLoader - Please remove the {{assert
NativeIO.isAvailable();}} check; it's not needed, right?
 # Briefly describe the file path formation pattern
{{'PmemDir/BlockPoolId-BlockId'}} in either the class- or function-level
javadocs.
{code:java}
FileMappableBlockLoader#load()

filePath = getOneLocation() + "/" + key.getBlockPoolId() +
  "-" + key.getBlockId();
{code}

 # Add @VisibleForTesting to the {{public static void
verifyIfValidPmemVolume(File pmemDir)}} function.
 # Add annotation to the new classes FileMappedBlock, FileMappableBlockLoader.
{code:java}
@InterfaceAudience.Private
@InterfaceStability.Unstable
{code}

 # Comments on TestCacheWithFileMappableBlockLoader:
 ## Remove MLOCK config, which is not required.
{code:java}
myConf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
CACHE_CAPACITY);
{code}

 ## Move the TestCacheWithFileMappableBlockLoader.java test class to {{package
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl}}. This will avoid making
the FsDatasetImpl class public; in fact, no changes to the FsDatasetImpl class
are required.
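
For completeness, a minimal instantiation sketch under the keys proposed above
({{ReflectionUtils}} is the standard Hadoop helper; the surrounding field names
are illustrative):
{code:java}
import org.apache.hadoop.util.ReflectionUtils;

Class<? extends MappableBlockLoader> clazz = getConf().getClass(
    DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS,
    DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT,
    MappableBlockLoader.class);
// newInstance also calls setConf() if the class implements Configurable.
this.cacheLoader = ReflectionUtils.newInstance(clazz, getConf());
{code}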

> Implement SCM cache using pure java mapped byte buffer
> --
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14355) Implement SCM cache using pure java mapped byte buffer

2019-03-14 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792605#comment-16792605
 ] 

Rakesh R edited comment on HDFS-14355 at 3/14/19 12:11 PM:
---

Thanks [~PhiloHe] for the incremental patch. Following are a few quick
comments; I will continue reviewing the patch.

# Please rename configs: {{dfs.datanode.cache.loader.impl.classname}}  to => 
{{dfs.datanode.cache.loader.class}}
  {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME}} to => 
{{DFS_DATANODE_CACHE_LOADER_CLASS}}
  {{DFS_DATANODE_CACHE_LOADER_IMPL_CLASSNAME_DEFAULT}} to => 
{{DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT}}
# Replace the config reading logic as below. This would also help avoid
if-else checks like {{if
(cacheLoader.equals(MemoryMappableBlockLoader.class.getSimpleName()))}} to
determine which class the user configured.
{code}
DFSConfigKeys.java
  public static final String DFS_DATANODE_CACHE_LOADER_CLASS = 
"dfs.datanode.cache.loader.class";
  public static final Class<? extends MappableBlockLoader>
      DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT = MemoryMappableBlockLoader.class;

  // You can use the following way to instantiate the cache loader:
  ..
  ..
this.cacheLoader = getConf().getClass(
DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS,
DFSConfigKeys.DFS_DATANODE_CACHE_LOADER_CLASS_DEFAULT, 
MappableBlockLoader.class);
{code}
# Add the config name into the message: {{"The persistent memory volumes are
not configured!"}} to => {{"The persistent memory volume, " +
DFSConfigKeys.DFS_DATANODE_CACHE_PMEM_DIR_KEY + " is not configured!"}}
# It would be good to unmap the block {{mappedoutbuffer}} before deleting the
file, as below:
{code}
FileMappableBlockLoader.java

#verifyIfValidPmemVolume(){
   ...
   ...
if (file != null) {
  IOUtils.closeQuietly(file);
  NativeIO.POSIX.munmap(out);
try {
  FsDatasetUtil.deleteMappedFile(testFilePath);
} catch (IOException e) {
  LOG.warn("Failed to delete test file " + testFilePath +
  " from persistent memory", e);
}
{code}
# FileMappableBlockLoader - Please remove the {{assert
NativeIO.isAvailable();}} check; it's not needed, right?
# Briefly describe the file path formation pattern
{{'PmemDir/BlockPoolId-BlockId'}} in either the class- or function-level
javadocs. (A possible shape is sketched after this list.)
{code}
FileMappableBlockLoader#load()

filePath = getOneLocation() + "/" + key.getBlockPoolId() +
  "-" + key.getBlockId();
{code}
# Add @VisibleForTesting to the {{public static void
verifyIfValidPmemVolume(File pmemDir)}} function.
# Add annotation to the new classes FileMappedBlock, FileMappableBlockLoader.
{code}
@InterfaceAudience.Private
@InterfaceStability.Unstable
{code}
# Comments on TestCacheWithFileMappableBlockLoader:
## Remove MLOCK config, which is not required.
{code}
myConf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
CACHE_CAPACITY);
{code}
## Move the TestCacheWithFileMappableBlockLoader.java test class to {{package
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl}}. This will avoid making
the FsDatasetImpl class public; in fact, no changes to the FsDatasetImpl class
are required.
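
A possible shape for the requested javadoc, as a hedged sketch (only the
'PmemDir/BlockPoolId-BlockId' pattern comes from the patch; the wording is
illustrative):
{code:java}
/**
 * Maps a block into persistent memory by writing its verified content to
 * a cache file under one of the configured pmem volumes. The cache file
 * path is formed as:
 *
 *   {PmemDir}/{BlockPoolId}-{BlockId}
 *
 * so the cache file for a block can be located directly from its block
 * pool id and block id.
 */
{code}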



[jira] [Updated] (HDFS-14355) Implement SCM cache using pure java mapped byte buffer

2019-03-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14355:

Summary: Implement SCM cache using pure java mapped byte buffer  (was: 
Implement SCM cache by using pure java mapped byte buffer)

> Implement SCM cache using pure java mapped byte buffer
> --
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14355) Implement SCM cache by using pure java mapped byte buffer

2019-03-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14355:

Description: This task is to implement the caching to persistent memory 
using pure {{java.nio.MappedByteBuffer}}, which could be useful in case native 
support isn't available or convenient in some environments or platforms.

> Implement SCM cache by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14355) Implement SCM cache by using pure java mapped byte buffer

2019-03-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14355:

Summary: Implement SCM cache by using pure java mapped byte buffer  (was: 
Implement SCM cache by using mapped byte buffer without PMDK dependency)

> Implement SCM cache by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14354) Refactor MappableBlock to align with the implementation of SCM cache

2019-03-11 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789632#comment-16789632
 ] 

Rakesh R commented on HDFS-14354:
-

Good work, [~PhiloHe]. I've submitted the patch to get a QA report. Please
take care of the {{checkstyle}} warnings.

Adding a few comments on the patch:
 # Please add javadoc to {{MappableBlockLoader}} abstract class.
 # MappableBlockLoader.java - please change '{{mmap and mlock the block, and
then verify its checksum}}' to '{{mmap the block, and then verify its
checksum}}', as mlock is very specific to memory.
 # FsDatasetCache.java - Please remove the unused method. We can add it later
when you add unit test cases, and at that time review the necessity of
exposing it.
{code:java}
+import com.google.common.annotations.VisibleForTesting;


+
+ @VisibleForTesting
+ public MappableBlockLoader getMappableBlockLoader() {
+ return mappableBlockLoader;
+ }
{code}
 # Any specific reason to remove the {{return;}} statement? If not, please
keep the existing behavior (see the sketch after this list).
{code:java}
} catch (ChecksumException e) {
// Exception message is bogus since this wasn't caused by a file read
LOG.warn("Failed to cache " + key + ": checksum verification failed.");
- return;
{code}
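
A sketch of why the early return matters, assuming the surrounding
CachingTask#run structure (the names follow the existing code; the trailing
comment is illustrative): without it, execution falls through to the success
path and the block is reported as cached even though checksum verification
failed.
{code:java}
} catch (ChecksumException e) {
  // Exception message is bogus since this wasn't caused by a file read
  LOG.warn("Failed to cache " + key + ": checksum verification failed.");
  return;  // keep: skip the success bookkeeping below
}
// ... success path: mark the block cached, update metrics ...
{code}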

> Refactor MappableBlock to align with the implementation of SCM cache
> 
>
> Key: HDFS-14354
> URL: https://issues.apache.org/jira/browse/HDFS-14354
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14354.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14354) Refactor MappableBlock to align with the implementation of SCM cache

2019-03-11 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-14354:

Status: Patch Available  (was: Open)

> Refactor MappableBlock to align with the implementation of SCM cache
> 
>
> Key: HDFS-14354
> URL: https://issues.apache.org/jira/browse/HDFS-14354
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14354.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives

2019-02-04 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760449#comment-16760449
 ] 

Rakesh R edited comment on HDFS-13762 at 2/5/19 4:46 AM:
-

Thanks [~PhiloHe] for the continuous efforts!
{quote}Yes, it's a limitation for all the volumes together.
{quote}
OK, it would be a concern when the volumes have different sizes. Agreed to
take this up in a follow-up jira task.
{quote}We maintain the map from block file to cache file on pmem storage, so
there is no need to search for it among the volumes
{quote}
Great!
{quote}We don't have evict logic here or throw a VolumeFullException; it just
works like the memory cache for compatibility now.
{quote}
Please add this case to the system test plan. It would be good to test it and 
document/know the behavior.
{quote}(iii), we will update the patch accordingly
{quote}
Please change the one below as well.
{code:java}
+  Pmem.unmapBlock(region.getAddress(), region.getLength());
+  boolean deled = false;
{code}
 

Adding a few more comments, please take care of them:

1). The MappableBlock interface looks good; please add javadocs for the
functions below. Also, please reflect each specific class's responsibility
(MappableBlock, PmemMappedBlock, MemoryMappedBlock) in its javadoc instead of
reusing the same "Represents an HDFS block that is mmapped by the DataNode."
{code:java}
  long getLength();

  void afterCache();
{code}
2). Change the comment to say PMDK instead of ISA-L!
{code:java}
+  // Load Intel ISA-L
+  #ifdef UNIX
{code}
3). Could you please briefly explain the difference between {{pmemDrain}} and
{{pmemSync}}? Also, I would appreciate it if you could add javadoc so that the
functionality is visible to readers. (A hedged reading follows the code
below.)
{code:java}
private static native boolean isPmemCheck(long address, long length);
private static native PmemMappedRegion pmemCreateMapFile(String path,
long length);
private static native boolean pmemUnMap(long address, long length);
private static native void pmemCopy(byte[] src, long dest, boolean isPmem,
long length);
private static native void pmemDrain();
private static native void pmemSync(long address, long length);
{code}
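
For reference, a hedged reading of the two natives based on the underlying
PMDK libpmem API (the JNI names come from the patch; the mapping to libpmem
calls is an assumption to be verified against the JNI implementation):
{code:java}
// pmemDrain(): likely wraps pmem_drain(), which only waits for previously
// flushed or non-temporal stores to reach the persistence domain. It takes
// no address range and is the cheap "commit" step on real persistent
// memory.
private static native void pmemDrain();

// pmemSync(address, length): likely wraps pmem_msync(), the msync(2)-style
// fallback that persists a mapped range when the mapping is NOT backed by
// real persistent memory (e.g. a regular file used for testing).
private static native void pmemSync(long address, long length);
{code}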



> Support non-volatile storage class memory(SCM) in HDFS cache directives
> ---
>
> Key: HDFS-13762
> URL: https://issues.apache.org/jira/browse/HDFS-13762
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, datanode
>Reporter: Sammi Chen
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, 
> HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, 
> HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, 
> HDFS-13762.008.patch, SCMCacheDesign-2018-11-08.pdf, 

[jira] [Commented] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives

2019-02-04 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760449#comment-16760449
 ] 

Rakesh R commented on HDFS-13762:
-

Thanks [~PhiloHe] for the continuous efforts!

{quote}Yes, it's a limitation for all the volumes together.
{quote}
OK, it would be a concern when the volumes have different sizes. Agreed to
take this up in a follow-up jira task.
{quote}We maintain the map from block file to cache file on pmem storage, so
there is no need to search for it among the volumes
{quote}
Great!
{quote}We don't have evict logic here or throw a VolumeFullException; it just
works like the memory cache for compatibility now.
{quote}
Please add this case to the system test plan. It would be good to test it and 
document/know the behavior.
{quote}(iii), we will update the patch accordingly
{quote}
Please change the one below as well.
{code:java}
+  Pmem.unmapBlock(region.getAddress(), region.getLength());
+  boolean deled = false;
{code}
 

Adding a few more comments, please take care of them:
 # The MappableBlock interface looks good; please add javadocs for the
functions below. Also, please reflect each specific class's responsibility
(MappableBlock, PmemMappedBlock, MemoryMappedBlock) in its javadoc instead of
reusing the same "Represents an HDFS block that is mmapped by the DataNode."
{code:java}
  long getLength();

  void afterCache();
{code}

 # Change the comment to say PMDK instead of ISA-L!
{code:java}
+  // Load Intel ISA-L
+  #ifdef UNIX
{code}

 # Could you please briefly explain the difference between {{pmemDrain}} and
{{pmemSync}}? Also, I would appreciate it if you could add javadoc so that the
functionality is visible to readers.
{code:java}
private static native boolean isPmemCheck(long address, long length);
private static native PmemMappedRegion pmemCreateMapFile(String path,
long length);
private static native boolean pmemUnMap(long address, long length);
private static native void pmemCopy(byte[] src, long dest, boolean isPmem,
long length);
private static native void pmemDrain();
private static native void pmemSync(long address, long length);
{code}

> Support non-volatile storage class memory(SCM) in HDFS cache directives
> ---
>
> Key: HDFS-13762
> URL: https://issues.apache.org/jira/browse/HDFS-13762
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, datanode
>Reporter: Sammi Chen
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, 
> HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, 
> HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, 
> HDFS-13762.008.patch, SCMCacheDesign-2018-11-08.pdf, SCMCacheTestPlan.pdf
>
>
> Non-volatile storage class memory is a type of memory that can keep its data
> content after a power failure or between power cycles. A non-volatile storage
> class memory device usually has near-DIMM access speed while costing less
> than memory. So today it is usually used as a supplement to memory to hold
> long-term persistent data, such as data in a cache.
> Currently in HDFS, we have an OS page cache backed read-only cache and a
> RAMDISK based lazy-write cache. Non-volatile memory suits both of these
> functions. This Jira aims to enable storage class memory first in the read
> cache. Although storage class memory has non-volatile characteristics, to
> keep the same behavior as the current read-only cache, we don't use its
> persistence characteristics currently.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13762) Support non-volatile storage class memory(SCM) in HDFS cache directives

2019-01-03 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733867#comment-16733867
 ] 

Rakesh R commented on HDFS-13762:
-

[~zhouwei], [~Sammi], Great proposal, thanks for your work.

(i) A few clarifications about the multiple {{pmemVolumes}} configuration:
 # Is the MaxLockedMemory limit applicable to all the "pmemVolumes" together?
 # IIUC, you have mentioned a round-robin policy to choose a directory for
each new DNA_CACHE command. I'd like to understand the lookup: will it maintain
any indexing, or just search a volume and, if the item isn't found there, move
on to the next volume? (See the sketch after this list.)
 # Is there any automatic eviction logic available once a volume reaches its
threshold, or will it throw a VolumeFullException to users?
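
For the round-robin question above, a minimal sketch of one possible volume
chooser (all names here are illustrative, not taken from the patch):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// One possible round-robin choice over the configured pmem volumes.
private final String[] pmemVolumes;  // parsed from dfs.datanode.cache.pmem.dirs
private final AtomicInteger counter = new AtomicInteger(0);

private String getOneLocation() {
  // The mask keeps the index non-negative even after int overflow.
  int i = (counter.getAndIncrement() & Integer.MAX_VALUE) % pmemVolumes.length;
  return pmemVolumes[i];
}
{code}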

(ii) Please add "{{dfs.datanode.cache.pmem.dirs}}" to the hdfs-default.xml
config doc file; that would make the 'TestHdfsConfigFields' test happy.
 (iii) Typo:
 {{boolean deled = false;}} ==> {{boolean deleted = false;}}

> Support non-volatile storage class memory(SCM) in HDFS cache directives
> ---
>
> Key: HDFS-13762
> URL: https://issues.apache.org/jira/browse/HDFS-13762
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, datanode
>Reporter: Sammi Chen
>Assignee: Wei Zhou
>Priority: Major
> Attachments: HDFS-13762.000.patch, HDFS-13762.001.patch, 
> HDFS-13762.002.patch, HDFS-13762.003.patch, HDFS-13762.004.patch, 
> HDFS-13762.005.patch, HDFS-13762.006.patch, HDFS-13762.007.patch, 
> SCMCacheDesign-2018-11-08.pdf, SCMCacheTestPlan.pdf
>
>
> Non-volatile storage class memory is a type of memory that can keep its data
> content after a power failure or between power cycles. A non-volatile storage
> class memory device usually has near-DIMM access speed while costing less
> than memory. So today it is usually used as a supplement to memory to hold
> long-term persistent data, such as data in a cache.
> Currently in HDFS, we have an OS page cache backed read-only cache and a
> RAMDISK based lazy-write cache. Non-volatile memory suits both of these
> functions. This Jira aims to enable storage class memory first in the read
> cache. Although storage class memory has non-volatile characteristics, to
> keep the same behavior as the current read-only cache, we don't use its
> persistence characteristics currently.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in HDFS

2018-09-21 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623449#comment-16623449
 ] 

Rakesh R commented on HDFS-10285:
-

I've updated the release notes and resolved this jira. Thank you very much to
all contributors for your time, effort, and useful discussions in making this
feature!

> Storage Policy Satisfier in HDFS
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS SPS Test Report-31July2018-v1.pdf, 
> HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of a storage policy.
> These policies can be set on a directory/file to specify the user's
> preference for where to store the physical blocks. When a user sets the
> storage policy before writing data, the blocks can take advantage of the
> storage policy preferences and the physical blocks are stored accordingly.
> If a user sets the storage policy after writing and completing the file, then
> the blocks would already have been written with the default storage policy
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly,
> specifying all such file names as a list. In some distributed system
> scenarios (e.g. HBase) it would be difficult to collect all the files and run
> the tool, as different nodes can write files separately and files can have
> different paths.
> Another scenario is when a user renames a file from a directory with an
> effective storage policy (inherited from the parent directory) to a directory
> with a different storage policy: the inherited storage policy is not copied
> from the source, so the policy takes effect from the destination file/dir
> parent's storage policy. This rename operation is just a metadata change in
> the Namenode; the physical blocks still remain under the source storage
> policy.
> So, tracking all such business-logic-based file names from distributed nodes
> (e.g. region servers) and running the Mover tool could be difficult for
> admins. Here the proposal is to provide an API from the Namenode itself to
> trigger storage policy satisfaction. A daemon thread inside the Namenode
> should track such calls and send them to the DNs as movement commands.
> Will post the detailed design thoughts document soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in HDFS

2018-09-21 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10285:

  Resolution: Fixed
Release Note: StoragePolicySatisfier (SPS) allows users to track and satisfy
the storage policy requirements of a given file/directory in HDFS. Users can
specify a file/directory path by invoking the “hdfs storagepolicies
-satisfyStoragePolicy -path <path>” command or via the
HdfsAdmin#satisfyStoragePolicy(path) API. For blocks with storage policy
mismatches, it moves the replicas to a different storage type in order to
fulfill the storage policy requirement. Since the API calls go to the NN for
tracking the invoked satisfier paths (iNodes), the administrator needs to
enable the ‘dfs.storage.policy.satisfier.mode’ config at the NN to allow these
operations. It can be enabled by setting ‘dfs.storage.policy.satisfier.mode’
to ‘external’ in hdfs-site.xml. The config can be disabled dynamically without
restarting the Namenode. SPS should be started outside the Namenode using
"hdfs --daemon start sps". If an administrator wants to run the Mover tool
explicitly, he/she should make sure to disable SPS first and only then run
Mover. See the "Storage Policy Satisfier (SPS)" section in the Archival
Storage guide for detailed usage.
  Status: Resolved  (was: Patch Available)
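
For quick reference, a usage sketch assembled from the release note above (the
path is illustrative):
{code}
# in hdfs-site.xml at the NN: dfs.storage.policy.satisfier.mode = external

hdfs --daemon start sps
hdfs storagepolicies -satisfyStoragePolicy -path /data/warm/file1
{code}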

> Storage Policy Satisfier in HDFS
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS SPS Test Report-31July2018-v1.pdf, 
> HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of a storage policy.
> These policies can be set on a directory/file to specify the user's
> preference for where to store the physical blocks. When a user sets the
> storage policy before writing data, the blocks can take advantage of the
> storage policy preferences and the physical blocks are stored accordingly.
> If a user sets the storage policy after writing and completing the file, then
> the blocks would already have been written with the default storage policy
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly,
> specifying all such file names as a list. In some distributed system
> scenarios (e.g. HBase) it would be difficult to collect all the files and run
> the tool, as different nodes can write files separately and files can have
> different paths.
> Another scenario is when a user renames a file from a directory with an
> effective storage policy (inherited from the parent directory) to a directory
> with a different storage policy: the inherited storage policy is not copied
> from the source, so the policy takes effect from the destination file/dir
> parent's storage policy. This rename operation is just a metadata change in
> the Namenode; the physical blocks still remain under the source storage
> policy.
> So, tracking all such business-logic-based file names from distributed nodes
> (e.g. region servers) and running the Mover tool could be difficult for
> admins. Here the proposal is to provide an API from the Namenode itself to
> trigger storage policy satisfaction. A daemon thread inside the Namenode
> should track such calls and send them to the DNs as movement commands.
> Will post the detailed design thoughts document soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10285) Storage Policy Satisfier in HDFS

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10285:

Fix Version/s: HDFS-10285

> Storage Policy Satisfier in HDFS
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS SPS Test Report-31July2018-v1.pdf, 
> HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of a storage policy.
> These policies can be set on a directory/file to specify the user's
> preference for where to store the physical blocks. When a user sets the
> storage policy before writing data, the blocks can take advantage of the
> storage policy preferences and the physical blocks are stored accordingly.
> If a user sets the storage policy after writing and completing the file, then
> the blocks would already have been written with the default storage policy
> (nothing but DISK). The user has to run the ‘Mover tool’ explicitly,
> specifying all such file names as a list. In some distributed system
> scenarios (e.g. HBase) it would be difficult to collect all the files and run
> the tool, as different nodes can write files separately and files can have
> different paths.
> Another scenario is when a user renames a file from a directory with an
> effective storage policy (inherited from the parent directory) to a directory
> with a different storage policy: the inherited storage policy is not copied
> from the source, so the policy takes effect from the destination file/dir
> parent's storage policy. This rename operation is just a metadata change in
> the Namenode; the physical blocks still remain under the source storage
> policy.
> So, tracking all such business-logic-based file names from distributed nodes
> (e.g. region servers) and running the Mover tool could be difficult for
> admins. Here the proposal is to provide an API from the Namenode itself to
> trigger storage policy satisfaction. A daemon thread inside the Namenode
> should track such calls and send them to the DNs as movement commands.
> Will post the detailed design thoughts document soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12995) [SPS] : Merge work for HDFS-10285 branch

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-12995:

Fix Version/s: 3.2.0
   HDFS-10285

> [SPS] : Merge work for HDFS-10285 branch
> 
>
> Key: HDFS-12995
> URL: https://issues.apache.org/jira/browse/HDFS-12995
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-10285-consolidated-merge-patch-01.patch
>
>
> This Jira is to run the aggregated HDFS-10285 branch patch against trunk and
> check for any jenkins issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13808) [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and #getFileInfo() function

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13808:

Fix Version/s: 3.2.0
   HDFS-10285

> [SPS]: Remove unwanted FSNamesystem #isFileOpenedForWrite() and 
> #getFileInfo() function
> ---
>
> Key: HDFS-13808
> URL: https://issues.apache.org/jira/browse/HDFS-13808
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Minor
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-13808-HDFS-10285-00.patch, 
> HDFS-13808-HDFS-10285-01.patch, HDFS-13808-HDFS-10285-02.patch, 
> HDFS-13808-HDFS-10285-03.patch, HDFS-13808-HDFS-10285-04.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10885) [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier is on

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10885:

Fix Version/s: 3.2.0

> [SPS]: Mover tool should not be allowed to run when Storage Policy Satisfier 
> is on
> --
>
> Key: HDFS-10885
> URL: https://issues.apache.org/jira/browse/HDFS-10885
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-10800-HDFS-10885-00.patch, 
> HDFS-10800-HDFS-10885-01.patch, HDFS-10800-HDFS-10885-02.patch, 
> HDFS-10885-HDFS-10285-10.patch, HDFS-10885-HDFS-10285-11.patch, 
> HDFS-10885-HDFS-10285.03.patch, HDFS-10885-HDFS-10285.04.patch, 
> HDFS-10885-HDFS-10285.05.patch, HDFS-10885-HDFS-10285.06.patch, 
> HDFS-10885-HDFS-10285.07.patch, HDFS-10885-HDFS-10285.08.patch, 
> HDFS-10885-HDFS-10285.09.patch
>
>
> These two cannot run at the same time, to avoid conflicts and fighting with
> each other.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11670) [SPS]: Add CLI command for satisfy storage policy operations

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11670:

Fix Version/s: 3.2.0

> [SPS]: Add CLI command for satisfy storage policy operations
> 
>
> Key: HDFS-11670
> URL: https://issues.apache.org/jira/browse/HDFS-11670
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11670-HDFS-10285.001.patch, 
> HDFS-11670-HDFS-10285.002.patch, HDFS-11670-HDFS-10285.003.patch, 
> HDFS-11670-HDFS-10285.004.patch, HDFS-11670-HDFS-10285.005.patch
>
>
> This jira is to discuss and implement a set of satisfy-storage-policy
> sub-commands. Following is the list of sub-commands:
> # Schedule blocks to move based on the file/directory policy:
> {code}hdfs storagepolicies -satisfyStoragePolicy -path <path>{code}
> # It's good to have one command to check whether SPS is enabled or not. Based
> on this the user can decide whether to run the Mover:
> {code}
> hdfs storagepolicies -isSPSRunning
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10794:

Fix Version/s: 3.2.0

> [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the 
> block storage movement work
> 
>
> Key: HDFS-10794
> URL: https://issues.apache.org/jira/browse/HDFS-10794
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-10794-00.patch, HDFS-10794-HDFS-10285.00.patch, 
> HDFS-10794-HDFS-10285.01.patch, HDFS-10794-HDFS-10285.02.patch, 
> HDFS-10794-HDFS-10285.03.patch
>
>
> The idea of this jira is to implement a mechanism to move the blocks to the
> given target in order to satisfy the block storage policy. The Datanode
> receives the {{blocktomove}} details via the heartbeat response from the NN.
> More specifically, it's a datanode-side extension to handle the block storage
> movement commands.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11029) [SPS]:Provide retry mechanism for the blocks which were failed while moving its storage at DNs

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11029:

Fix Version/s: 3.2.0

> [SPS]:Provide retry mechanism for the blocks which were failed while moving 
> its storage at DNs
> --
>
> Key: HDFS-11029
> URL: https://issues.apache.org/jira/browse/HDFS-11029
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11029-HDFS-10285-00.patch, 
> HDFS-11029-HDFS-10285-01.patch, HDFS-11029-HDFS-10285-02.patch
>
>
> When the DN co-ordinator finds that some of the blocks associated with a
> trackedID could not be moved to their target storages due to errors, a retry
> may work in some cases; for example, if the target node has no space,
> retrying with another target can work.
> So, based on the movement result flag (SUCCESS/FAILURE) from the DN
> co-ordinator, the NN would retry by scanning the blocks again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11243) [SPS]: Add a protocol command from NN to DN for dropping the SPS work and queues

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11243:

Fix Version/s: 3.2.0

> [SPS]: Add a protocol command from NN to DN for dropping the SPS work and 
> queues 
> -
>
> Key: HDFS-11243
> URL: https://issues.apache.org/jira/browse/HDFS-11243
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11243-HDFS-10285-00.patch, 
> HDFS-11243-HDFS-10285-01.patch, HDFS-11243-HDFS-10285-02.patch
>
>
> This JIRA is for adding a protocol command from the Namenode to the Datanode
> for dropping SPS work, and also for dropping the in-progress queues.
> The use case is: when an admin deactivates SPS at the NN, the NN should
> internally issue a command to the DNs to drop their in-progress queues as
> well. This command can be packed into the heartbeat.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11762) [SPS] : Empty files should be ignored in StoragePolicySatisfier.

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11762:

Fix Version/s: 3.2.0

> [SPS] : Empty files should be ignored in StoragePolicySatisfier. 
> -
>
> Key: HDFS-11762
> URL: https://issues.apache.org/jira/browse/HDFS-11762
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11762-HDFS-10285.001.patch, 
> HDFS-11762-HDFS-10285.002.patch, HDFS-11762-HDFS-10285.003.patch, 
> HDFS-11762-HDFS-10285.004.patch
>
>
> Files which have zero blocks should be ignored in SPS. Currently they throw
> an NPE in the StoragePolicySatisfier thread.
> {noformat}
> 2017-05-06 23:29:04,735 [StoragePolicySatisfier] ERROR 
> namenode.StoragePolicySatisfier (StoragePolicySatisfier.java:run(278)) - 
> StoragePolicySatisfier thread received runtime exception. Stopping Storage 
> policy satisfier work
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.analyseBlocksStorageMovementsAndAssignToDN(StoragePolicySatisfier.java:292)
>   at 
> org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.run(StoragePolicySatisfier.java:233)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12982) [SPS]: Reduce the locking and cleanup the Namesystem access

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-12982:

Fix Version/s: 3.2.0

> [SPS]: Reduce the locking and cleanup the Namesystem access
> ---
>
> Key: HDFS-12982
> URL: https://issues.apache.org/jira/browse/HDFS-12982
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-12982-HDFS-10285-00.patch, 
> HDFS-12982-HDFS-10285-01.patch, HDFS-12982-HDFS-10285-02.patch, 
> HDFS-12982-HDFS-10285-03.patch
>
>
> This task is to optimize the NS lock usage in SPS and clean up the Namesystem 
> access via the {{Context}} interface.
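> As an illustration, the {{Context}} interface could narrow SPS's view of the 
> namesystem to roughly this shape (the exact method set here is an 
> assumption, not the committed interface):
> {code}
> // Hypothetical shape of the Context abstraction: SPS asks questions
> // through it instead of touching FSNamesystem and its lock directly.
> public interface Context {
>   boolean isRunning();                      // is SPS still enabled/active
>   boolean isFileExist(long inodeId);        // existence check, no NS internals
>   void removeSPSHint(long inodeId) throws IOException;  // clear the xattr
>   DatanodeStorageReport[] getLiveDatanodeStorageReport() throws IOException;
> }
> {code}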



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12214) [SPS]: Fix review comments of StoragePolicySatisfier feature

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-12214:

Fix Version/s: 3.2.0

> [SPS]: Fix review comments of StoragePolicySatisfier feature
> 
>
> Key: HDFS-12214
> URL: https://issues.apache.org/jira/browse/HDFS-12214
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-12214-HDFS-10285-00.patch, 
> HDFS-12214-HDFS-10285-01.patch, HDFS-12214-HDFS-10285-02.patch, 
> HDFS-12214-HDFS-10285-03.patch, HDFS-12214-HDFS-10285-04.patch, 
> HDFS-12214-HDFS-10285-05.patch, HDFS-12214-HDFS-10285-06.patch, 
> HDFS-12214-HDFS-10285-07.patch, HDFS-12214-HDFS-10285-08.patch
>
>
> This sub-task is to address [~andrew.wang]'s review comments. Please refer to 
> the [review 
> comment|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16103734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16103734]
>  in the HDFS-10285 umbrella jira.
> # Rename configuration property 'dfs.storage.policy.satisfier.activate' to 
> 'dfs.storage.policy.satisfier.enabled'
> # Disable SPS feature by default.
> # Rather than using the acronym (which a user might not know), maybe rename 
> "-isSpsRunning" to "-isSatisfierRunning"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13033) [SPS]: Implement a mechanism to do file block movements for external SPS

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13033:

Fix Version/s: 3.2.0

> [SPS]: Implement a mechanism to do file block movements for external SPS
> 
>
> Key: HDFS-13033
> URL: https://issues.apache.org/jira/browse/HDFS-13033
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-13033-HDFS-10285-00.patch, 
> HDFS-13033-HDFS-10285-01.patch, HDFS-13033-HDFS-10285-02.patch, 
> HDFS-13033-HDFS-10285-03.patch, HDFS-13033-HDFS-10285-04.patch
>
>
> HDFS-12911 modularization is introducing the {{BlockMoveTaskHandler}} 
> interface for moving the file blocks. That will help us plug in different 
> block move mechanisms, if needed.
> For internal SPS, we have simple block movement tasks to target DN 
> descriptors. For external SPS, we should have a mechanism to send 
> {{replaceBlock}} to the target node and a listener to track the block 
> movement completion. 
> This is the task to implement the {{ExternalSPSBlockMoveTaskHandler}} plugin 
> for external SPS.
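> A sketch of the plugin shape described above (the signatures are 
> assumptions based on this description, not the committed interfaces):
> {code}
> // The pluggable mover: internal SPS schedules moves on DN descriptors,
> // while an external implementation would send replaceBlock to the target
> // DN and notify a listener when the move completes.
> public interface BlockMoveTaskHandler {
>   void submitMoveTask(BlockMovingInfo blkMovingInfo) throws IOException;
> }
>
> public interface BlockMovementListener {
>   // invoked once the attempted moves for these blocks have finished
>   void notifyMovementTriedBlocks(Block[] moveAttemptFinishedBlks);
> }
> {code}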



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11186) [SPS]: Daemon thread of SPS should start only in Active NN

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11186:

Fix Version/s: 3.2.0

> [SPS]: Daemon thread of SPS should start only in Active NN
> --
>
> Key: HDFS-11186
> URL: https://issues.apache.org/jira/browse/HDFS-11186
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Wei Zhou
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11186-HDFS-10285.00.patch, 
> HDFS-11186-HDFS-10285.01.patch, HDFS-11186-HDFS-10285.02.patch, 
> HDFS-11186-HDFS-10285.03.patch, HDFS-11186-HDFS-10285.04.patch
>
>
> As discussed in 
> [HDFS-10885|https://issues.apache.org/jira/browse/HDFS-10885], we need to 
> ensure that SPS is started only in the Active NN. This JIRA is opened for 
> discussion and tracking.
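> The expected wiring is roughly the following (a sketch against the NN's HA 
> service hooks, assuming the satisfier is reachable from FSNamesystem; 
> elisions marked with ...):
> {code}
> // Sketch: tie the SPS daemon to the NN's HA state transitions so the
> // thread only runs while this NN is active.
> void startActiveServices() throws IOException {
>   ...
>   if (spsEnabled) {
>     storagePolicySatisfier.start();   // spawn the satisfier daemon
>   }
> }
>
> void stopActiveServices() {
>   ...
>   storagePolicySatisfier.stop();      // halt on transition to standby
> }
> {code}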



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13166) [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly getLiveDatanodeStorageReport() calls

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13166:

Fix Version/s: 3.2.0

> [SPS]: Implement caching mechanism to keep LIVE datanodes to minimize costly 
> getLiveDatanodeStorageReport() calls
> -
>
> Key: HDFS-13166
> URL: https://issues.apache.org/jira/browse/HDFS-13166
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-13166-HDFS-10285-00.patch, 
> HDFS-13166-HDFS-10285-01.patch, HDFS-13166-HDFS-10285-02.patch, 
> HDFS-13166-HDFS-10285-03.patch
>
>
> Presently {{#getLiveDatanodeStorageReport()}} is fetched for every file, and 
> the computation is repeated each time. This Jira sub-task is to discuss and 
> implement a cache mechanism which in turn reduces the number of function 
> calls. Also, we could define a configurable refresh interval and 
> periodically refresh the DN cache by fetching the latest 
> {{#getLiveDatanodeStorageReport}} on that interval.
>  The following comments are taken from HDFS-10285, 
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16347472]
>  (Comment-7):
> {quote}Adding getDatanodeStorageReport is concerning. 
> getDatanodeListForReport is already a very bad method that should be avoided 
> for anything but jmx – even then it’s a concern. I eliminated calls to it 
> years ago. All it takes is a nscd/dns hiccup and you’re left holding the fsn 
> lock for an excessive length of time. Beyond that, the response is going to 
> be pretty large and tagging all the storage reports is not going to be cheap.
> verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem 
> lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its 
> storageMap?
> Appears to be calling getLiveDatanodeStorageReport for every file. As 
> mentioned earlier, this is NOT cheap. The SPS should be able to operate on a 
> fuzzy/cached state of the world. Then it gets another datanode report to 
> determine the number of live nodes to decide if it should sleep before 
> processing the next path. The number of nodes from the prior cached view of 
> the world should suffice.
> {quote}
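> A cache along the lines suggested above could be shaped like this (a minimal 
> sketch; the field names and the source of the refresh interval are 
> assumptions):
> {code}
> // Sketch: serve a fuzzy/cached view of live DNs and refresh it only
> // when the configured interval has elapsed.
> private DatanodeStorageReport[] cachedReport;
> private long lastRefreshMs = 0;
> private final long refreshIntervalMs;  // from an assumed config key
>
> DatanodeStorageReport[] getLiveDatanodeStorageReport() throws IOException {
>   long now = Time.monotonicNow();
>   if (cachedReport == null || now - lastRefreshMs >= refreshIntervalMs) {
>     cachedReport = context.getLiveDatanodeStorageReport();  // costly call
>     lastRefreshMs = now;
>   }
>   return cachedReport;  // possibly stale, which SPS can tolerate
> }
> {code}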



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11289) [SPS]: Make SPS movement monitor timeouts configurable

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11289:

Fix Version/s: 3.2.0

> [SPS]: Make SPS movement monitor timeouts configurable
> --
>
> Key: HDFS-11289
> URL: https://issues.apache.org/jira/browse/HDFS-11289
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11289-HDFS-10285-00.patch, 
> HDFS-11289-HDFS-10285-01.patch
>
>
> Currently the SPS tracking monitor timeouts are hardcoded. This is the JIRA 
> for making them configurable.
> {code}
>  // TODO: below selfRetryTimeout and checkTimeout can be configurable later
> // Now, the default values of selfRetryTimeout and checkTimeout are 30mins
> // and 5mins respectively
> this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
> 5 * 60 * 1000, 30 * 60 * 1000, storageMovementNeeded);
> {code}
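> With the configuration in place, this could become something like the 
> following (a sketch; the property names are illustrative, not necessarily 
> the committed keys):
> {code}
> // Sketch: replace the hardcoded 5min/30min values with configurable ones,
> // keeping the current values as defaults.
> long checkTimeout = conf.getLong(
>     "dfs.storage.policy.satisfier.recheck.timeout.millis",
>     5 * 60 * 1000L);
> long selfRetryTimeout = conf.getLong(
>     "dfs.storage.policy.satisfier.self.retry.timeout.millis",
>     30 * 60 * 1000L);
> this.storageMovementsMonitor = new BlockStorageMovementAttemptedItems(
>     checkTimeout, selfRetryTimeout, storageMovementNeeded);
> {code}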



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12556) [SPS] : Block movement analysis should be done in read lock.

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-12556:

Fix Version/s: 3.2.0

> [SPS] : Block movement analysis should be done in read lock.
> 
>
> Key: HDFS-12556
> URL: https://issues.apache.org/jira/browse/HDFS-12556
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-12556-HDFS-10285-01.patch, 
> HDFS-12556-HDFS-10285-02.patch, HDFS-12556-HDFS-10285-03.patch
>
>
> {noformat}
> 2017-09-27 15:58:32,852 [StoragePolicySatisfier] ERROR 
> namenode.StoragePolicySatisfier 
> (StoragePolicySatisfier.java:handleException(308)) - StoragePolicySatisfier 
> thread received runtime exception. Stopping Storage policy satisfier work
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getStorages(BlockManager.java:4130)
>   at 
> org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.analyseBlocksStorageMovementsAndAssignToDN(StoragePolicySatisfier.java:362)
>   at 
> org.apache.hadoop.hdfs.server.namenode.StoragePolicySatisfier.run(StoragePolicySatisfier.java:236)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
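> The fix suggested by the title is to hold the read lock across the analysis 
> (a sketch, assuming the namesystem lock is reachable from the satisfier 
> thread):
> {code}
> // Sketch: hold the read lock across the block-movement analysis so the
> // block/storage arrays cannot change underneath the satisfier.
> namesystem.readLock();
> try {
>   status = analyseBlocksStorageMovementsAndAssignToDN(blockCollection);
> } finally {
>   namesystem.readUnlock();
> }
> {code}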



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11572) [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11572:

Fix Version/s: 3.2.0

> [SPS]: SPS should clean Xattrs when no blocks required to satisfy for a file
> 
>
> Key: HDFS-11572
> URL: https://issues.apache.org/jira/browse/HDFS-11572
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11572-HDFS-10285-00.patch, 
> HDFS-11572-HDFS-10285-01.patch
>
>
> When a user asks to satisfy the storage policy on a file that is already well 
> satisfied, SPS will just scan it, make sure no blocks need to be moved, and 
> leave that element. In this case, we are not cleaning the Xattr. This is the 
> JIRA to make sure we clean the Xattr in this situation.
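> In rough terms, the satisfier path needs a branch like this (a sketch; the 
> names follow the surrounding SPS code but are not exact):
> {code}
> // Sketch: when the scan finds nothing to move, clean up the xattr
> // instead of silently dropping the tracked element.
> if (blockMovingInfos.isEmpty()) {
>   // file already satisfies its storage policy
>   this.sps.postBlkStorageMovementCleanup(trackId);  // removes the xattr
> }
> {code}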



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13057) [SPS]: Revisit configurations to make SPS service modes internal/external/none

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13057:

Fix Version/s: 3.2.0

> [SPS]: Revisit configurations to make SPS service modes internal/external/none
> --
>
> Key: HDFS-13057
> URL: https://issues.apache.org/jira/browse/HDFS-13057
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Blocker
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-13057-HDFS-10285-00.patch, 
> HDFS-13057-HDFS-10285-01.patch, HDFS-13057-HDFS-10285-02.patch
>
>
> This task is to revisit the configurations to make the SPS service modes 
> {{internal/external/none}}:
>  - {{internal}}: the SPS service runs inside the NN
>  - {{external}}: the SPS service runs outside the NN
>  - {{none}}: the SPS service is completely disabled, at zero cost to the 
> system.
> The proposed configuration item is {{dfs.storage.policy.satisfier.mode}} in 
> hdfs-site.xml, and its value is a string. The mode can be changed via the 
> {{reconfig}} command.
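> For illustration, the mode could be modelled like this (a sketch; the enum 
> and helper names are assumptions):
> {code}
> // Sketch: three-valued mode read from hdfs-site.xml.
> public enum StoragePolicySatisfierMode {
>   INTERNAL,  // runs inside the NN
>   EXTERNAL,  // runs outside the NN
>   NONE;      // feature fully disabled
>
>   public static StoragePolicySatisfierMode fromString(String s) {
>     return valueOf(StringUtils.toUpperCase(s));
>   }
> }
>
> // usage:
> StoragePolicySatisfierMode mode = StoragePolicySatisfierMode.fromString(
>     conf.get("dfs.storage.policy.satisfier.mode", "none"));
> {code}
> An operator could then switch modes at runtime through the NN's reconfig 
> facility rather than restarting the NameNode.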



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10884) [SPS]: Add block movement tracker to track the completion of block movement future tasks at DN

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10884:

Fix Version/s: 3.2.0

> [SPS]: Add block movement tracker to track the completion of block movement 
> future tasks at DN
> --
>
> Key: HDFS-10884
> URL: https://issues.apache.org/jira/browse/HDFS-10884
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: HDFS-10285
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-10884-HDFS-10285-00.patch, 
> HDFS-10884-HDFS-10285-01.patch, HDFS-10884-HDFS-10285-02.patch, 
> HDFS-10884-HDFS-10285-03.patch, HDFS-10884-HDFS-10285-04.patch, 
> HDFS-10884-HDFS-10285-05.patch
>
>
> Presently the 
> [StoragePolicySatisfyWorker#processBlockMovingTasks()|https://github.com/apache/hadoop/blob/HDFS-10285/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/StoragePolicySatisfyWorker.java#L147]
>  function acts as a blocking call. The idea of this jira is to implement a 
> mechanism to track these movements asynchronously, so that new movements can 
> be accepted while the previous ones are still being processed.
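> One way to structure the tracker (a sketch using a completion service; 
> {{BlockMovingTask}} is an assumed {{Callable<Void>}} that performs the copy, 
> and the other names are illustrative):
> {code}
> // Sketch: submit each block move as a future and drain completions on a
> // separate tracker thread instead of blocking the worker thread.
> private final CompletionService<Void> moverService =
>     new ExecutorCompletionService<>(moveExecutor);
>
> void submitBlockMove(BlockMovingInfo info) {
>   moverService.submit(new BlockMovingTask(info));  // returns immediately
> }
>
> // Tracker thread body: waits for whichever move finishes next.
> void trackMovements() throws InterruptedException {
>   while (running) {
>     Future<Void> done = moverService.take();  // blocks only the tracker
>     handleCompletedMove(done);                // e.g. report status to NN
>   }
> }
> {code}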



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13075) [SPS]: Provide External Context implementation.

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13075:

Fix Version/s: 3.2.0

> [SPS]: Provide External Context implementation.
> ---
>
> Key: HDFS-13075
> URL: https://issues.apache.org/jira/browse/HDFS-13075
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-13075-HDFS-10285-0.patch, 
> HDFS-13075-HDFS-10285-01.patch, HDFS-13075-HDFS-10285-02.patch, 
> HDFS-13075-HDFS-10285-1.patch
>
>
> This JIRA is to provide the initial implementation of the External Context.
> With HDFS-12995, we will further improve the retry mechanism, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11338) [SPS]: Fix timeout issue in unit tests caused by longer NN down time

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11338:

Fix Version/s: 3.2.0

> [SPS]: Fix timeout issue in unit tests caused by longer NN down time
> -
>
> Key: HDFS-11338
> URL: https://issues.apache.org/jira/browse/HDFS-11338
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Wei Zhou
>Assignee: Rakesh R
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11338-HDFS-10285-02.patch, 
> HDFS-11338-HDFS-10285-03.patch, HDFS-11338-HDFS-10285-04.patch, 
> HDFS-11338-HDFS-10285-05.patch, HDFS-11338-HDFS-10285.00.patch, 
> HDFS-11338-HDFS-10285.01.patch
>
>
> As discussed in HDFS-11186, it takes longer to stop the NN:
> {code}
> try {
>   storagePolicySatisfierThread.join(3000);
> } catch (InterruptedException ie) {
> }
> {code}
> So it takes a longer time to finish some tests, and this leads to the timeout 
> failures.
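> A sketch of the usual remedy (interrupt before joining, so shutdown is 
> bounded by the actual remaining work rather than the full timeout):
> {code}
> // Sketch: wake the satisfier thread first, then join, so NN shutdown
> // does not wait out the whole 3s on every stop in every test.
> storagePolicySatisfierThread.interrupt();
> try {
>   storagePolicySatisfierThread.join(3000);
> } catch (InterruptedException ie) {
>   Thread.currentThread().interrupt();  // preserve interrupt status
> }
> {code}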



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11965) [SPS]: Should give a chance to satisfy the low redundant blocks before removing the xattr

2018-08-12 Thread Rakesh R (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-11965:

Fix Version/s: 3.2.0

> [SPS]: Should give a chance to satisfy the low redundant blocks before 
> removing the xattr
> ---
>
> Key: HDFS-11965
> URL: https://issues.apache.org/jira/browse/HDFS-11965
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS-10285
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: HDFS-10285, 3.2.0
>
> Attachments: HDFS-11965-HDFS-10285.001.patch, 
> HDFS-11965-HDFS-10285.002.patch, HDFS-11965-HDFS-10285.003.patch, 
> HDFS-11965-HDFS-10285.004.patch, HDFS-11965-HDFS-10285.005.patch, 
> HDFS-11965-HDFS-10285.006.patch, HDFS-11965-HDFS-10285.007.patch, 
> HDFS-11965-HDFS-10285.008.patch
>
>
> The test case is failing because not all of the required replicas are moved 
> to the expected storage. This happens because of a delay in datanode 
> registration after the cluster restart.
> Scenario:
> 1. Start cluster with 3 DataNodes.
> 2. Create file and set storage policy to WARM.
> 3. Restart the cluster.
> 4. Now the NameNode and two DataNodes start first and get registered with the 
> NameNode (one DataNode is not yet registered).
> 5. SPS schedules block movement based on the available DataNodes (it will 
> move one replica to ARCHIVE based on the policy).
> 6. The block movement also succeeds, and the Xattr is removed from the file 
> because the condition {{itemInfo.isAllBlockLocsAttemptedToSatisfy()}} is true.
> {code}
> if (itemInfo != null
> && !itemInfo.isAllBlockLocsAttemptedToSatisfy()) {
>   blockStorageMovementNeeded
>   .add(storageMovementAttemptedResult.getTrackId());
> 
> ..
> } else {
> 
> ..
>   this.sps.postBlkStorageMovementCleanup(
>   storageMovementAttemptedResult.getTrackId());
> }
> {code}
> 7. Now the third DN registers with the NameNode and reports one more DISK 
> replica. Now the NameNode has two DISK and one ARCHIVE replica.
> In the test case we have a condition to check the number of DISK replicas:
> {code}
> DFSTestUtil.waitExpectedStorageType(testFileName, StorageType.DISK, 1,
> timeout, fs);
> {code}
> This condition never becomes true, and the test case times out.
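> The direction suggested by the title is a low-redundancy check before the 
> cleanup in step 6 (a sketch building on the code above; 
> {{hasLowRedundancyBlocks}} is an assumed helper):
> {code}
> // Sketch: re-queue the file instead of removing the xattr while any of
> // its blocks are still under-replicated (e.g. a DN has not registered).
> if (!itemInfo.isAllBlockLocsAttemptedToSatisfy()
>     || hasLowRedundancyBlocks(trackId)) {
>   blockStorageMovementNeeded.add(trackId);  // retry later
> } else {
>   this.sps.postBlkStorageMovementCleanup(trackId);  // safe to drop xattr
> }
> {code}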



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



  1   2   3   4   5   6   7   8   9   10   >