[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-28 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803869#comment-16803869
 ] 

Feilong He edited comment on HDFS-14355 at 3/28/19 12:14 PM:
-

HDFS-14355.006.patch has been uploaded to address [~rakeshr]'s comments. 
Suggestions are always welcome. Thanks!


was (Author: philohe):
HDFS-14355.006.patch was uploaded with fixing [~rakeshr]'s comments. 
Suggestions are always welcome! Thanks!

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, 
> HDFS-14355.005.patch, HDFS-14355.006.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-28 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803612#comment-16803612
 ] 

Rakesh R edited comment on HDFS-14355 at 3/28/19 6:12 AM:
--

Adding review comments; please take care of them.

# How about adding an API {{MappableBlockLoader#isTransientCache()}} to the 
interface to avoid PMem-specific checks like the one below? It can return a 
flag value that differentiates an NVMe-based cache from a DRAM-based one (see 
the sketch after this list).
{code:java}
public boolean isPmemCacheEnabled() {
  return mappableBlockLoader instanceof PmemMappableBlockLoader;
}
{code}
# I'd like to avoid type casting. It won't work with another Pmem 
implementation, right?
{code:java}
  public String getReplicaCachePath(String bpid, long blockId) {
    if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) {
      return null;
    }
    ExtendedBlockId key = new ExtendedBlockId(blockId, bpid);
    String cachePath = ((PmemMappableBlockLoader) mappableBlockLoader)
        .getPmemVolumeManager()
        .getCachedFilePath(key);
    return cachePath;
  }
{code}
# The type casting below can be replaced with the HDFS-14393 interface.
{code:java}
  /**
   * Get cache capacity of persistent memory.
   * TODO: advertise this metric to NameNode by FSDatasetMBean
   */
  public long getPmemCacheCapacity() {
    if (isPmemCacheEnabled()) {
      return ((PmemMappableBlockLoader) mappableBlockLoader)
          .getPmemVolumeManager().getPmemCacheCapacity();
    }
    return 0;
  }

  public long getPmemCacheUsed() {
    if (isPmemCacheEnabled()) {
      return ((PmemMappableBlockLoader) mappableBlockLoader)
          .getPmemVolumeManager().getPmemCacheUsed();
    }
    return 0;
  }
{code}
# {{FsDatasetUtil#deleteMappedFile}} - the try-catch is not required; can we do 
something like the following?
{code:java}
  public static void deleteMappedFile(String filePath) throws IOException {
    boolean result = Files.deleteIfExists(Paths.get(filePath));
    if (!result) {
      throw new IOException("Failed to delete the mapped file: " + filePath);
    }
  }
{code}
# Why can't we avoid the {{LocalReplica}} changes and read directly from the 
util like below?
{code:java}
FsDatasetImpl#getBlockInputStreamWithCheckingPmemCache()
.
if (cachePath != null) {
  return FsDatasetUtil.getInputStreamAndSeek(new File(cachePath),
      seekOffset);
}
{code}
# Since the class {{PmemVolumeManager}} itself represents Pmem, it's good to 
remove this redundant prefix from the methods and fields of this class: 
PmemUsedBytesCount, getPmemCacheUsed, getPmemCacheCapacity, etc.
# Please avoid the unchecked conversion; we can do something like the following.
{code:java}
PmemVolumeManager.java

  private final Map<ExtendedBlockId, Byte> blockKeyToVolume =
      new ConcurrentHashMap<>();

  Map<ExtendedBlockId, Byte> getBlockKeyToVolume() {
    return blockKeyToVolume;
  }
{code}
# Add an exception message in PmemVolumeManager#verifyIfValidPmemVolume:
{code:java}
  if (out == null) {
    throw new IOException();
  }
{code}
# The {{IOException}} clause here is not required; please remove it. We can add 
it later, if needed.
{code:java}
MappableBlock.java
  void afterCache() throws IOException;

FsDatasetCache.java
  try {
    mappableBlock.afterCache();
  } catch (IOException e) {
    LOG.warn(e.getMessage());
    return;
  }
{code}
# Can we include the block id in the log message? That would improve debugging.
{code:java}
  LOG.info("Successfully cache one replica into persistent memory: " +
      "[path=" + filePath + ", length=" + length + "]");

to

  LOG.info("Successfully cached one replica:{} into persistent memory"
      + ", [cached path={}, length={}]", key, filePath, length);
{code}
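
For illustration, a minimal sketch of the {{isTransientCache()}} API suggested 
in item 1; the abstract-class shape and the boolean semantics are assumptions 
for illustration, not the actual patch.
{code:java}
// Sketch only: an interface-level flag so that FsDatasetCache/FsDatasetImpl
// need no instanceof or class-name checks against a concrete loader.
public abstract class MappableBlockLoader {
  /** @return true for a DRAM (transient) cache, false for a pmem cache. */
  abstract boolean isTransientCache();
}

class MemoryMappableBlockLoader extends MappableBlockLoader {
  @Override
  boolean isTransientCache() {
    return true;    // DRAM cache does not survive a DataNode restart
  }
}

class PmemMappableBlockLoader extends MappableBlockLoader {
  @Override
  boolean isTransientCache() {
    return false;   // persistent-memory cache content survives restarts
  }
}
{code}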


was (Author: rakeshr):
Adding review comments, please take care.
 # How about adding an API to interface 
{{MappableBlockLoader#isTransientCache()}} to avoid checks specific to PMem. It 
can return specific flag value to differentiate NVMe/DRAM based cache.
{code:java}
public boolean isPmemCacheEnabled() {
return mappableBlockLoader instanceof PmemMappableBlockLoader;
}
{code}

 # I'd like to avoid type casting. It won't work with another Pmem 
implementation, right?
{code:java}
  public String getReplicaCachePath(String bpid, long blockId) {
if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) {
  return null;
}
ExtendedBlockId key = new ExtendedBlockId(blockId, bpid);
String cachePath = ((PmemMappableBlockLoader)mappableBlockLoader)
.getPmemVolumeManager()
.getCachedFilePath(key);
return cachePath;
  }
{code}

 # Below type casting can be replaced with HDFS-14393 interface.
{code:java}
  /**
   * Get cache capacity of persistent memory.
   * TODO: advertise this metric to NameNode by FSDatasetMBean
   */
  public long getPmemCacheCapacity() {
if (isPmemCacheEnabled()) {
  return 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-26 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800514#comment-16800514
 ] 

Feilong He edited comment on HDFS-14355 at 3/26/19 8:18 AM:


[~daryn], thanks so much for your pretty valuable comments.
{quote}I quickly skimmed the patch. At a high level, a plugin design should not 
leak details. This patch breaks the abstraction and litters the code with 
references to "pmem" and explicit conditionals. The details of pmem should be 
pushed down and hidden in the pmem specific classes.
{quote}
We tried to separate our pmem cache implementation from the current HDFS code. 
Some classes for pmem were introduced, such as PmemMappableBlockLoader, 
PmemMappedBlock and PmemVolumeManager. In FsDatasetCache, we indeed kept some 
references to pmem, for example pmemUsedBytesCount, which may be one of the 
issues you pointed out implicitly. In our new patch, pmemUsedBytesCount and the 
pmem reserve/release methods will be moved from FsDatasetCache into a new 
class, PmemVolumeManager. We are trying to hide such unnecessarily exposed pmem 
details, as you suggested.
{quote}Adding another reference (cacheFilePath) to {{ReplicaInfo}} is less than 
desirable from a memory perspective. For those not using the feature, it's not 
nearly as bad as adding one to inodes but may be an issue for very dense DNs. 
More importantly, those that DO use the feature will incur a substantial memory 
hit to store the paths. Why does a SCM block need to track a completely 
different path? Isn't the SCM just another storage dir?
{quote}
We seriously considered your reasonable concern that adding cacheFilePath to 
ReplicaInfo may cause issues for very dense DNs. Thanks for pointing this out. 
We will optimize this part in a new patch: we will remove cacheFilePath from 
ReplicaInfo and instead add a PmemVolumeManager for the pmem cache to keep the 
cacheFilePath (more precisely, the pmem volume index, which is enough to infer 
the cache file path in our new implementation). A TB-sized HDFS pmem cache 
should cache around 10k blocks at most, and we evaluated that the pmem cache 
would not consume substantial DRAM space to maintain a map for that many cached 
blocks. On the other hand, enabling the pmem cache can alleviate the pressure 
of competing for DRAM.
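
As a rough illustration of that design, keeping a one-byte volume index per 
cached block and re-deriving the file path on demand keeps the DRAM footprint 
small even for around 10k cached blocks. This is a sketch only; the field and 
method names and the cache-file naming scheme are assumptions, not the actual 
patch.
{code:java}
// Sketch: map each cached block to a pmem volume index instead of a full path.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hdfs.ExtendedBlockId;

class PmemVolumeManagerSketch {
  private final List<String> pmemVolumes;        // configured pmem volume dirs
  private final Map<ExtendedBlockId, Byte> blockKeyToVolume =
      new ConcurrentHashMap<>();

  PmemVolumeManagerSketch(List<String> pmemVolumes) {
    this.pmemVolumes = pmemVolumes;
  }

  void recordCachedBlock(ExtendedBlockId key, byte volumeIndex) {
    blockKeyToVolume.put(key, volumeIndex);
  }

  /** Re-derive the cache file path; only a one-byte index is kept in memory. */
  String getCachedFilePath(ExtendedBlockId key) {
    Byte index = blockKeyToVolume.get(key);
    if (index == null) {
      return null;                               // block is not cached on pmem
    }
    // Illustrative naming scheme for the cache file under the chosen volume.
    return pmemVolumes.get(index.intValue()) + "/" + key.getBlockPoolId()
        + "-" + key.getBlockId();
  }
}
{code}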
{quote}{{getCacheLoaderClaZZ}} should be {{getCacheLoaderClaSS}} and the 
returned {{Class}} must be instantiated. It's wrong to compare the class simple 
name against {{MemoryMappableBlockLoader}}. Consider if a user configures with 
{{my.custom.MemoryMappableBlockLoader}} and it instantiates 
{{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader}}
 anyway because of the matching simple name? Catching 
{{ReflectiveOperationException}} and just logging it masks a serious 
misconfiguration or error condition.
{quote}
{{getCacheLoaderClaZZ}} will be renamed to {{getCacheLoaderClass}} as you 
suggested. Your comment made us aware that comparing the class simple name is 
ill-considered; we will fix this issue in our new patch. Since the constructor 
of the cache loader requires an {{FsDatasetCache}} instance as its parameter, 
we still instantiate it there as you noted: {{getCacheLoaderClass}} comes from 
DNConf, which should not depend on {{FsDatasetCache}} to return an instantiated 
cache loader. As you pointed out, it is not reasonable to catch 
ReflectiveOperationException and merely log it. We will throw a 
RuntimeException inside the catch block to terminate DataNode startup.
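
A minimal sketch of that fail-fast instantiation (inside FsDatasetCache; the 
method name and the assumed single-argument constructor are for illustration 
only):
{code:java}
// Sketch: instantiate the configured loader reflectively and fail fast on error.
private MappableBlockLoader createCacheLoader(
    Class<? extends MappableBlockLoader> loaderClass) {
  try {
    // Assumes every loader implementation exposes a constructor taking the
    // owning FsDatasetCache instance.
    return loaderClass.getConstructor(FsDatasetCache.class).newInstance(this);
  } catch (ReflectiveOperationException e) {
    // Do not merely log: a misconfigured or broken cache loader class should
    // terminate DataNode startup.
    throw new RuntimeException(
        "Failed to instantiate cache loader " + loaderClass.getName(), e);
  }
}
{code}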
{quote}There's quite a few other exceptions being swallowed or just logged when 
dubious "bad things" happen. Or discarding of exceptions and rethrowing generic 
"something went wrong" exceptions w/o the original exception as a cause. That 
complicates debugging.
{quote}
We are checking the code scrupulously and trying our best to fix the issues you 
mentioned so that debugging is not complicated. When loading pmem volumes, we 
just log the error if one volume is not usable, since we think one bad pmem 
volume configured by the user is tolerable. But an exception will be thrown to 
terminate DataNode startup if all pmem volumes are invalid.
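
Expressed as a sketch (the method shape and fields are illustrative; only 
verifyIfValidPmemVolume is taken from the discussion in this thread): a bad 
volume is logged and skipped, and startup fails only when no usable volume 
remains.
{code:java}
// Sketch: tolerate individual bad pmem volumes; fail only if none are usable.
private void loadVolumes(String[] configuredVolumes) throws IOException {
  List<String> usableVolumes = new ArrayList<>();
  for (String volume : configuredVolumes) {
    try {
      verifyIfValidPmemVolume(new File(volume));
      usableVolumes.add(volume);
    } catch (IOException e) {
      // One misconfigured volume is tolerable; record it and continue.
      LOG.warn("Failed to load persistent memory volume " + volume, e);
    }
  }
  if (usableVolumes.isEmpty()) {
    // No usable volume at all: propagate, which terminates DataNode startup.
    throw new IOException("No usable persistent memory volume found in "
        + Arrays.toString(configuredVolumes));
  }
  // usableVolumes would then back the pmem volume manager.
}
{code}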
{quote}New methods in {{FsDatasetUtil}} are not ok. Invalid arguments are not 
"ignorable". That's how bugs creep in which are insanely hard to debug. Don't 
check if offset is valid, just seek, let it throw if invalid. Don't ignore null 
filenames, just let it throw. Catching {{Throwable}} is a bad practice; let 
alone catching, discarding, and throwing a bland exception.
{quote}
Good suggestion! We noted the issues in {{FsDatasetUtil}}. As you pointed out, 
invalid offsets and null filenames shouldn't be ignored. We are also checking 
for and fixing such issues in other pieces of the code.
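
In code, the agreed direction looks roughly like the following. The helper name 
follows the {{FsDatasetUtil.getInputStreamAndSeek}} call shown in the review 
above; the body is an assumption, not the actual patch.
{code:java}
// Sketch: no defensive argument checks -- invalid input surfaces as an
// exception right here instead of being silently ignored.
public static InputStream getInputStreamAndSeek(File file, long offset)
    throws IOException {
  // A null or missing file throws immediately (NPE / FileNotFoundException).
  RandomAccessFile raf = new RandomAccessFile(file, "r");
  try {
    raf.seek(offset);                     // let an invalid offset throw
    return Channels.newInputStream(raf.getChannel());
  } catch (IOException e) {
    raf.close();
    throw e;                              // rethrow the original exception
  }
}
{code}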
{quote}Why throw a {{RuntimeException}} in "getOneLocation"? Is the intent to 
take down the DN?
{quote}

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800592#comment-16800592
 ] 

Feilong He edited comment on HDFS-14355 at 3/26/19 1:57 AM:


Thanks [~Sammi] for your valuable comments.
{quote}Users would like to learn the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml.
{quote}
I will proofread the descriptions to make them clear to users.
{quote}PmemUsedBytesCount: is there any foreseen issue with reusing 
UsedBytesCount instead? Also, byte round-up is not addressed in 
PmemUsedBytesCount.
{quote}
As you know, UsedBytesCount counts the DRAM bytes. It ensures that, after 
reserving bytes for the DRAM cache, the used bytes will not exceed maxBytes 
(dfs.datanode.max.locked.memory). We found that besides the HDFS DRAM cache, 
Lazy Persist Writes also uses this UsedBytesCount to reserve/release bytes. 
Since supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduced PmemUsedBytesCount to separate pmem's cache byte accounting from 
DRAM's, so Lazy Persist Writes will not be affected and users can still enable 
it by configuring dfs.datanode.max.locked.memory. Pmem may not have a page-size 
mechanism like DRAM (we will confirm this), so we didn't round the reserved 
bytes up to a page-size multiple. Because of this difference, the DRAM cache 
and the pmem cache have different reserve/release methods, which also makes 
adding PmemUsedBytesCount necessary.
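
The reserve/release pattern under discussion is roughly the following (a sketch 
with illustrative names; as noted above, the DRAM counter additionally rounds 
the requested count up to the OS page size, which the pmem counter omits):
{code:java}
// Sketch: a bounded byte counter in the style of UsedBytesCount, without the
// page-size round-up applied by the DRAM cache counter.
import java.util.concurrent.atomic.AtomicLong;

class PmemUsedBytesCountSketch {
  private final long maxBytes;                      // pmem cache capacity
  private final AtomicLong usedBytes = new AtomicLong(0);

  PmemUsedBytesCountSketch(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  /** @return the new used value, or -1 if reserving would exceed maxBytes. */
  long reserve(long count) {
    while (true) {
      long cur = usedBytes.get();
      long next = cur + count;
      if (next > maxBytes) {
        return -1;                                  // not enough capacity left
      }
      if (usedBytes.compareAndSet(cur, next)) {
        return next;
      }
      // Lost the CAS race with another caching thread; retry.
    }
  }

  long release(long count) {
    return usedBytes.addAndGet(-count);
  }
}
{code}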
{quote}FsDatasetCache is not a good place to put loader-specific implementation 
functions like reservePmem and releasePmem. FsDatasetCache should be generic.
{quote}
Good suggestion. We are aware of this issue, as you pointed out. In the new 
patch, we will move PmemUsedBytesCount, reservePmem and releasePmem to a new 
class, PmemVolumeManager, to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are reviewing our code to make sure exceptions are handled elegantly.

 

Thanks again for your great effort in reviewing this patch. Your suggestions 
will be seriously considered.


was (Author: philohe):
Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, UsedBytesCount is used to count the DRAM bytes. It can ensure that 
after reserving bytes for DRAM cache the used bytes will not exceed maxBytes 
(dfs.datanode.max.locked.memory). We found that besides HDFS DRAM cache, Lazy 
Persist Writes also uses this UsedBytesCount to reserve/release bytes. Since 
supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduced PmemUsedBytesCount for pmem to separate pmem's cache bytes 
management with DRAM's. Thus Lazy Persist Writes will not be affected. User can 
still enable Lazy Persist Writes by configuring dfs.datanode.max.locked.memory. 
Pmem may not have page size like mechanism as DRAM (we will confirm it). So we 
didn't round up the bytes to a page size like value. Because of this 
difference, DRAM cache & pmem cache have different reserve/release methods, 
which also makes adding PmemUsedBytesCount necessary.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800592#comment-16800592
 ] 

Feilong He edited comment on HDFS-14355 at 3/25/19 12:51 PM:
-

Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, UsedBytesCount is used to count the DRAM bytes. It can ensure that 
after reserving bytes for DRAM cache the used bytes will not exceed maxBytes 
(dfs.datanode.max.locked.memory). We found that besides HDFS DRAM cache, Lazy 
Persist Writes also uses this UsedBytesCount to reserve/release bytes. Since 
supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduced PmemUsedBytesCount for pmem to separate pmem's cache bytes 
management with DRAM's. Thus Lazy Persist Writes will not be affected. User can 
still enable Lazy Persist Writes by configuring dfs.datanode.max.locked.memory. 
Pmem may not have page size like mechanism as DRAM (we will confirm it). So we 
didn't round up the bytes to a page size like value. Because of this 
difference, DRAM cache & pmem cache have different reserve/release methods, 
which also makes adding PmemUsedBytesCount necessary.

 
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.


was (Author: philohe):
Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, UsedBytesCount is used to count the DRAM bytes.

It can ensure that after reserving bytes for DRAM cache the used bytes will not 
exceed maxBytes ({{dfs.datanode.max.locked.memory}}). We found that besides 
HDFS DRAM cache, Lazy Persist Writes also uses this {{UsedBytesCount }}to 
reserve/release bytes. Since supporting Lazy Persist Writes on pmem is not the 
target of this jira, we introduce PmemUsedBytesCount for pmem to separate 
pmem's cache management with DRAM's. Thus Lazy Persist Writes will not be 
affected. User can still enable Lazy Persist Writes by configuring 
dfs.datanode.max.locked.memory. Pmem may not have page size like mechanism as 
DRAM (we will confirm it). So we didn't round up the bytes to a page size like 
value. Because of this difference, {{UsedBytesCount}} and PmemUsedBytesCount 
have different reserve/release method, which also makes adding 
PmemUsedBytesCount necessary.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800592#comment-16800592
 ] 

Feilong He edited comment on HDFS-14355 at 3/25/19 12:51 PM:
-

Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, UsedBytesCount is used to count the DRAM bytes. It can ensure that 
after reserving bytes for DRAM cache the used bytes will not exceed maxBytes 
(dfs.datanode.max.locked.memory). We found that besides HDFS DRAM cache, Lazy 
Persist Writes also uses this UsedBytesCount to reserve/release bytes. Since 
supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduced PmemUsedBytesCount for pmem to separate pmem's cache bytes 
management with DRAM's. Thus Lazy Persist Writes will not be affected. User can 
still enable Lazy Persist Writes by configuring dfs.datanode.max.locked.memory. 
Pmem may not have page size like mechanism as DRAM (we will confirm it). So we 
didn't round up the bytes to a page size like value. Because of this 
difference, DRAM cache & pmem cache have different reserve/release methods, 
which also makes adding PmemUsedBytesCount necessary.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.


was (Author: philohe):
Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, UsedBytesCount is used to count the DRAM bytes. It can ensure that 
after reserving bytes for DRAM cache the used bytes will not exceed maxBytes 
(dfs.datanode.max.locked.memory). We found that besides HDFS DRAM cache, Lazy 
Persist Writes also uses this UsedBytesCount to reserve/release bytes. Since 
supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduced PmemUsedBytesCount for pmem to separate pmem's cache bytes 
management with DRAM's. Thus Lazy Persist Writes will not be affected. User can 
still enable Lazy Persist Writes by configuring dfs.datanode.max.locked.memory. 
Pmem may not have page size like mechanism as DRAM (we will confirm it). So we 
didn't round up the bytes to a page size like value. Because of this 
difference, DRAM cache & pmem cache have different reserve/release methods, 
which also makes adding PmemUsedBytesCount necessary.

 
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800592#comment-16800592
 ] 

Feilong He edited comment on HDFS-14355 at 3/25/19 12:48 PM:
-

Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, UsedBytesCount is used to count the DRAM bytes.

It can ensure that after reserving bytes for DRAM cache the used bytes will not 
exceed maxBytes ({{dfs.datanode.max.locked.memory}}). We found that besides 
HDFS DRAM cache, Lazy Persist Writes also uses this {{UsedBytesCount }}to 
reserve/release bytes. Since supporting Lazy Persist Writes on pmem is not the 
target of this jira, we introduce PmemUsedBytesCount for pmem to separate 
pmem's cache management with DRAM's. Thus Lazy Persist Writes will not be 
affected. User can still enable Lazy Persist Writes by configuring 
dfs.datanode.max.locked.memory. Pmem may not have page size like mechanism as 
DRAM (we will confirm it). So we didn't round up the bytes to a page size like 
value. Because of this difference, {{UsedBytesCount}} and PmemUsedBytesCount 
have different reserve/release method, which also makes adding 
PmemUsedBytesCount necessary.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.


was (Author: philohe):
Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, {{UsedBytesCount is used to count the DRAM bytes. }}

It can ensure that after reserving bytes for DRAM cache the used bytes will not 
exceed maxBytes ({{dfs.datanode.max.locked.memory}}). We found that besides 
HDFS DRAM cache, Lazy Persist Writes also uses this {{UsedBytesCount}}to 
reserve/release bytes. Since supporting Lazy Persist Writes on pmem is not the 
target of this jira, we introduce PmemUsedBytesCount for pmem to separate 
pmem's cache management with DRAM's. Thus Lazy Persist Writes will not be 
affected. User can still enable Lazy Persist Writes by configuring 
dfs.datanode.max.locked.memory. Pmem may not have page size like mechanism as 
DRAM (we will confirm it). So we didn't round up the bytes to a page size like 
value. Because of this difference, {{UsedBytesCount}} and PmemUsedBytesCount 
have different reserve/release method which also makes adding 
PmemUsedBytesCount necessary.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800592#comment-16800592
 ] 

Feilong He edited comment on HDFS-14355 at 3/25/19 12:45 PM:
-

Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, {{UsedBytesCount is used to count the DRAM bytes. }}

It can ensure that after reserving bytes for DRAM cache the used bytes will not 
exceed maxBytes ({{dfs.datanode.max.locked.memory}}). We found that besides 
HDFS DRAM cache, Lazy Persist Writes also uses this {{UsedBytesCount}}to 
reserve/release bytes. Since supporting Lazy Persist Writes on pmem is not the 
target of this jira, we introduce PmemUsedBytesCount for pmem to separate 
pmem's cache management with DRAM's. Thus Lazy Persist Writes will not be 
affected. User can still enable Lazy Persist Writes by configuring 
dfs.datanode.max.locked.memory. Pmem may not have page size like mechanism as 
DRAM (we will confirm it). So we didn't round up the bytes to a page size like 
value. Because of this difference, {{UsedBytesCount}} and PmemUsedBytesCount 
have different reserve/release method which also makes adding 
PmemUsedBytesCount necessary.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.


was (Author: philohe):
Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, {{UsedBytesCount is used to count the DRAM bytes. It can ensure 
that after reserving bytes for DRAM cache the used bytes will not exceed 
maxBytes (dfs.datanode.max.locked.memory) . We found that besides DRAM cache, 
Lazy Persist Writes also uses this UsedBytesCount}} to reserve/release bytes. 
Since supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduce PmemUsedBytesCount for pmem. Thus Lazy Persist Writes will not be 
affected. User can still enable Lazy Persist Writes by configuring 
dfs.datanode.max.locked.memory. Pmem may not have page size like mechanism as 
DRAM (we will confirm it). So we didn't round up the bytes to a page size like 
value. Because of this difference, {{UsedBytesCount}} and PmemUsedBytesCount 
has different reserve/release method. So we just introduced PmemUsedBytesCount.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800592#comment-16800592
 ] 

Feilong He edited comment on HDFS-14355 at 3/25/19 11:37 AM:
-

Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, {{UsedBytesCount is used to count the DRAM bytes. It can ensure 
that after reserving bytes for DRAM cache the used bytes will not exceed 
maxBytes (dfs.datanode.max.locked.memory) . We found that besides DRAM cache, 
Lazy Persist Writes also uses this UsedBytesCount}} to reserve/release bytes. 
Since supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduce PmemUsedBytesCount for pmem. Thus Lazy Persist Writes will not be 
affected. User can still enable Lazy Persist Writes by configuring 
dfs.datanode.max.locked.memory. Pmem may not have page size like mechanism as 
DRAM (we will confirm it). So we didn't round up the bytes to a page size like 
value. Because of this difference, {{UsedBytesCount}} and PmemUsedBytesCount 
has different reserve/release method. So we just introduced PmemUsedBytesCount.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move PmemUsedBytesCount, reservePmem, releasePmem to a new class 
PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.


was (Author: philohe):
Thanks [~Sammi] for your valuable comments.
{quote}User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
{quote}
I will proofread the description to make them clear to user.
{quote}PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, {{UsedBytesCount }}is used to count the DRAM bytes. It can ensure 
that after reserving bytes for DRAM cache the used bytes will not exceed 
maxBytes (dfs.datanode.max.locked.memory) . We found that besides DRAM cache, 
Lazy Persist Writes also uses this {{UsedBytesCount}} to reserve/release bytes. 
Since supporting Lazy Persist Writes on pmem is not the target of this jira, we 
introduce Pmem{{UsedBytesCount}} for pmem. Thus Lazy Persist Writes will not be 
affected. User can still enable Lazy Persist Writes by configuring 
dfs.datanode.max.locked.memory. Pmem may not have page size like mechanism as 
DRAM (we will confirm it). So we didn't round up the bytes to a page size like 
value. Because of this difference, {{UsedBytesCount}} and 
Pmem{{UsedBytesCount}} has different reserve/release method. So we just 
introduced Pmem{{UsedBytesCount}}.
{quote}FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.
{quote}
Good suggestion. We are aware of this issue as you pointed out. In the new 
path, we will move Pmem{{UsedBytesCount}}, reservePmem, releasePmem to a new 
class PmemVolumeManager to keep FsDatasetCache generic.
{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

 

Thanks again for your huge efforts on reviewing this patch. Your suggestions 
will be seriously considered by us.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.




[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800514#comment-16800514
 ] 

Feilong He edited comment on HDFS-14355 at 3/25/19 11:25 AM:
-

[~daryn], thanks so much for your pretty valuable comments.
{quote}I quickly skimmed the patch. At a high level, a plugin design should not 
leak details. This patch breaks the abstraction and litters the code with 
references to "pmem" and explicit conditionals. The details of pmem should be 
pushed down and hidden in the pmem specific classes.
{quote}
We tried to separate our impl for pmem cache with the current HDFS code. Some 
classes  for pmem were introduced, such as PmemMappableBlockLoader, 
PmemMappedBlock, PmemVolumeManager. In FsdatasetCache, we indeed kept some 
references for pmem, for example pmemUsedBytesCount, which may be one issue you 
pointed out implicitly. In our new patch, pmemUsedBytesCount and 
reserve/release methods for pmem will be removed from FsdatasetCache to a new 
class PmemVolumeManager. We are trying to shade such unnecessarily exposed 
details for pmem as you suggested.
{quote}Adding another reference (cacheFilePath) to {{ReplicaInfo}} is less than 
desirable from a memory perspective. For those not using the feature, it's not 
nearly as bad as adding one to inodes but may be an issue for very dense DNs. 
More importantly, those that DO use the feature will incur a substantial memory 
hit to store the paths. Why does a SCM block need to track a completely 
different path? Isn't the SCM just another storage dir?
{quote}
The cacheFilePath in ReplicaInfo is just used for pmem cache. Since multiple 
pmems can be configured by user, to locate the cache file on pmem, it is 
necessary to keep such info for a cached block. Also, we seriously considered 
your reasonable concerns that adding cacheFilePath there may cause issues for 
very Dense DNs. Thanks for pointing out this. We will optimize this part with a 
new patch. In the new patch, we will remove cacheFilePath from ReplicaInfo. 
Instead, we will add a PmemVolumeManager for pmem cache to keep 
cacheFilePath(precisely, pmem volume index, which is enough for inferring the 
cache file path in our new impl). TB sized HDFS pmem cache should be able to 
cache around 10k blocks at most. In our evaluation, pmem cache would not 
consume substantial DRAM space to maintain which pmem volume a block is cached 
to for 10k level blocks. On the other hand, enabling pmem cache can alleviate 
the pressure of competing DRAM resource.
{quote}{{getCacheLoaderClaZZ}} should be {{getCacheLoaderClaSS}} and the 
returned {{Class}} must be instantiated. It's wrong to compare the class simple 
name against {{MemoryMappableBlockLoader}}. Consider if a user configures with 
{{my.custom.MemoryMappableBlockLoader}} and it instantiates 
{{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader}}
 anyway because of the matching simple name? Catching 
{{ReflectiveOperationException}} and just logging it masks a serious 
misconfiguration or error condition.
{quote}
The {{getCacheLoaderClaZZ}} will be changed to {{getCacheLoaderClass as you 
suggested. Your suggestion makes us aware of that comparing class simple name 
is inconsiderate. We will fix this issue in our new patch. Since the 
constructor of cache loader requires a FsdatasetCache}} instance as its 
parameter, we still instantiate it there as you noted. The 
{{getCacheLoaderClass}} comes from DNconf and it should not depend on 
{{FsdatasetCache}} to return a instantiated cache loader. As you pointed out, 
it is not reasonable to catch ReflectiveOperationException and merely log it. 
We will throw a RuntimeException inside the catch block to terminate the start 
of DN.
{quote}There's quite a few other exceptions being swallowed or just logged when 
dubious "bad things" happen. Or discarding of exceptions and rethrowing generic 
"something went wrong" exceptions w/o the original exception as a cause. That 
complicates debugging.
{quote}
We are checking the code scrupulously and trying our best to fix the issues you 
mentioned to avoid complicating debugging. For loading pmem volume, we just log 
the error if one volume is not usable. We thought it is tolerable to have just 
one bad pmem volume configured by user. But an exception will be thrown to 
terminate the DN starting if all pmem volumes are invalid.
{quote}New methods in {{FsDatasetUtil}} are not ok. Invalid arguments are not 
"ignorable". That's how bugs creep in which are insanely hard to debug. Don't 
check if offset is valid, just seek, let it throw if invalid. Don't ignore null 
filenames, just let it throw. Catching {{Throwable}} is a bad practice; let 
alone catching, discarding, and throwing a bland exception.
{quote}
Good suggestion! We noted the issues in {{FsDatasetUtil}}. As you pointed out, 
the invalid offset and null filenames 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800514#comment-16800514
 ] 

Feilong He edited comment on HDFS-14355 at 3/25/19 11:23 AM:
-

[~daryn], thanks so much for your pretty valuable comments.
{quote}I quickly skimmed the patch. At a high level, a plugin design should not 
leak details. This patch breaks the abstraction and litters the code with 
references to "pmem" and explicit conditionals. The details of pmem should be 
pushed down and hidden in the pmem specific classes.
{quote}
We tried to separate our impl for pmem cache with the current HDFS code. We 
introduced some classes  for pmem such as PmemMappableBlockLoader, 
PmemMappedBlock, PmemVolumeManager. In FsdatasetCache, we indeed kept some 
references for pmem, for example pmemUsedBytesCount, which may be one issue you 
pointed out implicitly. In our new patch, pmemUsedBytesCount and 
reserve/release methods for pmem will be removed from FsdatasetCache to a new 
class PmemVolumeManager. We are trying to shade such unnecessarily exposed 
details for pmem as you suggested.
{quote}Adding another reference (cacheFilePath) to {{ReplicaInfo}} is less than 
desirable from a memory perspective. For those not using the feature, it's not 
nearly as bad as adding one to inodes but may be an issue for very dense DNs. 
More importantly, those that DO use the feature will incur a substantial memory 
hit to store the paths. Why does a SCM block need to track a completely 
different path? Isn't the SCM just another storage dir?
{quote}
The cacheFilePath in ReplicaInfo is just used for pmem cache. Since multiple 
pmems can be configured by user, to locate the cache file on pmem, it is 
necessary to keep such info for a cached block. Also, we seriously considered 
your reasonable concerns that adding cacheFilePath there may cause issues for 
very Dense DNs. Thanks for pointing out this. We will optimize this part with a 
new patch. In the new patch, we will remove cacheFilePath from ReplicaInfo. 
Instead, we will add a PmemVolumeManager for pmem cache to keep 
cacheFilePath(precisely, pmem volume index, which is enough for inferring the 
cache file path in our new impl). TB sized HDFS pmem cache should be able to 
cache around 10k blocks at most. In our evaluation, pmem cache would not 
consume substantial DRAM space to maintain which pmem volume a block is cached 
to for 10k level blocks. On the other hand, enabling pmem cache can alleviate 
the pressure of competing DRAM resource.
{quote}{{getCacheLoaderClaZZ}} should be {{getCacheLoaderClaSS}} and the 
returned {{Class}} must be instantiated. It's wrong to compare the class simple 
name against {{MemoryMappableBlockLoader}}. Consider if a user configures with 
{{my.custom.MemoryMappableBlockLoader}} and it instantiates 
{{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader}}
 anyway because of the matching simple name? Catching 
{{ReflectiveOperationException}} and just logging it masks a serious 
misconfiguration or error condition.
{quote}
The {{getCacheLoaderClaZZ}} will be changed to {{getCacheLoaderClass as you 
suggested. Your suggestion makes us aware of that comparing class simple name 
is inconsiderate. We will fix this issue in our new patch. Since the 
constructor of cache loader requires a FsdatasetCache}} instance as its 
parameter, we still instantiate it there as you noted. The 
{{getCacheLoaderClass}} comes from DNconf and it should not depend on 
{{FsdatasetCache}} to return a instantiated cache loader. As you pointed out, 
it is not reasonable to catch ReflectiveOperationException and merely log it. 
We will throw a RuntimeException inside the catch block to terminate the start 
of DN.
{quote}There's quite a few other exceptions being swallowed or just logged when 
dubious "bad things" happen. Or discarding of exceptions and rethrowing generic 
"something went wrong" exceptions w/o the original exception as a cause. That 
complicates debugging.
{quote}
We are checking the code scrupulously and trying our best to fix the issues you 
mentioned to avoid complicating debugging. For loading pmem volume, we just log 
the error if one volume is not usable. We thought it is tolerable to have just 
one bad pmem volume configured by user. But an exception will be thrown to 
terminate the DN starting if all pmem volumes are invalid.
{quote}New methods in {{FsDatasetUtil}} are not ok. Invalid arguments are not 
"ignorable". That's how bugs creep in which are insanely hard to debug. Don't 
check if offset is valid, just seek, let it throw if invalid. Don't ignore null 
filenames, just let it throw. Catching {{Throwable}} is a bad practice; let 
alone catching, discarding, and throwing a bland exception.
{quote}
Good suggestion! We noted the issues in {{FsDatasetUtil}}. As you pointed out, 
the invalid offset and null filenames 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-25 Thread Sammi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800445#comment-16800445
 ] 

Sammi Chen edited comment on HDFS-14355 at 3/25/19 7:21 AM:


Thanks [~PhiloHe] for working on it. Several quick comments:
 1. Users would like to learn the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml.
 2. PmemUsedBytesCount: is there any foreseen issue with reusing UsedBytesCount 
instead? Also, byte round-up is not addressed in PmemUsedBytesCount.
 3. FsDatasetCache is not a good place to put loader-specific implementation 
functions like reservePmem and releasePmem. FsDatasetCache should be generic.

 4. As [~daryn] suggested, more elegant error handling.


was (Author: sammi):
Thanks [~PhiloHe] for working on it.  Several quick comments, 
1. User would like to know the relationship between 
dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading 
the descriptions in hdfs-default.xml
2. PmemUsedBytesCount, is there any forsee issue to reuse UsedBytesCount 
instead?  Also byte roundup is not addressed in PmemUsedBytesCount
3. FsDatasetCached is not a good place to put specific memory loader 
implemetation functions like reservePmem, releasePmem. FsDatasetCached should 
be generic.

4. As [~daryn] suggested, more elegant error handling

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.






[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-21 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798203#comment-16798203
 ] 

Feilong He edited comment on HDFS-14355 at 3/22/19 2:22 AM:


[~umamaheswararao], [~rakeshr], thank you both for your valuable comments! I 
have updated our patch according to your suggestions, except for one.

Feedback to Uma's above comment:
 # Datanode will clean up all the data under the configured pmem volumes. We 
will document this clearly.
 # Indeed, #verifyIfValidPmemVolume can/should be in package scope. I have 
updated.
 # For pmem, the cache loader currently also serves as the volume manager. It 
would be better to move that out; I intend to refactor it in the next subtask.
 # Please also refer to Rakesh's comment above. I agree with you both and have 
named the Java-based impl PmemMappableBlockLoader. For the native PMDK-based 
impl in the next subtask, I will name it NativePmemMappableBlockLoader. 
FileMappedBlock has also been renamed to PmemMappedBlock.
 # I think using FileChannel#transferTo is a good idea. Its JavaDoc says: "This 
method is potentially much more efficient than a simple loop that reads from 
this channel and writes to the target channel." We will conduct an experiment 
on this option (see the sketch after this list). If we get good results, I will 
file a Jira and provide the patch.
 # As you pointed out, #afterCache just sets the cache path after the block 
replica is cached. Only the pmem cache needs to set a path for a cached 
replica, so we use mappableBlock#afterCache to do that and avoid an if-else 
check, which would be unavoidable if replica#setCachePath were moved into 
FsDatasetCache. Hope I have got your point.
 # As the 6th item says.
 # We have a catch block for this exception, and inside it the IOException is 
rethrown with a specific exception message. Hope this is reasonable.

 # Agree with you. I have updated as you suggested.
 # Good insight. Currently, if a volume is full, the cache loader will not 
remove it. There are some potential improvements to the volume management & 
selection strategy; I intend to optimize them in another JIRA.
 # I have updated as you suggested.
 # Good point. I have refined that piece of code; the exception will now be 
thrown from #loadVolumes only if the count of valid pmem volumes is 0.
 # As I discussed with Uma offline, considering the size of current pmem 
products, one pmem volume is unlikely to cache 20k or more blocks. At that 
scale, we think it is OK to cache data into one directory as currently 
implemented.
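
Below is a small sketch of the {{FileChannel#transferTo}} idea from item 5 
(illustrative only; a real loader would still need to handle checksum 
verification and cleanup on failure):
{code:java}
// Sketch: copy a block file into its pmem cache file through the kernel,
// then mmap the destination, e.g. for checksum verification.
static MappedByteBuffer transferBlockToPmem(File blockFile, File cacheFile)
    throws IOException {
  try (FileChannel src = FileChannel.open(blockFile.toPath(),
           StandardOpenOption.READ);
       FileChannel dst = FileChannel.open(cacheFile.toPath(),
           StandardOpenOption.CREATE, StandardOpenOption.READ,
           StandardOpenOption.WRITE)) {
    long size = src.size();
    long pos = 0;
    while (pos < size) {
      // transferTo may copy fewer bytes than requested; loop until done.
      pos += src.transferTo(pos, size - pos, dst);
    }
    return dst.map(FileChannel.MapMode.READ_ONLY, 0, size);
  }
}
{code}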

Thanks again for so many valuable comments from [~umamaheswararao] & [~rakeshr].

Since the new patch has so many updates, I have uploaded it to this Jira. Some 
refinement work is still needed.

 


was (Author: philohe):
[~umamaheswararao], [~rakeshr], Thank you both for your valuable comments! I 
have updated our patch according to your suggestions except one.

Feedback to Uma's above comment:
 # Datanode will clean up all the data under the configured pmem volumes. We 
will document this clearly.
 # Indeed, #verifyIfValidPmemVolume can/should be in package scope. I have 
updated.
 # For pmem, the cache loader also serves as volume manager. It would be better 
to move outside. I tend to refactor it in the next subtask.
 # Please also refer to Rakesh's above comment. I agree with you both and have 
named java based impl as PmemMappableBlockLoader. For native pmdk based impl in 
next subtask, I will name it as NativePmemMappableBlockLoader. FileMappedBlock 
has also been changed to PmemMappedBlock.
 # I think using FileChannel#transferTo is a good idea.  Its JavaDoc says: 
{color:#629755}This method is potentially much more efficient than a simple loop
{color}{color:#629755}* that reads from this channel and writes to the target 
channel.{color} *We can have more discussions on it.* cc. [~rakeshr]
 # As you pointed out, #afterCache just set the cache path after block replica 
is cached. It is evident that only pmem cache needs setting path for cached 
replica. So we use mappableBlock#afterCache to do that to avoid if-else check, 
which may be unavoidable if move replica#setCachePath into FSDataSetCache. Hope 
I have got your point.
 # As 6th item says.
 # We have a catch block to catch this exception and inside the catch block the 
IOException will be thrown again with specific exception message. Hope this is 
reasonable.

 # Agree with you. I have updated as you suggested.
 # Good insight. Currently, if a volume is full, the cache loader will not 
remove it. There are some potential improvements for volume management & 
selection strategy. I tend to optimize it in other JIRA.
 # I have updated as you suggested.
 # Good point. I have just refined that piece of code. Then, the exception will 
be thrown from #loadVolumes only if the count of valid pmem volumes is 0.
 # As I discussed with 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-21 Thread Uma Maheswara Rao G (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797627#comment-16797627
 ] 

Uma Maheswara Rao G edited comment on HDFS-14355 at 3/21/19 7:53 AM:
-

[~PhiloHe] Thanks for working on this. Thanks [~rakeshr] for thorough reviews.

I have a few more comments to address.
 # // Remove all files under the volume.
 FileUtils.cleanDirectory(locFile);
 Is it safe to clean the existing files?
 # public static void verifyIfValidPmemVolume(File pmemDir)
 Why is this public? Would package scope not work?
 # FileMappableBlockLoader: currently this implementation handles more than 
block loading, e.g. volume management. You may want to consider having a 
PmemVolumeManager class which can manage all these volumes, verification, etc. 
(a rough sketch follows this list). I am OK if you want to take this up in 
another JIRA. Similar/same volume management may be needed in HDFS-14356 as 
well; considering that, it is worth moving to a common place. What do you say?
 # FileMappableBlockLoader: Actually this class implements the block loading 
for pmem, so should the name be 
PmemFileMappableBlockLoader/PmemMappableBlockLoader?
 The HDFS-14356 impl may then name its implementation class 
NativePmemMappableBlockLoader (that will be the PMDK-based impl). Does this 
make sense? cc [~rakeshr]
 # FileMappableBlockLoader: Currently you are reading data from the input 
stream, verifying the checksum and writing that buffer to the MBB.
 One thought here: how about using the FileChannel#transferTo API to transfer 
the data from one channel to another natively, then mmap the destination file 
(assuming mmap may be faster on the target file) and do the checksum 
verification on it?
 I have not checked this here; just checking whether it gives any further 
improvement, since we avoid reading from the HDD into user space.
 # FileMappedBlock: afterCache basically gets the block replica and sets the 
path, and this method is used in FsDatasetCache. Why not expose the required 
info from MappedBlock, i.e. the key and filePath, and call replica#setCachePath 
directly in FsDatasetCache?
 # ReplicaInfo replica = dataset.getBlockReplica(key.getBlockPoolId(),
      key.getBlockId());
      replica.setCachePath(filePath);
 # if (!result) \{  throw new IOException(); } Please add a message here: the 
possible reason for the failure, etc.
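
To make the 3rd item concrete, a minimal sketch of what a stand-alone volume 
manager could look like; the class and method names here are assumptions, not 
the patch's API:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative only: keeps volume bookkeeping out of the block loader so that
 * the loader is left with loading and checksum work.
 */
public class PmemVolumeManagerSketch {
  private final List<String> volumes = new ArrayList<>();
  private final AtomicInteger nextVolume = new AtomicInteger(0);

  /** A real implementation would first verify the dir is a usable pmem mount. */
  void addVolume(String dir) {
    volumes.add(dir);
  }

  /** Simple round-robin selection; utilization-aware policies could come later. */
  String getOneLocation() {
    int index = Math.floorMod(nextVolume.getAndIncrement(), volumes.size());
    return volumes.get(index);
  }
}
{code}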

 # Config Description: "Currently, there are two cache loaders implemented:"
 I don't think you need to mention the exact number of loaders (1/2/3); instead 
you can say "Currently available cache loaders are x, y". Then you need not 
update this number when a new loader is added, just list that loader next to 
the existing ones.
 # What happens if no space is available after a volume is filled with cached 
blocks? Do you still choose that volume? Do you need to update the count and 
remove that volume from further writes?
 # Javadoc comment: "TODO: Refine location selection policy by considering 
storage utilization"
 Add this TODO at the getOneLocation API, as that's the relevant place. I still 
don't think this volume management should be done by the loaders.
 # private void loadVolumes(String[] volumes): if even one volume is bad, do 
you throw an exception, or do you plan to skip that volume? I am OK if you 
would like to consider this in a later JIRA. The DN skips bad volumes and 
proceeds, IIRC.
 # filePath = getOneLocation() + "/" + key.getBlockPoolId() + "-" + 
key.getBlockId();
 This may be a problem when you cache a larger number of blocks, since they all 
go into the same directory. The DN creates subdirs in regular volumes.
 Is there any issue with maintaining the same path structure in the pmem 
volume? That way you could also avoid storing the cached replica path (see the 
sketch after this list).
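
A small sketch of the sub-directory layout idea from the last item, assuming 
the cached path can be re-derived from (bpid, blockId); the modulus scheme 
below is arbitrary and only for illustration:
{code:java}
import java.io.File;

/**
 * Illustrative only: spread cached replicas over sub-directories instead of
 * one flat directory per pmem volume.
 */
public class PmemCachePathSketch {
  static File cacheFileFor(String volume, String bpid, long blockId) {
    // Two levels of sub-directories derived from the block id, loosely
    // mirroring how the DataNode lays out regular block files.
    long d1 = (blockId >> 16) & 0x1F;
    long d2 = (blockId >> 8) & 0x1F;
    return new File(volume, bpid + File.separator
        + "subdir" + d1 + File.separator + "subdir" + d2
        + File.separator + blockId);
  }
}
{code}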

Thanks [~PhiloHe] for the hard work you are doing on this.  


was (Author: umamaheswararao):
[~PhiloHe] Thanks for working on this. Thanks [~rakeshr] for thorough reviews.

I have few more comments to address.
 # // Remove all files under the volume.
 FileUtils.cleanDirectory(locFile);
 is it safe to clean the existing files?
 # public static void verifyIfValidPmemVolume(File pmemDir)
 Why this is public? package scope does not work?
 # FileMappableBlockLoader: currently this implementation is handling more than 
block loading. Ex: volume management. Probably you may want to consider having 
PmemVolumeManager class which can manage all this volumes and refification etc? 
I am ok if you want to take up this in other JIRA. Similar/same volume 
management may be needed in HDFS-14356 as well. Considering that, it worth 
moving to common place. What do you say?
 # FileMappableBlockLoader: Actually this class implementing the block loading 
for pmem. So, should this name say 
PmemFileMappableBlockLoader/PmemMappableBlockLoader? 
HDFS-14356 impl may name that implementation class name as 
NativePmemMappableBlockLoader (that will be pmdk based impl) ? Does this make 
sense ? cc [~rakeshr]
 # FileMappableBlockLoader : Currently you are 

[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-18 Thread Feilong He (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794975#comment-16794975
 ] 

Feilong He edited comment on HDFS-14355 at 3/18/19 12:40 PM:
-

Thanks [~rakeshr] for your valuable comments. I got your point about the 
configuration of the pmem cache capacity. As we synced, it is not reasonable 
for pmem to share that configuration with memory in the current implementation, 
since the DataNode also uses it to control memory usage for Lazy Persist 
Writes. I will update the patch to fix this potentially critical issue and the 
other issues raised in your comments. Thanks so much!
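
For illustration, a dedicated capacity setting for the pmem cache could be read 
separately from the DRAM cache limit; the key name below is an assumption, not 
necessarily what the patch will use:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class PmemCapacityConfigSketch {
  // Hypothetical key, kept separate from dfs.datanode.max.locked.memory so
  // that the Lazy Persist Writes limit is not overloaded for pmem caching.
  static final String PMEM_CACHE_CAPACITY_KEY =
      "dfs.datanode.cache.pmem.capacity";

  static long getPmemCacheCapacity(Configuration conf) {
    // Zero means the pmem cache is disabled in this sketch.
    return conf.getLong(PMEM_CACHE_CAPACITY_KEY, 0L);
  }
}
{code}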


was (Author: philohe):
Thanks [~rakeshr] for your valuable comment. I got your point about the 
configuration for cache capacity of pmem. As synced with you, actually it is 
not reasonable to make pmem share this configuration with memory in the current 
implementation, since DataNode will also use this configuration to control 
memory usage for Lazy Persist Writes. I will update the patch for fixing this 
potential critical issue and other issues put forward in your other comments. 
Thanks so much!

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-18 Thread Rakesh R (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794827#comment-16794827
 ] 

Rakesh R edited comment on HDFS-14355 at 3/18/19 8:31 AM:
--

Thanks [~PhiloHe] for the good progress. Adding a second set of review 
comments; please go through them.
# Close {{file = new RandomAccessFile(filePath, "rw");}}
{code:java}

IOUtils.closeQuietly(file);
{code}
# Looks like unused code, please remove it.
{code:java}
  private FsDatasetImpl dataset;

  public MemoryMappableBlockLoader(FsDatasetImpl dataset) {
this.dataset = dataset;
  }
{code}
# FileMappableBlockLoader#loadVolumes exception handling. I feel this is not 
required; please remove it. If you still need it for some purpose, then please 
add a message arg to {{IOException("Failed to parse persistent memory 
location " + location, e)}}
{code:java}
  } catch (IllegalArgumentException e) {
LOG.error("Failed to parse persistent memory location " + location +
" for " + e.getMessage());
throw new IOException(e);
  }
{code}
# Debuggability: FileMappableBlockLoader#verifyIfValidPmemVolume. Here, add 
exception message arg to {{throw new IOException(t);}}
{code:java}
  throw new IOException(
  "Exception while writing data to persistent storage dir: " + pmemDir,
  t);
{code}
# Debuggability: FileMappableBlockLoader#load. Here, add blockFileName to the 
exception message.
{code:java}
  if (out == null) {
throw new IOException("Fail to map the block " + blockFileName
+ " to persistent storage.");
  }
{code}
# Debuggability: FileMappableBlockLoader#verifyChecksumAndMapBlock
{code:java}
  throw new IOException(
  "checksum verification failed for the blockfile:" + blockFileName
  + ":  premature EOF");
{code}
# FileMappedBlock#afterCache. Suppressing the exception may give wrong 
statistics, right? Assume {{afterCache}} throws an exception and the file path 
is not cached: the cached block won't be readable but still unnecessarily 
consumes space. How about moving the {{mappableBlock.afterCache();}} call right 
after the {{mappableBlockLoader.load()}} call and adding a throws IOException 
clause to {{afterCache}} (see the sketch after this list)?
{code:java}
  LOG.warn("Fail to find the replica file of PoolID = " +
  key.getBlockPoolId() + ", BlockID = " + key.getBlockId() +
  " for :" + e.getMessage());
{code}
# FsDatasetCache.java: the reserve() and release() OS page size math is not 
required for FileMappedBlock; I would appreciate it if you could avoid these 
calls. Also, can you revisit the caching and uncaching logic (for example, the 
datanode.getMetrics() updates) present in this class.
{code:java}
CachingTask#run(){

long newUsedBytes = reserve(length);
...
if (reservedBytes) {
   release(length);
}

UncachingTask#run() {
...
long newUsedBytes = release(value.mappableBlock.getLength());
{code}
# I have changed the JIRA status and triggered QA. Please fix the checkstyle 
warnings and test case failures. 
Also, can you uncomment the two {{Test//(timeout=12)}} occurrences in the test?
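
A simplified sketch of the ordering suggested in the 7th item, using stand-in 
types rather than the real HDFS classes, just to show how a failing afterCache 
could unwind the caching attempt:
{code:java}
import java.io.IOException;

public class CachingOrderSketch {
  interface MappableBlock {
    void afterCache() throws IOException; // now allowed to fail loudly
    void close();
  }
  interface MappableBlockLoader {
    MappableBlock load(long length, String blockFileName) throws IOException;
  }

  /**
   * afterCache() runs right after load(), inside the same try block, so a
   * failure drops the partial mapping and releases the reserved capacity.
   */
  static MappableBlock cache(MappableBlockLoader loader, long length,
      String blockFileName, Runnable releaseReservation) {
    MappableBlock block = null;
    try {
      block = loader.load(length, blockFileName);
      block.afterCache();        // record the cache path; may throw now
      return block;              // only now is the block considered cached
    } catch (IOException e) {
      if (block != null) {
        block.close();           // remove the partial mapping
      }
      releaseReservation.run();  // give back the reserved bytes
      return null;
    }
  }
}
{code}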


was (Author: rakeshr):
Thanks [~PhiloHe] for the good progress. Adding second set of review comments, 
please go through it.
 # Close {{file = new RandomAccessFile(filePath, "rw");}}
{code:java}

IOUtils.closeQuietly(file);
{code}

 # Looks like unused code, please remove it.
{code:java}
  private FsDatasetImpl dataset;

  public MemoryMappableBlockLoader(FsDatasetImpl dataset) {
this.dataset = dataset;
  }
{code}

 # FileMappableBlockLoader#loadVolumes exception handling. I feel this is not 
required, please remove it. If you still need this for some purpose, then 
please add message arg to {{IOException("Failed to parse persistent memory 
location " + location, e)}}
{code:java}
  } catch (IllegalArgumentException e) {
LOG.error("Failed to parse persistent memory location " + location +
" for " + e.getMessage());
throw new IOException(e);
  }
{code}

 # Debuggability: FileMappableBlockLoader#verifyIfValidPmemVolume. Here, add 
exception message arg to {{throw new IOException(t);}}
{code:java}
  throw new IOException(
  "Exception while writing data to persistent storage dir: " + pmemDir,
  t);
{code}

 # Debuggability: FileMappableBlockLoader#load. Here, add blockFileName to the 
exception message.
{code:java}
  if (out == null) {
throw new IOException("Fail to map the block " + blockFileName
+ " to persistent storage.");
  }
{code}

 # Debuggability: FileMappableBlockLoader#verifyChecksumAndMapBlock
{code:java}
  throw new IOException(
  "checksum verification failed for the blockfile:" +