[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803869#comment-16803869 ]

Feilong He edited comment on HDFS-14355 at 3/28/19 12:14 PM:
-------------------------------------------------------------

HDFS-14355.006.patch was uploaded, addressing [~rakeshr]'s comments. Suggestions are always welcome. Thanks!

was (Author: philohe):
HDFS-14355.006.patch was uploaded, addressing [~rakeshr]'s comments. Suggestions are always welcome! Thanks!

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> ------------------------------------------------------------------
>
>                 Key: HDFS-14355
>                 URL: https://issues.apache.org/jira/browse/HDFS-14355
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: caching, datanode
>            Reporter: Feilong He
>            Assignee: Feilong He
>            Priority: Major
>         Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, HDFS-14355.005.patch, HDFS-14355.006.patch
>
>
> This task is to implement the caching to persistent memory using pure {{java.nio.MappedByteBuffer}}, which could be useful in case native support isn't available or convenient in some environments or platforms.
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803612#comment-16803612 ]

Rakesh R edited comment on HDFS-14355 at 3/28/19 6:12 AM:
----------------------------------------------------------

Adding review comments, please take care.
# How about adding an API to the interface, {{MappableBlockLoader#isTransientCache()}}, to avoid checks specific to PMem? It can return a specific flag value to differentiate an NVMe-based cache from a DRAM-based one (see the sketch after this comment).
{code:java}
public boolean isPmemCacheEnabled() {
  return mappableBlockLoader instanceof PmemMappableBlockLoader;
}
{code}
# I'd like to avoid type casting. It won't work with another Pmem implementation, right?
{code:java}
public String getReplicaCachePath(String bpid, long blockId) {
  if (!isPmemCacheEnabled() || !isCached(bpid, blockId)) {
    return null;
  }
  ExtendedBlockId key = new ExtendedBlockId(blockId, bpid);
  String cachePath = ((PmemMappableBlockLoader) mappableBlockLoader)
      .getPmemVolumeManager()
      .getCachedFilePath(key);
  return cachePath;
}
{code}
# The type casting below can be replaced with the HDFS-14393 interface.
{code:java}
/**
 * Get cache capacity of persistent memory.
 * TODO: advertise this metric to NameNode by FSDatasetMBean
 */
public long getPmemCacheCapacity() {
  if (isPmemCacheEnabled()) {
    return ((PmemMappableBlockLoader) mappableBlockLoader)
        .getPmemVolumeManager().getPmemCacheCapacity();
  }
  return 0;
}

public long getPmemCacheUsed() {
  if (isPmemCacheEnabled()) {
    return ((PmemMappableBlockLoader) mappableBlockLoader)
        .getPmemVolumeManager().getPmemCacheUsed();
  }
  return 0;
}
{code}
# {{FsDatasetUtil#deleteMappedFile}} - the try-catch is not required; can we do it like,
{code:java}
public static void deleteMappedFile(String filePath) throws IOException {
  boolean result = Files.deleteIfExists(Paths.get(filePath));
  if (!result) {
    throw new IOException("Failed to delete the mapped file: " + filePath);
  }
}
{code}
# Why can't we avoid the {{LocalReplica}} changes and read directly from Util like below,
{code:java}
// FsDatasetImpl#getBlockInputStreamWithCheckingPmemCache()
...
if (cachePath != null) {
  return FsDatasetUtil.getInputStreamAndSeek(new File(cachePath), seekOffset);
}
{code}
# As the class {{PmemVolumeManager}} itself represents {{Pmem}}, it's good to remove this extra keyword from the methods and entities of this class - PmemUsedBytesCount, getPmemCacheUsed, getPmemCacheCapacity etc.
# Please avoid unchecked conversion; we can do it like,
{code:java}
// PmemVolumeManager.java
private final Map blockKeyToVolume = new ConcurrentHashMap<>();

Map getBlockKeyToVolume() {
  return blockKeyToVolume;
}
{code}
# Add an exception message in PmemVolumeManager#verifyIfValidPmemVolume
{code:java}
if (out == null) {
  throw new IOException();
}
{code}
# Here the {{IOException}} clause is not required, please remove it. We can add it later, if needed.
{code:java}
// MappableBlock.java
void afterCache() throws IOException;

// FsDatasetCache.java
try {
  mappableBlock.afterCache();
} catch (IOException e) {
  LOG.warn(e.getMessage());
  return;
}
{code}
# Can we include the block id in the log message? That would improve debugging.
{code:java}
LOG.info("Successfully cache one replica into persistent memory: "
    + "[path=" + filePath + ", length=" + length + "]");
// to
LOG.info("Successfully cached one replica:{} into persistent memory"
    + ", [cached path={}, length={}]", key, filePath, length);
{code}
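For readers following the first review point above, here is a minimal, purely illustrative sketch of what such an interface flag could look like. The method and class names are assumptions drawn from this thread, not the committed HDFS API.
{code:java}
// Illustrative sketch only (assumed names): a flag on the loader interface
// lets callers ask "is this cache DRAM-backed?" instead of doing instanceof
// checks against PmemMappableBlockLoader.
public interface MappableBlockLoader {
  /** @return true for a DRAM-backed (transient) cache, false for a pmem-backed cache. */
  boolean isTransientCache();
}

class MemoryMappableBlockLoader implements MappableBlockLoader {
  @Override
  public boolean isTransientCache() {
    return true;   // DRAM cache is lost on DataNode restart
  }
}

class PmemMappableBlockLoader implements MappableBlockLoader {
  @Override
  public boolean isTransientCache() {
    return false;  // data is cached in files on persistent-memory volumes
  }
}
{code}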
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800514#comment-16800514 ]

Feilong He edited comment on HDFS-14355 at 3/26/19 8:18 AM:
------------------------------------------------------------

[~daryn], thanks so much for your very valuable comments.

{quote}I quickly skimmed the patch. At a high level, a plugin design should not leak details. This patch breaks the abstraction and litters the code with references to "pmem" and explicit conditionals. The details of pmem should be pushed down and hidden in the pmem specific classes.
{quote}
We tried to separate our implementation of the pmem cache from the current HDFS code. Some classes for pmem were introduced, such as PmemMappableBlockLoader, PmemMappedBlock and PmemVolumeManager. In FsDatasetCache we did keep some references to pmem, for example pmemUsedBytesCount, which may be one of the issues you pointed out implicitly. In our new patch, pmemUsedBytesCount and the reserve/release methods for pmem will be moved out of FsDatasetCache into a new class, PmemVolumeManager. We are trying to hide such unnecessarily exposed pmem details as you suggested.

{quote}Adding another reference (cacheFilePath) to {{ReplicaInfo}} is less than desirable from a memory perspective. For those not using the feature, it's not nearly as bad as adding one to inodes but may be an issue for very dense DNs. More importantly, those that DO use the feature will incur a substantial memory hit to store the paths. Why does a SCM block need to track a completely different path? Isn't the SCM just another storage dir?
{quote}
We seriously considered your reasonable concern that adding cacheFilePath to ReplicaInfo may cause issues for very dense DNs. Thanks for pointing this out. We will optimize this part in a new patch. In the new patch, we will remove cacheFilePath from ReplicaInfo. Instead, we will add a PmemVolumeManager for the pmem cache to keep cacheFilePath (precisely, the pmem volume index, which is enough to infer the cache file path in our new implementation). A TB-sized HDFS pmem cache should be able to cache around 10k blocks at most. For our new implementation, we evaluated that the pmem cache would not consume substantial DRAM space to maintain a map for on the order of 10k cached blocks. On the other hand, enabling the pmem cache can alleviate the pressure of competing for DRAM resources.

{quote}{{getCacheLoaderClaZZ}} should be {{getCacheLoaderClaSS}} and the returned {{Class}} must be instantiated. It's wrong to compare the class simple name against {{MemoryMappableBlockLoader}}. Consider if a user configures with {{my.custom.MemoryMappableBlockLoader}} and it instantiates {{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader}} anyway because of the matching simple name? Catching {{ReflectiveOperationException}} and just logging it masks a serious misconfiguration or error condition.
{quote}
{{getCacheLoaderClazz}} will be changed to {{getCacheLoaderClass}} as you suggested. Your suggestion made us aware that comparing the class simple name is not reliable. We will fix this issue in our new patch. Since the cache loader's constructor requires an {{FsDatasetCache}} instance as its parameter, we still instantiate it there, as you noted. {{getCacheLoaderClass}} comes from DNConf, which should not depend on {{FsDatasetCache}} to return an instantiated cache loader. As you pointed out, it is not reasonable to catch ReflectiveOperationException and merely log it. We will throw a RuntimeException inside the catch block to terminate the start of the DN.
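As a side note for readers, here is a rough, hedged sketch of the fail-fast reflective instantiation described in the previous paragraph. The method and class names are assumptions based on this discussion, not the committed code.
{code:java}
// Illustrative only: instantiate the configured MappableBlockLoader via
// reflection and fail fast (terminate DN start-up) on any reflective error,
// instead of catching ReflectiveOperationException and merely logging it.
static MappableBlockLoader createCacheLoader(
    Class<? extends MappableBlockLoader> clazz, FsDatasetCache cacheManager) {
  try {
    // The loader needs the dataset cache, so it is constructed here; DNConf
    // only resolves the configured Class object (getCacheLoaderClass).
    return clazz.getConstructor(FsDatasetCache.class).newInstance(cacheManager);
  } catch (ReflectiveOperationException e) {
    // A misconfigured or broken loader class is fatal for caching; surface it.
    throw new RuntimeException(
        "Failed to instantiate cache loader " + clazz.getName(), e);
  }
}
{code}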
{quote}There's quite a few other exceptions being swallowed or just logged when dubious "bad things" happen. Or discarding of exceptions and rethrowing generic "something went wrong" exceptions w/o the original exception as a cause. That complicates debugging.
{quote}
We are checking the code scrupulously and trying our best to fix the issues you mentioned, to avoid complicating debugging. For loading a pmem volume, we just log the error if one volume is not usable. We think it is tolerable to have just one bad pmem volume configured by the user, but an exception will be thrown to terminate DN startup if all pmem volumes are invalid.

{quote}New methods in {{FsDatasetUtil}} are not ok. Invalid arguments are not "ignorable". That's how bugs creep in which are insanely hard to debug. Don't check if offset is valid, just seek, let it throw if invalid. Don't ignore null filenames, just let it throw. Catching {{Throwable}} is a bad practice; let alone catching, discarding, and throwing a bland exception.
{quote}
Good suggestion! We noted the issues in {{FsDatasetUtil}}. As you pointed out, an invalid offset and null filenames shouldn't be ignored. We are also checking for and fixing such issues in other pieces of code.

{quote}Why throw a {{RuntimeException}} in "getOneLocation"? Is the intent to take down the DN?
{quote}
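Referring back to the {{FsDatasetUtil}} point above, a minimal sketch of the "just seek and let it throw" style being asked for might look like the following. The method name is taken from the review thread, but the body is an assumption, not the patch itself.
{code:java}
// Illustrative sketch: no pre-validation of the offset and no null-filename
// special-casing; a bad argument simply surfaces as an exception.
static FileInputStream getInputStreamAndSeek(File file, long offset)
    throws IOException {
  RandomAccessFile raf = new RandomAccessFile(file, "r");  // throws if the file is null or missing
  raf.seek(offset);  // throws IOException for a negative offset
  return new FileInputStream(raf.getFD());
}
{code}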
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800592#comment-16800592 ]

Feilong He edited comment on HDFS-14355 at 3/26/19 1:57 AM:
------------------------------------------------------------

Thanks [~Sammi] for your valuable comments.

{quote}User would like to know the relationship between dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading the descriptions in hdfs-default.xml
{quote}
I will proofread the descriptions to make them clear to users.

{quote}PmemUsedBytesCount, is there any foreseeable issue to reuse UsedBytesCount instead? Also byte roundup is not addressed in PmemUsedBytesCount
{quote}
As you know, UsedBytesCount is used to count the DRAM bytes. It ensures that, after reserving bytes for the DRAM cache, the used bytes will not exceed maxBytes (dfs.datanode.max.locked.memory). We found that besides the HDFS DRAM cache, Lazy Persist Writes also uses this UsedBytesCount to reserve/release bytes. Since supporting Lazy Persist Writes on pmem is not the target of this JIRA, we introduced PmemUsedBytesCount for pmem to separate pmem's cache byte management from DRAM's. Thus Lazy Persist Writes will not be affected, and users can still enable Lazy Persist Writes by configuring dfs.datanode.max.locked.memory. Pmem may not have a page-size-like mechanism as DRAM does (we will confirm this), so we didn't round the bytes up to a page-size-aligned value. Because of this difference, the DRAM cache and the pmem cache have different reserve/release methods, which also makes adding PmemUsedBytesCount necessary (see the sketch after this comment).

{quote}FsDatasetCached is not a good place to put specific memory loader implementation functions like reservePmem, releasePmem. FsDatasetCached should be generic.
{quote}
Good suggestion. We are aware of this issue, as you pointed out. In the new patch, we will move PmemUsedBytesCount, reservePmem and releasePmem to a new class, PmemVolumeManager, to keep FsDatasetCache generic.

{quote}As [~daryn] suggested, more elegant error handling.
{quote}
We are checking our code to make exceptions be handled elegantly.

Thanks again for your huge efforts on reviewing this patch. Your suggestions will be seriously considered by us.
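To make the reserve/release discussion above concrete, here is a minimal sketch of the kind of capacity-capped counter that UsedBytesCount/PmemUsedBytesCount represent. It is illustrative only; the class and method names are assumptions, and the real HDFS classes differ (for example, the DRAM counter also rounds reservations to the OS page size).
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch (assumed names): an atomic byte counter that refuses to
// reserve past a configured capacity, mirroring the idea behind
// UsedBytesCount (DRAM, capped by dfs.datanode.max.locked.memory) and
// PmemUsedBytesCount (pmem, capped by dfs.datanode.cache.pmem.capacity).
class CacheBytesCounter {
  private final long maxBytes;
  private final AtomicLong usedBytes = new AtomicLong(0);

  CacheBytesCounter(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  /** @return the new used total, or -1 if the reservation would exceed capacity. */
  long reserve(long count) {
    while (true) {
      long cur = usedBytes.get();
      long next = cur + count;
      if (next > maxBytes) {
        return -1;
      }
      if (usedBytes.compareAndSet(cur, next)) {
        return next;
      }
    }
  }

  /** Release a previous reservation and return the new used total. */
  long release(long count) {
    return usedBytes.addAndGet(-count);
  }
}
{code}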
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800445#comment-16800445 ]

Sammi Chen edited comment on HDFS-14355 at 3/25/19 7:21 AM:
------------------------------------------------------------

Thanks [~PhiloHe] for working on it. Several quick comments,
1. User would like to know the relationship between dfs.datanode.cache.pmem.capacity and dfs.datanode.max.locked.memory by reading the descriptions in hdfs-default.xml
2. PmemUsedBytesCount, is there any foreseeable issue to reuse UsedBytesCount instead? Also byte roundup is not addressed in PmemUsedBytesCount (see the sketch below)
3. FsDatasetCached is not a good place to put specific memory loader implementation functions like reservePmem, releasePmem. FsDatasetCached should be generic.
4. As [~daryn] suggested, more elegant error handling.
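On the "byte roundup" point in comment 2, this is a small, hedged sketch of the rounding the DRAM-backed counter applies to each reservation (aligning it to the OS page size); whether pmem needs the same alignment is exactly the open question discussed in this thread.
{code:java}
// Illustrative only: round a reservation up to a multiple of the OS page size
// before counting it, as the DRAM cache's accounting effectively does.
static long roundUpToPageSize(long count, long osPageSize) {
  return osPageSize * ((count + osPageSize - 1) / osPageSize);
}
{code}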
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798203#comment-16798203 ]

Feilong He edited comment on HDFS-14355 at 3/22/19 2:22 AM:
------------------------------------------------------------

[~umamaheswararao], [~rakeshr], thank you both for your valuable comments! I have updated our patch according to your suggestions, except one. Feedback to Uma's comment above:
# Datanode will clean up all the data under the configured pmem volumes. We will document this clearly.
# Indeed, #verifyIfValidPmemVolume can/should be in package scope. I have updated it.
# For pmem, the cache loader also serves as the volume manager. It would be better to move it outside. I tend to refactor it in the next subtask.
# Please also refer to Rakesh's comment above. I agree with you both and have named the java-based impl PmemMappableBlockLoader. For the native PMDK-based impl in the next subtask, I will name it NativePmemMappableBlockLoader. FileMappedBlock has also been changed to PmemMappedBlock.
# I think using FileChannel#transferTo is a good idea. Its JavaDoc says: "This method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel." We will conduct experiments on this option; if we get good results, I will file a JIRA and provide the patch (a rough sketch follows this comment).
# As you pointed out, #afterCache just sets the cache path after a block replica is cached. Evidently only the pmem cache needs to set a path for a cached replica, so we use mappableBlock#afterCache to do that and avoid an if-else check, which may be unavoidable if we move replica#setCachePath into FsDatasetCache. Hope I have got your point.
# As the 6th item says.
# We have a catch block to catch this exception, and inside the catch block the IOException will be thrown again with a specific exception message. Hope this is reasonable.
# Agree with you. I have updated it as you suggested.
# Good insight. Currently, if a volume is full, the cache loader will not remove it. There are some potential improvements for the volume management & selection strategy. I tend to optimize it in another JIRA.
# I have updated it as you suggested.
# Good point. I have just refined that piece of code, so the exception will be thrown from #loadVolumes only if the count of valid pmem volumes is 0.
# As I discussed with Uma offline, after considering the size of current pmem products, one pmem volume may not be able to cache 20k or more blocks. So at that scale, we think it is OK to cache data into one dir as currently implemented.
Thanks again for so many valuable comments from [~umamaheswararao] & [~rakeshr]. Since the new patch has so many updates, I have uploaded it to this JIRA. Some refining work is still needed.
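Since item 5 above proposes experimenting with {{FileChannel#transferTo}}, here is a hedged, self-contained sketch of that idea: copy the block file into a cache file on a pmem volume and then mmap the copy (e.g. for checksum verification against the mapped buffer). Class, method and path names are illustrative assumptions, not the patch under discussion.
{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch of the FileChannel#transferTo experiment discussed above.
class PmemTransferToSketch {
  static MappedByteBuffer copyAndMap(String blockFile, String pmemCacheFile)
      throws IOException {
    try (FileChannel src = new RandomAccessFile(blockFile, "r").getChannel();
         FileChannel dst = new RandomAccessFile(pmemCacheFile, "rw").getChannel()) {
      long size = src.size();
      long pos = 0;
      while (pos < size) {
        // transferTo may transfer fewer bytes than requested, so loop.
        pos += src.transferTo(pos, size - pos, dst);
      }
      // Map the pmem copy; checksum verification could then read this buffer
      // instead of going back to the HDD block file.
      return dst.map(FileChannel.MapMode.READ_WRITE, 0, size);
    }
  }
}
{code}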
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797627#comment-16797627 ] Uma Maheswara Rao G edited comment on HDFS-14355 at 3/21/19 7:53 AM: - [~PhiloHe] Thanks for working on this. Thanks [~rakeshr] for the thorough reviews. I have a few more comments to address.
# // Remove all files under the volume. FileUtils.cleanDirectory(locFile); Is it safe to clean the existing files?
# public static void verifyIfValidPmemVolume(File pmemDir) Why is this public? Does package scope not work?
# FileMappableBlockLoader: currently this implementation handles more than block loading, e.g. volume management. You may want to consider a PmemVolumeManager class which can manage all the volumes, verification, etc. I am OK if you want to take this up in another JIRA. Similar/same volume management may be needed in HDFS-14356 as well; considering that, it is worth moving to a common place. What do you say?
# FileMappableBlockLoader: actually this class implements block loading for pmem, so should the name be PmemFileMappableBlockLoader/PmemMappableBlockLoader? The HDFS-14356 impl may name its implementation class NativePmemMappableBlockLoader (that will be the pmdk-based impl)? Does this make sense? cc [~rakeshr]
# FileMappableBlockLoader: currently you read data from the input stream, verify the checksum, and write that buffer to the MBB. One thought here: how about using the FileChannel#transferTo API to transfer data from one channel to the other natively, then mmap the destination file (assuming mmap may be faster on the target file) and do checksum verification on it? I have not checked this here, just checking whether it gives any further improvement, as we would avoid reading from HDD into user space.
# FileMappedBlock: afterCache basically gets the block replica and sets the path, and this method is used in FsDatasetCache. Why not expose the required info from MappedBlock, i.e. the key and filePath, and call replica#setCachePath directly in FsDatasetCache?
# ReplicaInfo replica = dataset.getBlockReplica(key.getBlockPoolId(), key.getBlockId()); replica.setCachePath(filePath);
# if (!result) \{ throw new IOException(); } Please add a message here, e.g. the possible reason for the failure.
# Config description: "Currently, there are two cache loaders implemented:" I don't think you need to mention the exact number of loaders (1/2/3); instead you can say the currently available cache loaders are x, y. That way you need not update this number when a new loader is added, just add the loader next to the existing ones.
# What happens if no space is available after a volume is filled with cached blocks? Do you still choose that volume? Do you need to update the count and remove that volume from further writes?
# Javadoc comment: "TODO: Refine location selection policy by considering storage utilization" Add this TODO at the getOneLocation API, as that is the relevant place. I still don't think this volume management should be done by loaders.
# private void loadVolumes(String[] volumes): even if one volume is bad, would you throw an exception? Or do you plan to skip that volume? I am OK if you would like to consider this in later JIRAs. The DN skips bad volumes and proceeds, IIRC.
# filePath = getOneLocation() + "/" + key.getBlockPoolId() + "-" + key.getBlockId(); This may be a problem when you cache a larger number of blocks, since they all go into the same directory. The DN creates subdirs in regular volumes; is there any issue if we maintain the same path structure in a pmem volume? That way you can also avoid storing the cached replica path. A rough sketch of such a layout follows below.
Thanks [~PhiloHe] for the hard work you are doing on this.
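To illustrate item 13, a rough sketch of spreading cached files over a two-level subdir tree, similar in spirit to the DataNode's regular volume layout. The bit layout and the cacheFileFor helper are assumptions made for this sketch, not the actual DatanodeUtil logic or code from the patch.
{code:java}
import java.io.File;

public class PmemCachePathSketch {
  /**
   * Place a cached block file under a two-level subdir tree derived from the
   * block id, so a single directory never holds tens of thousands of files.
   */
  static File cacheFileFor(File pmemVolume, String bpid, long blockId) {
    // The exact bit layout here is an assumption for illustration only.
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    File subdir = new File(pmemVolume,
        "subdir" + d1 + File.separator + "subdir" + d2);
    return new File(subdir, bpid + "-" + blockId);
  }
}
{code}
With a deterministic layout like this, the cache path can be recomputed from the block pool id and block id, which is why the replica would not need to store it.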
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794975#comment-16794975 ] Feilong He edited comment on HDFS-14355 at 3/18/19 12:40 PM: - Thanks [~rakeshr] for your valuable comment. I got your point about the configuration for the pmem cache capacity. As synced with you, it is not reasonable for pmem to share this configuration with memory in the current implementation, since DataNode also uses it to control memory usage for Lazy Persist Writes. I will update the patch to fix this potentially critical issue and the other issues raised in your comments. Thanks so much!
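To make the direction concrete, a minimal sketch of reading a dedicated pmem cache capacity setting instead of reusing {{dfs.datanode.max.locked.memory}}. The property name used here is hypothetical and is not necessarily the key introduced by the patch.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class PmemCacheConfigSketch {
  // Hypothetical property name, used only for this sketch.
  static final String PMEM_CACHE_CAPACITY_KEY =
      "dfs.datanode.cache.pmem.capacity";
  static final long PMEM_CACHE_CAPACITY_DEFAULT = 0L;

  /**
   * Read the pmem cache capacity from its own key so it no longer shares
   * dfs.datanode.max.locked.memory, which also bounds the DRAM used for
   * Lazy Persist Writes.
   */
  static long getPmemCacheCapacity(Configuration conf) {
    return conf.getLong(PMEM_CACHE_CAPACITY_KEY, PMEM_CACHE_CAPACITY_DEFAULT);
  }
}
{code}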
[jira] [Comment Edited] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer
[ https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794827#comment-16794827 ] Rakesh R edited comment on HDFS-14355 at 3/18/19 8:31 AM: -- Thanks [~PhiloHe] for the good progress. Adding a second set of review comments, please go through them.
# Close {{file = new RandomAccessFile(filePath, "rw");}} {code:java} IOUtils.closeQuietly(file); {code}
# Looks like unused code, please remove it. {code:java} private FsDatasetImpl dataset; public MemoryMappableBlockLoader(FsDatasetImpl dataset) { this.dataset = dataset; } {code}
# FileMappableBlockLoader#loadVolumes exception handling. I feel this is not required, please remove it. If you still need it for some purpose, then please add a message arg, e.g. {{IOException("Failed to parse persistent memory location " + location, e)}} {code:java} } catch (IllegalArgumentException e) { LOG.error("Failed to parse persistent memory location " + location + " for " + e.getMessage()); throw new IOException(e); } {code}
# Debuggability: FileMappableBlockLoader#verifyIfValidPmemVolume. Here, add an exception message arg to {{throw new IOException(t);}} {code:java} throw new IOException( "Exception while writing data to persistent storage dir: " + pmemDir, t); {code}
# Debuggability: FileMappableBlockLoader#load. Here, add blockFileName to the exception message. {code:java} if (out == null) { throw new IOException("Fail to map the block " + blockFileName + " to persistent storage."); } {code}
# Debuggability: FileMappableBlockLoader#verifyChecksumAndMapBlock {code:java} throw new IOException( "checksum verification failed for the blockfile:" + blockFileName + ": premature EOF"); {code}
# FileMappedBlock#afterCache: suppressing the exception may give wrong statistics, right? Assume {{afterCache}} throws an exception and the file path is not cached; then the cached block won't be readable but will still consume space unnecessarily. How about moving the {{mappableBlock.afterCache();}} call right after the {{mappableBlockLoader.load()}} call and adding a throws IOException clause to {{afterCache}}? A rough sketch of that ordering follows below. {code:java} LOG.warn("Fail to find the replica file of PoolID = " + key.getBlockPoolId() + ", BlockID = " + key.getBlockId() + " for :" + e.getMessage()); {code}
# FsDatasetCache.java: the reserve() and release() OS page size math is not required for FileMappedBlock. I would appreciate it if you could avoid these calls. Also, can you revisit the caching and un-caching logic (for example, the datanode.getMetrics() updates, etc.) present in this class? {code:java} CachingTask#run(){ long newUsedBytes = reserve(length); ... if (reservedBytes) { release(length); } UncachingTask#run() { ... long newUsedBytes = release(value.mappableBlock.getLength()); {code}
# I have changed the jira status and triggered QA. Please fix the checkstyle warnings and test case failures. Also, can you uncomment the two occurrences of {{Test//(timeout=12)}} in the test?
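A rough sketch of the ordering suggested in item 7 above: {{afterCache()}} declared to throw IOException and invoked immediately after a successful load, so a failure is not silently swallowed. The interfaces below are simplified stand-ins for the real MappableBlock/MappableBlockLoader signatures and are assumptions for this sketch only.
{code:java}
import java.io.IOException;

public class CachingOrderSketch {
  /** Simplified stand-in for the real MappableBlock interface. */
  interface MappableBlock {
    // Declared to throw IOException so a failure to record the cache path
    // propagates instead of being logged and ignored.
    void afterCache() throws IOException;
    void close();
  }

  /** Simplified stand-in for the real MappableBlockLoader interface. */
  interface MappableBlockLoader {
    MappableBlock load(long length, String blockFileName) throws IOException;
  }

  /**
   * Load the block and record its cache path in one step; if afterCache()
   * fails, release the partially cached block so it does not consume pmem
   * space while being unreadable.
   */
  static MappableBlock cache(MappableBlockLoader loader, long length,
      String blockFileName) throws IOException {
    MappableBlock block = loader.load(length, blockFileName);
    try {
      block.afterCache();
    } catch (IOException e) {
      block.close();
      throw e;
    }
    return block;
  }
}
{code}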