[jira] [Commented] (HDFS-15812) after deleting data of hbase table hdfs size is not decreasing

2021-02-10 Thread Anoop Sam John (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282894#comment-17282894
 ] 

Anoop Sam John commented on HDFS-15812:
---

Did you take any snapshots or backups of this deleted table?

> after deleting data of hbase table hdfs size is not decreasing
> --
>
> Key: HDFS-15812
> URL: https://issues.apache.org/jira/browse/HDFS-15812
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.0.2-alpha
> Environment: HDP 3.1.4.0-315
> Hbase 2.0.2.3.1.4.0-315
>Reporter: Satya Gaurav
>Priority: Major
>
> I am deleting data from an HBase table. The data is removed from the table, 
> but the size of the HDFS directory is not shrinking. I even ran a major 
> compaction, but the HDFS size still did not decrease. Is there a solution for 
> this issue?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2019-08-03 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899379#comment-16899379
 ] 

Anoop Sam John commented on HDFS-9668:
--

[~zhangchen] No work is happening on this patch at the moment. Can you describe 
your usage? What types of block devices are in use in your HSM setup?

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
>Priority: Major
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-14.patch, HDFS-9668-15.patch, 
> HDFS-9668-16.patch, HDFS-9668-17.patch, HDFS-9668-18.patch, 
> HDFS-9668-19.patch, HDFS-9668-19.patch, HDFS-9668-2.patch, 
> HDFS-9668-20.patch, HDFS-9668-21.patch, HDFS-9668-22.patch, 
> HDFS-9668-23.patch, HDFS-9668-23.patch, HDFS-9668-24.patch, 
> HDFS-9668-25.patch, HDFS-9668-26.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during the 
> test. The results are shown below.
> !execution_time.png!
> The finalizeBlock, addBlock and createRbw operations on HDD under heavy 
> load take a really long time.
> It means one slow operation of finalizeBlock, addBlock and createRbw in a 
> slow storage 
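
The jstack output above shows one DataXceiver blocked on the FsDatasetImpl monitor while another holds it during a slow file creation. A minimal sketch of one possible direction, assuming the contended state can be partitioned by volume: keep one lock per volume so that a slow HDD operation does not block work on SSD/RAMDISK. The class `PerVolumeLocks` and its methods are hypothetical names for illustration and are not taken from any of the attached patches.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch: one lock per volume instead of one dataset-wide monitor. */
final class PerVolumeLocks {
  private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Runs op while holding only the lock of the given volume. */
  <T> T withVolumeLock(String volume, Supplier<T> op) {
    ReentrantLock lock = locks.computeIfAbsent(volume, v -> new ReentrantLock());
    lock.lock();
    try {
      return op.get();
    } finally {
      lock.unlock();
    }
  }

  /** True if some thread currently holds the lock of the given volume. */
  boolean isLocked(String volume) {
    ReentrantLock lock = locks.get(volume);
    return lock != null && lock.isLocked();
  }
}
```

With this layout, a thread inside `withVolumeLock("HDD", ...)` leaves the "SSD" lock free. Any state shared across volumes would still need its own protection, which this sketch does not cover.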

[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-04-10 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814365#comment-16814365
 ] 

Anoop Sam John commented on HDFS-14401:
---

PmemVolumeManager#reserve is synchronized, whereas release is not.

chooseVolume -> synchronized. Any reason why? Why not just make the counter 
variable atomic?

blockKeyToVolume.put(key, index);
What is the heap size requirement for this Map, per entry? Some math here 
would be useful.

private final Long maxBytes;
Why Long rather than long?

static PmemVolumeManager getPmemVolumeManager() {
Can we avoid static getters? Can we have a singleton model?
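
To make the question about synchronized versus atomic concrete, here is a minimal sketch of a bounded byte counter that needs no synchronized methods. It is not the patch's code: `PmemUsageCounter` and the exact `reserve`/`release` signatures are hypothetical, and it assumes the only shared state is a usage counter capped at `maxBytes`.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a bounded usage counter using compare-and-set. */
final class PmemUsageCounter {
  // Primitive long: no boxing, and the field can never be null.
  private final long maxBytes;
  private final AtomicLong usedBytes = new AtomicLong(0);

  PmemUsageCounter(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  /** Reserves bytesCount; returns the new usage, or -1 if it would exceed maxBytes. */
  long reserve(long bytesCount) {
    while (true) {
      long cur = usedBytes.get();
      long next = cur + bytesCount;
      if (next > maxBytes) {
        return -1;
      }
      if (usedBytes.compareAndSet(cur, next)) {
        return next;
      }
      // Another thread changed the counter in between; retry with the fresh value.
    }
  }

  /** Releases previously reserved bytes and returns the new usage. */
  long release(long bytesCount) {
    return usedBytes.addAndGet(-bytesCount);
  }

  long getUsed() {
    return usedBytes.get();
  }
}
```

Both `reserve` and `release` are safe to call concurrently here, so the asymmetry noted above (one synchronized, the other not) does not arise.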



> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14401.000.patch
>
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up MappableBlockLoader interface; etc.



[jira] [Commented] (HDFS-14401) Refine the implementation for HDFS cache on SCM

2019-04-03 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809519#comment-16809519
 ] 

Anoop Sam John commented on HDFS-14401:
---

The new config "dfs.datanode.cache.loader.class" is now mandatory for using the 
pmem-based cache. I know it was added because we have the Java-based impl and 
another sub-task plans a native impl (depending on the availability of the PMDK 
lib).
As a user, I think this is unwanted overhead. When we have both a native impl 
and a pure Java impl, can HDFS select the loader automatically? If the native 
lib is available on a node, can the native loader alone be used? The native 
impl is being added because it performs much better for reads and writes 
from/to the cache. So when the node is ready for the native loader and it is 
the better-performing one, I am not sure why, as a user, I should have to 
choose the lower-performing loader.
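
A minimal sketch of the automatic selection being asked for, assuming the DataNode can probe whether the native library is usable. All names here (`CacheLoaderSketch`, `NativeLoaderSketch`, `JavaLoaderSketch`, `LoaderSelector`) are hypothetical and not from the patch; the optional explicit choice stands in for a setting such as "dfs.datanode.cache.loader.class".

```java
import java.util.Optional;

/** Hypothetical loader abstraction, for illustration only. */
interface CacheLoaderSketch {
  String name();
}

final class NativeLoaderSketch implements CacheLoaderSketch {
  public String name() { return "native"; }
}

final class JavaLoaderSketch implements CacheLoaderSketch {
  public String name() { return "java"; }
}

final class LoaderSelector {
  private LoaderSelector() {}

  /**
   * configured: explicit user choice ("native" or "java"), empty if unset.
   * nativeLibAvailable: result of probing the node for the native library.
   */
  static CacheLoaderSketch select(Optional<String> configured, boolean nativeLibAvailable) {
    if (configured.isPresent()) {
      String choice = configured.get();
      if (choice.equals("java")) {
        return new JavaLoaderSketch();
      }
      if (choice.equals("native")) {
        if (!nativeLibAvailable) {
          throw new IllegalStateException("native loader requested but library is not available");
        }
        return new NativeLoaderSketch();
      }
      throw new IllegalArgumentException("unknown loader: " + choice);
    }
    // No explicit choice: prefer the native loader when the node supports it.
    return nativeLibAvailable ? new NativeLoaderSketch() : new JavaLoaderSketch();
  }
}
```

The config then becomes an optional override rather than a mandatory setting, which is the behavior requested above.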

> Refine the implementation for HDFS cache on SCM
> ---
>
> Key: HDFS-14401
> URL: https://issues.apache.org/jira/browse/HDFS-14401
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>
> In this Jira, we will refine the implementation for HDFS cache on SCM, such 
> as: 1) Handle full pmem volume in VolumeManager; 2) Refine pmem volume 
> selection impl; 3) Clean up MappableBlockLoader interface; etc.



[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-30 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806055#comment-16806055
 ] 

Anoop Sam John commented on HDFS-14355:
---

bq.“dfs.datanode.cache” is the prefix followed for all the cache related 
configs, so we would like to follow the pattern
Fine. I was not sure whether you were following some naming pattern. I am fine 
with the given reasoning.
Thanks for addressing the comments. Looks good.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, 
> HDFS-14355.005.patch, HDFS-14355.006.patch, HDFS-14355.007.patch, 
> HDFS-14355.008.patch, HDFS-14355.009.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-30 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805884#comment-16805884
 ] 

Anoop Sam John commented on HDFS-14355:
---

getBlockInputStreamWithCheckingPmemCache -> Can be a private method.

public PmemVolumeManager getPmemVolumeManager -> Why is this exposed? For tests? 
If so, can it be package private and marked with @VisibleForTesting?

I think the afterCache() step is an unwanted indirection:
{code}
// FsDatasetCache
try {
  mappableBlock = cacheLoader.load(length, blockIn, metaIn,
      blockFileName, key);
} catch (ChecksumException e) {
  // Exception message is bogus since this wasn't caused by a file read

  LOG.warn("Failed to cache the block [key=" + key + "]!", e);
  return;
}
mappableBlock.afterCache();

// PmemMappedBlock
@Override
public void afterCache() {
  pmemVolumeManager.afterCache(key, volumeIndex);
}

// PmemVolumeManager
public void afterCache(ExtendedBlockId key, Byte volumeIndex) {
  blockKeyToVolume.put(key, volumeIndex);
}
{code}
Actually, in PmemMappableBlockLoader#load, once the load is successful 
(mappableBlock != null), we can do this pmemVolumeManager work there, right?

{code}
public void close() {
  pmemVolumeManager.afterUncache(key);
  ...
  FsDatasetUtil.deleteMappedFile(cacheFilePath);
{code}
Call afterUncache() after deleting the file.
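
The two suggestions above (record the volume inside load once it succeeds, and drop the bookkeeping only after the file is deleted) could look roughly like the following. This is a simplified stand-in, not the patch's classes: `VolumeIndexSketch`, its String keys and its plain-file "cache" are assumptions made so the example is self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the volume bookkeeping discussed above. */
final class VolumeIndexSketch {
  private final Map<String, Byte> blockKeyToVolume = new ConcurrentHashMap<>();

  /** Writes the block to a cache file and records its volume only on success. */
  Path load(String key, byte volumeIndex, Path cacheDir, byte[] data) throws IOException {
    Path cacheFile = cacheDir.resolve(key);
    Files.write(cacheFile, data);            // throws on failure, so nothing is recorded
    blockKeyToVolume.put(key, volumeIndex);  // bookkeeping done here, no afterCache() hop
    return cacheFile;
  }

  /** Deletes the cache file first, then drops the bookkeeping entry. */
  void close(String key, Path cacheFile) throws IOException {
    Files.deleteIfExists(cacheFile);
    blockKeyToVolume.remove(key);
  }

  boolean isTracked(String key) {
    return blockKeyToVolume.containsKey(key);
  }
}
```

If the delete throws, the entry stays in the map, so the manager never forgets a block whose file is still on the volume.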

public PmemVolumeManager(DNConf dnConf)
Can we pass only pmemVolumes and maxLockedPmem? That is cleaner IMO.

getVolumeByIndex -> Can this be package private?

getCacheFilePath(ExtendedBlockId key) -> A better name would be 
getCachedPath(ExtendedBlockId).

dfs.datanode.cache.pmem.capacity -> I am not sure what naming convention you 
follow in HDFS, but as a user I would prefer the name 
dfs.datanode.pmem.cache.capacity. Ditto for dfs.datanode.cache.pmem.dirs.


> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, 
> HDFS-14355.005.patch, HDFS-14355.006.patch, HDFS-14355.007.patch, 
> HDFS-14355.008.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



[jira] [Commented] (HDFS-14355) Implement HDFS cache on SCM by using pure java mapped byte buffer

2019-03-29 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805622#comment-16805622
 ] 

Anoop Sam John commented on HDFS-14355:
---

Please give me a day, Uma. I will have a look. I was checking an older version 
of the patch and was halfway through. It seems a new one came in because 
another sub-task has been committed.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch, HDFS-14355.004.patch, 
> HDFS-14355.005.patch, HDFS-14355.006.patch, HDFS-14355.007.patch, 
> HDFS-14355.008.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-02-25 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777582#comment-16777582
 ] 

Anoop Sam John commented on HDFS-3246:
--

bq.int read(long position, ByteBuffer buf)
When calling this API with a buf whose remaining size is n, and the file has 
more than n bytes of data after the given position, is it guaranteed to read 
all n bytes into the ByteBuffer in one go? Just wanted to confirm. Thanks.
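
The question is not answered in this message. If the call follows the usual read contract and may return fewer bytes than requested, a caller would need a loop like the sketch below. It uses `java.nio.channels.FileChannel#read(ByteBuffer, long)` as a stand-in with the same shape of signature; the helper name `PositionedReads.readFully` is hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper: fill a buffer from a given offset despite short reads. */
final class PositionedReads {
  private PositionedReads() {}

  /** Reads until buf has no space left; throws EOFException if the file ends first. */
  static void readFully(FileChannel ch, long position, ByteBuffer buf) throws IOException {
    long pos = position;
    while (buf.hasRemaining()) {
      int n = ch.read(buf, pos);  // positional read: the channel's own position is untouched
      if (n < 0) {
        throw new EOFException("EOF at offset " + pos + " with "
            + buf.remaining() + " bytes still wanted");
      }
      pos += n;
    }
  }
}
```

A caller that needs exactly n bytes would use the loop, while a caller that can work with partial data would call the single-shot read directly.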

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.



[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2017-12-11 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287093#comment-16287093
 ] 

Anoop Sam John commented on HDFS-10285:
---

As an HBase developer (an HDFS user), I see SPS not as a new feature but as an 
attempt to fix some existing limitations/issues in the HSM feature. So, IMHO, 
asking users to run a new process to fix the issue would be too much. Again, I 
cannot comment on the HDFS implementation or theory.

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to specify the user's 
> preference for where to store the physical blocks. When the user sets the 
> storage policy before writing data, the blocks can take advantage of the 
> storage policy preference and are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> then the blocks will have been written with the default storage policy (that 
> is, DISK). The user has to run the ‘Mover tool’ explicitly, specifying all 
> such file names as a list. In some distributed system scenarios (e.g., HBase) 
> it would be difficult to collect all the files and run the tool, as different 
> nodes can write files separately and files can have different paths.
> Another scenario is when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) to a directory 
> with another effective storage policy. The rename does not copy the inherited 
> storage policy from the source, so the file takes its policy from the 
> destination file/dir's parent. This rename operation is just a metadata 
> change in the Namenode. The physical blocks still remain placed according to 
> the source storage policy.
> So, tracking all such business-logic-based file names across distributed 
> nodes (e.g., region servers) and running the Mover tool could be difficult 
> for admins. 
> Here the proposal is to provide an API from the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send them to the DNs as movement commands. 
> Will post the detailed design thoughts document soon. 



[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2017-08-09 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120227#comment-16120227
 ] 

Anoop Sam John commented on HDFS-10285:
---

HBase can benefit from this feature. The scenario is as follows.

HBase allows WAL files to be kept on low-latency devices using the HSM feature 
(ALL_SSD, ONE_SSD, etc.). There is a directory that holds all active WALs, and 
we configure the policy on it. After a certain time, a WAL file becomes 
inactive, as all the data in it is eventually flushed into HFiles. We then 
archive it. There is an archive directory, and the archive operation is a 
rename to a file under that directory. The archive directory won't have any 
policy configured. By default, we keep the WAL files under the archive 
directory for a few more minutes and then delete them. If the WAL can be 
deleted, it is fine even if its blocks stay on the low-latency device until 
then. But some features and scenarios can delay the deletion of WALs from the 
archive. A few examples:
1) Cross-cluster replication is in place and the peer cluster is slow or down. 
HBase does inter-cluster replication by reading the WAL. Until the WAL cells 
have been read and passed to the other cluster, we cannot delete the WAL.
2) The backup feature is in use and the backup refers to WAL files (the 
snapshot feature as well).
3) Incremental backup is enabled. Until an incremental backup is taken, WALs 
in that time range cannot be deleted.

The same holds for HFiles. After compaction, the compacted-away files are 
archived, and if they are referred to by active snapshots, we may not be able 
to delete them immediately.

So it makes sense to use this feature to move the file blocks off the 
low-latency devices and free up space there.
Once this feature is GA in a release, we can open a Jira to make use of it.
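
The archive-by-rename behavior described above can be illustrated with a toy model. This is not HDFS or HBase code: `RenamePolicyModel` is a hypothetical in-memory model, the directory names in the usage note are illustrative, and it reduces policies to two cases (ALL_SSD places blocks on SSD, anything else on DISK).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model: rename changes the effective policy, not block placement. */
final class RenamePolicyModel {
  private final Map<String, String> dirPolicy = new HashMap<>();  // directory -> policy
  private final Map<String, String> fileDir = new HashMap<>();    // file -> parent directory
  private final Map<String, String> placement = new HashMap<>();  // file -> storage holding its blocks

  void setPolicy(String dir, String policy) {
    dirPolicy.put(dir, policy);
  }

  /** Creates a file; its blocks are placed according to the parent directory's policy. */
  void create(String file, String dir) {
    fileDir.put(file, dir);
    placement.put(file, storageFor(effectivePolicy(file)));
  }

  /** Rename is a metadata-only change: the blocks stay where they are. */
  void rename(String file, String newDir) {
    fileDir.put(file, newDir);
  }

  /** Policy inherited from the current parent directory; HOT when none is set. */
  String effectivePolicy(String file) {
    return dirPolicy.getOrDefault(fileDir.get(file), "HOT");
  }

  String blockStorage(String file) {
    return placement.get(file);
  }

  boolean isSatisfied(String file) {
    return storageFor(effectivePolicy(file)).equals(placement.get(file));
  }

  /** What a policy satisfier would do: move blocks to match the effective policy. */
  void satisfy(String file) {
    placement.put(file, storageFor(effectivePolicy(file)));
  }

  private static String storageFor(String policy) {
    return policy.equals("ALL_SSD") ? "SSD" : "DISK";
  }
}
```

For example, after `setPolicy("/hbase/WALs", "ALL_SSD")`, `create("wal-1", "/hbase/WALs")` and `rename("wal-1", "/hbase/oldWALs")`, the model reports an effective policy of HOT while the blocks are still on SSD. Calling `satisfy("wal-1")` is the step that frees the SSD space.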

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: HDFS-10285
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-SPS-TestReport-20170708.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. 
> These policies can be set on a directory or file to specify the user's 
> preference for where to store the physical blocks. When the user sets the 
> storage policy before writing data, the blocks can take advantage of the 
> storage policy preference and are stored accordingly. 
> If the user sets the storage policy after writing and completing the file, 
> then the blocks will have been written with the default storage policy (that 
> is, DISK). The user has to run the ‘Mover tool’ explicitly, specifying all 
> such file names as a list. In some distributed system scenarios (e.g., HBase) 
> it would be difficult to collect all the files and run the tool, as different 
> nodes can write files separately and files can have different paths.
> Another scenario is when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) to a directory 
> with another effective storage policy. The rename does not copy the inherited 
> storage policy from the source, so the file takes its policy from the 
> destination file/dir's parent. This rename operation is just a metadata 
> change in the Namenode. The physical blocks still remain placed according to 
> the source storage policy.
> So, tracking all such business-logic-based file names across distributed 
> nodes (e.g., region servers) and running the Mover tool could be difficult 
> for admins. 
> Here the proposal is to provide an API from the Namenode itself to trigger 
> storage policy satisfaction. A daemon thread inside the Namenode should track 
> such calls and send them to the DNs as movement commands. 
> Will post the detailed design thoughts document soon. 



[jira] [Commented] (HDFS-7343) HDFS smart storage management

2017-03-24 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939992#comment-15939992
 ] 

Anoop Sam John commented on HDFS-7343:
--

At what level are data hotness and movement tracked? Block level, or only file 
level?
HBase, as a user, compacts its HFiles into one (major compaction), and if the 
tracking is at file level, it might not be that useful for HBase. Just saying.

> HDFS smart storage management
> -
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kai Zheng
>Assignee: Wei Zhou
> Attachments: HDFSSmartStorageManagement-General-20170315.pdf, 
> HDFS-Smart-Storage-Management.pdf, 
> HDFSSmartStorageManagement-Phase1-20170315.pdf, 
> HDFS-Smart-Storage-Management-update.pdf, move.jpg
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and 
> flexible storage policy engine considering file attributes, metadata, data 
> temperature, storage type, EC codec, available hardware capabilities, 
> user/application preference, etc.
> Modified the title for re-purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution 
> that provides a smart storage management service for convenient, 
> intelligent and effective use of erasure coding or replicas, the HDFS cache 
> facility, the HSM offering, and all kinds of tools (balancer, mover, disk 
> balancer and so on) in a large cluster.



[jira] [Commented] (HDFS-9411) HDFS NodeLabel support

2017-03-08 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902571#comment-15902571
 ] 

Anoop Sam John commented on HDFS-9411:
--

Guys, is this still active? We recently talked with an HDFS/HBase user who also 
needs a similar mechanism. We were thinking of using the favored nodes 
feature, but block re-replication won't consider/honor the favored nodes (?)
cc [~ram_krish], [~rakeshr]

> HDFS NodeLabel support
> --
>
> Key: HDFS-9411
> URL: https://issues.apache.org/jira/browse/HDFS-9411
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
> Attachments: HDFSNodeLabels-15-09-2016.pdf, 
> HDFSNodeLabels-20-06-2016.pdf, HDFS_ZoneLabels-16112015.pdf
>
>
> HDFS currently stores data blocks on different datanodes chosen by the 
> BlockPlacement Policy. These datanodes are random within the 
> scope (local-rack/different-rack/nodegroup) of the network topology. 
> In a multi-tenant scenario (a tenant can be a user or a service), blocks of 
> any tenant can be on any datanode.
> Depending on the applications of different tenants, a datanode might 
> sometimes get busy, slowing down another tenant's application. It would be 
> better if admins had a provision to logically divide the cluster among 
> multiple tenants.
> NodeLabels gives users more options to specify constraints for selecting 
> specific nodes with specific requirements.
> High level design doc to follow soon.


