APIs to move data blocks within HDFS

2013-02-22 Thread Karthiek C
Hi, Are there any APIs to move data blocks in HDFS from one node to another *after* they have been added to HDFS? Also, can we write some sort of pluggable module (like a scheduler) that controls how data gets placed in a Hadoop cluster? I am working with Hadoop 1.0.3 and I couldn't find any

Re: APIs to move data blocks within HDFS

2013-02-22 Thread Harsh J
There are no filesystem-level (i.e. client-level) APIs to do this, but the Balancer tool of HDFS does exactly this. Reading its sources should let you understand what kind of calls you need to make to reuse the balancer protocol and achieve what you need. In trunk, the balancer is at
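
To make the idea of moving blocks between nodes concrete, here is a rough, self-contained sketch of the kind of planning a balancing tool does: repeatedly shift one block from the fullest node to the emptiest until they are within a threshold of each other. Everything in it (BalanceSketch, Move, plan, the node names, counting load in whole blocks) is a hypothetical simplification for illustration. It is not the Balancer's code and it does not use the balancer protocol, which works against the live NameNode and DataNodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of block rebalancing; NOT Hadoop's Balancer.
class BalanceSketch {

    /** One planned transfer of a single block from one node to another. */
    static final class Move {
        final String from;
        final String to;
        Move(String from, String to) { this.from = from; this.to = to; }
    }

    /**
     * Plans single-block moves until the gap between the most and least
     * loaded node is at most `threshold` blocks. The input map is not modified.
     */
    static List<Move> plan(Map<String, Integer> blocksPerNode, int threshold) {
        if (threshold < 1) {
            // With whole blocks a gap of 0 may be unreachable, so require >= 1.
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        Map<String, Integer> load = new LinkedHashMap<>(blocksPerNode);
        List<Move> moves = new ArrayList<>();
        if (load.size() < 2) {
            return moves; // nothing to balance against
        }
        while (true) {
            String max = null;
            String min = null;
            // Find the currently fullest and emptiest nodes.
            for (String node : load.keySet()) {
                if (max == null || load.get(node) > load.get(max)) max = node;
                if (min == null || load.get(node) < load.get(min)) min = node;
            }
            if (load.get(max) - load.get(min) <= threshold) {
                break; // balanced enough
            }
            // Record the move and update the simulated loads.
            load.put(max, load.get(max) - 1);
            load.put(min, load.get(min) + 1);
            moves.add(new Move(max, min));
        }
        return moves;
    }

    public static void main(String[] args) {
        Map<String, Integer> cluster = new LinkedHashMap<>();
        cluster.put("dn1", 10);
        cluster.put("dn2", 2);
        cluster.put("dn3", 6);
        for (Move m : plan(cluster, 2)) {
            System.out.println("move one block: " + m.from + " -> " + m.to);
        }
    }
}
```

The real tool reasons about disk utilization relative to the cluster average rather than raw block counts, so treat this only as a picture of the control flow.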

Re: APIs to move data blocks within HDFS

2013-02-22 Thread Chris Nauroth
Regarding your question about a pluggable module to control placement of data, try taking a look at the abstract class BlockPlacementPolicy and BlockPlacementPolicyDefault, which is its default implementation. On branch-1, you can find these classes at
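
As a rough illustration of what "pluggable placement" means, the sketch below defines a small policy interface and one implementation that spreads replicas across racks. All of the names here (PlacementSketch, Node, PlacementPolicy, RackSpreadPolicy, chooseTargets with this signature) are hypothetical and deliberately simplified. They are not the signatures of BlockPlacementPolicy or BlockPlacementPolicyDefault, so check the actual abstract class in your branch before writing a real subclass. In the releases I am aware of, the NameNode picks the policy class from the dfs.block.replicator.classname property, but please verify that against your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a pluggable placement policy.
// These types are NOT Hadoop's classes or method signatures.
class PlacementSketch {

    /** A datanode as seen by this sketch: a name plus the rack it sits on. */
    static final class Node {
        final String name;
        final String rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
    }

    /** The extension point: given candidate nodes, pick replica targets. */
    interface PlacementPolicy {
        List<Node> chooseTargets(int replicas, List<Node> candidates);
    }

    /** Example policy: one replica per rack first, then fill from what is left. */
    static final class RackSpreadPolicy implements PlacementPolicy {
        @Override
        public List<Node> chooseTargets(int replicas, List<Node> candidates) {
            List<Node> chosen = new ArrayList<>();
            Set<String> usedRacks = new HashSet<>();
            // First pass: take at most one node from each rack.
            for (Node n : candidates) {
                if (chosen.size() == replicas) break;
                if (usedRacks.add(n.rack)) chosen.add(n);
            }
            // Second pass: if there were too few racks, reuse racks.
            for (Node n : candidates) {
                if (chosen.size() == replicas) break;
                if (!chosen.contains(n)) chosen.add(n);
            }
            return chosen; // may be shorter than `replicas` if nodes run out
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("dn1", "rackA"), new Node("dn2", "rackA"),
                new Node("dn3", "rackB"), new Node("dn4", "rackC"));
        PlacementPolicy policy = new RackSpreadPolicy(); // swap in another policy here
        for (Node n : policy.chooseTargets(3, nodes)) {
            System.out.println(n.name + " on " + n.rack);
        }
    }
}
```

The point of the interface is that the caller only depends on PlacementPolicy, so a different strategy can be dropped in without touching the calling code.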

Re: APIs to move data blocks within HDFS

2013-02-22 Thread Karthiek C
Thank you Harsh and Chris. This really helps! -Karthiek On Fri, Feb 22, 2013 at 2:46 PM, Chris Nauroth cnaur...@hortonworks.com wrote: Regarding your question about a pluggable module to control placement of data, try taking a look at the abstract class BlockPlacementPolicy and