Hi,

Thanks for the reply. It cleared up a few things.

I hadn't thought about under-replication scenarios, but I'll give them some thought now. They should be easier to handle since, as you mentioned, by that time the namenode already knows all the blocks that came from the same file as the under-replicated block.

For the most part, I was thinking of when a new file is being placed on the cluster, which I believe is what you called in-progress files. Say a new 1 GB file needs to be placed onto the cluster: I want the placement policy to take the fact that the file is 1 GB in size into account while placing all of its blocks onto nodes in the cluster.
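To make the idea concrete, here is a toy sketch of what I mean by size-aware placement. This is not the real Hadoop API; the block size, node names, and co-location threshold are all made up for illustration. The point is that if the policy knew the total file size up front, it could decide between co-locating all the blocks and spreading them:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy sketch, NOT the real Hadoop BlockPlacementPolicy API: shows how a
 * policy could use the total file size, if it were known up front, to
 * decide whether to co-locate all of a file's blocks.
 */
public class SizeAwarePlacement {
    // 128 MB, a hypothetical default block size.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file of the given length will occupy.
    static int blockCount(long fileSize) {
        return (int) ((fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE);
    }

    // Pick target nodes: a file small enough (few blocks) is co-located
    // on a single node; a larger file is spread across the whole cluster.
    static List<String> chooseTargets(long fileSize, List<String> nodes, int maxCoLocated) {
        if (blockCount(fileSize) <= maxCoLocated) {
            return nodes.subList(0, 1); // all blocks on one node
        }
        return nodes; // fall back to spreading across all nodes
    }

    public static void main(String[] args) {
        List<String> cluster = new ArrayList<>(List.of("node1", "node2", "node3"));
        long oneGb = 1024L * 1024 * 1024;
        System.out.println(blockCount(oneGb) + " blocks"); // 8 blocks at 128 MB
        System.out.println(chooseTargets(oneGb, cluster, 4));
    }
}
```

The difficulty, as you point out below, is that at chooseTarget time the real policy only sees the requested block size, not the file's eventual total length.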

I'm not clear on where the file is broken down into blocks/chunks: in which class, in which file system (local or HDFS), and where in the process flow. Knowing that will help me come up with a solution. What is the last place, in terms of a function or a point in the process, where I can still find the name of the original file being placed on the system?
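On the file-name part of my question: since srcPath in chooseTarget is already the HDFS path string of the file (as you noted), I assume the original name can be recovered with plain string handling rather than a local java.io.File. A minimal sketch, with a made-up example path:

```java
public class SrcPathName {
    // srcPath, as passed to chooseTarget, is the HDFS path string of the
    // file being written, e.g. "/user/ab/input/data.txt" (hypothetical path).
    // Plain string handling recovers the original file name; no local
    // java.io.File is involved, since the path is not a local one.
    static String fileName(String srcPath) {
        return srcPath.substring(srcPath.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileName("/user/ab/input/data.txt")); // prints data.txt
    }
}
```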

I'm reading the NameNode and FSNamesystem code to see if I can do what I want from there. Any suggestions would be appreciated.

Thank you,

AB



On 07/24/2014 02:12 PM, Harsh J wrote:
Hello,

(Inline)

On Thu, Jul 24, 2014 at 11:11 PM, Arjun Bakshi <[email protected]> wrote:
> Hi,
>
> I want to write a block placement policy that takes the size of the file
> being placed into account, something like what is done in the CoHadoop or
> BEEMR papers. I have the following questions:
>
> 1- What is srcPath in chooseTarget? Is it the path to the original
> un-chunked file, or is it a path to a single block, or something else? I
> added some code to BlockPlacementPolicyDefault to print out the value of
> srcPath but the results look odd.
The arguments are documented in the interface javadoc:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java#L61

The srcPath is the file path of the file on HDFS for which the block
placement targets are being requested.

> 2- Will a simple new File(srcPath) do?
Please rephrase? The srcPath is not a local file, if that's what you meant.

> 3- I've spent time looking at the Hadoop source code. I can't find a way to
> go from srcPath in chooseTarget to a file size. Every function I think could
> do it, in FSNamesystem, FSDirectory, etc., is either non-public or cannot be
> called from inside the blockmanagement package or the block placement class.
Block placement is something that, within the context of a new file
creation, is invoked when requesting a new block. At that point the
file is not complete, so there is no way to determine its actual
length; only the requested block size is available. I'm not certain
BlockPlacementPolicy is what will solve your goal.

> How do I go from srcPath in the block placement class to the size of the
> file being placed?
Are you targeting in-progress files or completed files? The latter
form of files would result in placement policy calls iff there's an
under-replication/losses/etc. to block replicas of the original set.
Only for such operations would you have a possibility to determine the
actual full length of file (as explained above).

> Thank you,
>
> AB
