[ https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540910#comment-14540910 ]

Vinod Kumar Vavilapalli commented on YARN-3636:
-----------------------------------------------

If this is only for shuffle, I agree, we need an abstraction in Shuffle
instead of creating a generic interface for LocalDirAllocator.

> Abstraction for LocalDirAllocator
> ---------------------------------
>
>                 Key: YARN-3636
>                 URL: https://issues.apache.org/jira/browse/YARN-3636
>             Project: Hadoop YARN
>          Issue Type: New Feature
>   Affects Versions: 2.5.2
>            Reporter: Kannan Rajah
>              Labels: BB2015-05-TBR
>         Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
>
>
> There are 2 abstractions used to write data to local disk.
> LocalDirAllocator: Allocate paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*
> In the current implementation, local disk is managed by the guest OS and not
> HDFS. The proposal is to provide a new abstraction that encapsulates the
> above 2 abstractions and hides who manages the local disks. This enables us
> to provide an alternate implementation where a DFS can manage the local disks
> and they can be accessed using HDFS APIs. This means the DFS maintains a
> namespace for node local directories and can create paths that are guaranteed
> to be present on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate
> data using this new implementation, it will continue to write to local disk.
> When a reducer needs to access data from a remote node, it can use HDFS APIs
> with a path that points to that node’s local namespace instead of having to
> use an HTTP server to transfer the data across nodes.
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator.
> So we just need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
> that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific
> node. It uses the Configuration object to get the user-configured base
> directory and appends the node hostname to it. Hence the returned paths are
> within the node local namespace.
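
To make the three proposed abstractions concrete, here is a rough,
dependency-free sketch of how they might fit together. It is not taken from
the attached patch. Plain Strings and a Map stand in for Hadoop's Path and
Configuration, the single-method interface is a reduced stand-in for the full
LocalDirAllocator API, and the names RoundRobinLocalAllocator, allocatorFor
and the key "sketch.dfs.local.base.dir" are invented for illustration. The
round-robin policy is also an assumption, not a description of how
LocalDirAllocator actually picks a directory.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: Strings and a Map replace Hadoop's Path and Configuration.
class LocalDiskSketch {

    /** Reduced stand-in for the proposed LocalDiskPathAllocator interface. */
    interface LocalDiskPathAllocator {
        String getLocalPathForWrite(String relativePath);
    }

    /** Simplified stand-in for LocalDirAllocator: cycles over the configured local dirs. */
    static final class RoundRobinLocalAllocator implements LocalDiskPathAllocator {
        private final List<String> localDirs;
        private int next = 0;

        RoundRobinLocalAllocator(List<String> localDirs) {
            if (localDirs.isEmpty()) {
                throw new IllegalArgumentException("no local dirs configured");
            }
            this.localDirs = List.copyOf(localDirs);
        }

        @Override
        public synchronized String getLocalPathForWrite(String relativePath) {
            String dir = localDirs.get(next);
            next = (next + 1) % localDirs.size();
            return dir + "/" + relativePath;
        }
    }

    /** Sketch of DFSLocalDirAllocator: configured base directory plus the node hostname. */
    static final class DfsLocalDirAllocator implements LocalDiskPathAllocator {
        // Hypothetical configuration key, not a real Hadoop property.
        static final String BASE_DIR_KEY = "sketch.dfs.local.base.dir";
        private final String nodeRoot;

        DfsLocalDirAllocator(Map<String, String> conf, String hostname) {
            String base = conf.get(BASE_DIR_KEY);
            if (base == null) {
                throw new IllegalArgumentException("missing " + BASE_DIR_KEY);
            }
            this.nodeRoot = base + "/" + hostname;
        }

        @Override
        public String getLocalPathForWrite(String relativePath) {
            // Every returned path stays inside this node's local namespace.
            return nodeRoot + "/" + relativePath;
        }
    }

    /** Plays the role of LocalDiskUtil: local allocator by default, DFS-backed if configured. */
    static LocalDiskPathAllocator allocatorFor(Map<String, String> conf,
                                               String hostname,
                                               List<String> localDirs) {
        return conf.containsKey(DfsLocalDirAllocator.BASE_DIR_KEY)
                ? new DfsLocalDirAllocator(conf, hostname)
                : new RoundRobinLocalAllocator(localDirs);
    }

    public static void main(String[] args) {
        LocalDiskPathAllocator local =
                allocatorFor(Map.of(), "node1", List.of("/data1", "/data2"));
        System.out.println(local.getLocalPathForWrite("map_0.out"));

        LocalDiskPathAllocator dfs = allocatorFor(
                Map.of(DfsLocalDirAllocator.BASE_DIR_KEY, "/localvol"), "node1", List.of());
        System.out.println(dfs.getLocalPathForWrite("map_0.out"));
    }
}
```

In this sketch a caller such as Shuffle only sees LocalDiskPathAllocator, so
whether the path is served by the guest OS or by a DFS is decided in one
place. That is also compatible with the suggestion in the comment above: the
same shape could live inside Shuffle rather than as a generic interface.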