Vinod Kumar Vavilapalli commented on YARN-3636:
If this is only for shuffle, then I agree: we need an abstraction in Shuffle
instead of creating a generic interface for LocalDirAllocator.
> Abstraction for LocalDirAllocator
> Key: YARN-3636
> URL: https://issues.apache.org/jira/browse/YARN-3636
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.5.2
> Reporter: Kannan Rajah
> Labels: BB2015-05-TBR
> Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
> There are 2 abstractions used to write data to local disk.
> LocalDirAllocator: Allocate paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*
> In the current implementation, the local disk is managed by the guest OS and
> not by HDFS. The proposal is to provide a new abstraction that encapsulates the
> above 2 abstractions and hides who manages the local disks. This enables us
> to provide an alternate implementation where a DFS can manage the local disks
> and it can be accessed using HDFS APIs. This means the DFS maintains a
> namespace for node local directories and can create paths that are guaranteed
> to be present on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate
> data using this new implementation, it will continue to write to local disk.
> When a reducer needs to access data from a remote node, it can use HDFS APIs
> with a path that points to that node’s local namespace instead of having to
> use an HTTP server to transfer the data across nodes.
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator.
> So we just need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
> that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific
> node. It uses the Configuration object to get the user-configured base directory and
> appends the node hostname to it. Hence the returned paths are within the node
> local namespace.
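
The three abstractions described above could be sketched roughly as follows. This is only an illustration, not the attached patch: the interface is reduced to two methods with simplified signatures (the real LocalDirAllocator methods take a Hadoop Configuration and return a Path), a plain Map stands in for Configuration, and the key name `example.dfs.local.base.dir` is an assumption, since the issue does not name one.

```java
import java.util.Map;

// Hypothetical, simplified form of the proposed interface. The issue says it
// would carry every API that LocalDirAllocator supports today; only two
// path-lookup methods are shown here, with simplified signatures.
interface LocalDiskPathAllocator {
    String getLocalPathForWrite(String pathStr);
    String getLocalPathToRead(String pathStr);
}

// Sketch of the DFS-backed allocator: one instance per node, rooted at
// <configured base directory>/<node hostname>.
final class DFSLocalDirAllocator implements LocalDiskPathAllocator {
    // Assumed configuration key, for illustration only.
    static final String BASE_DIR_KEY = "example.dfs.local.base.dir";

    private final String nodeRoot;

    DFSLocalDirAllocator(Map<String, String> conf, String hostname) {
        String base = conf.get(BASE_DIR_KEY);
        if (base == null || base.isEmpty()) {
            throw new IllegalArgumentException("missing " + BASE_DIR_KEY);
        }
        if (hostname == null || hostname.isEmpty()) {
            throw new IllegalArgumentException("hostname must not be empty");
        }
        // Drop trailing slashes so the joined path has exactly one separator.
        this.nodeRoot = base.replaceAll("/+$", "") + "/" + hostname;
    }

    @Override
    public String getLocalPathForWrite(String pathStr) {
        return resolve(pathStr);
    }

    @Override
    public String getLocalPathToRead(String pathStr) {
        return resolve(pathStr);
    }

    // Every returned path stays inside this node's local namespace.
    private String resolve(String pathStr) {
        return nodeRoot + "/" + pathStr.replaceAll("^/+", "");
    }
}
```

Under these assumptions, a reducer that knows the mapper's hostname could build an allocator for that host and obtain a path inside the mapper node's namespace, which it would then open through the DFS client rather than fetching over HTTP.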