[ 
https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe moved HADOOP-11905 to YARN-3636:
-----------------------------------------------------

          Component/s:     (was: fs)
        Fix Version/s:     (was: 2.7.1)
             Assignee:     (was: Kannan Rajah)
    Affects Version/s:     (was: 2.5.2)
                       2.5.2
           Issue Type: New Feature  (was: Bug)
                  Key: YARN-3636  (was: HADOOP-11905)
              Project: Hadoop YARN  (was: Hadoop Common)

> Abstraction for LocalDirAllocator
> ---------------------------------
>
>                 Key: YARN-3636
>                 URL: https://issues.apache.org/jira/browse/YARN-3636
>             Project: Hadoop YARN
>          Issue Type: New Feature
>    Affects Versions: 2.5.2
>            Reporter: Kannan Rajah
>              Labels: BB2015-05-TBR
>         Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
>
>
> There are 2 abstractions used to write data to local disk.
> LocalDirAllocator: Allocate paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*
> In the current implementation, the local disks are managed by the guest OS, 
> not by HDFS. The proposal is to provide a new abstraction that encapsulates 
> the above 2 abstractions and hides who manages the local disks. This enables 
> us to provide an alternate implementation where a DFS manages the local 
> disks and they can be accessed using HDFS APIs. This means the DFS maintains 
> a namespace for node local directories and can create paths that are 
> guaranteed to be present on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate 
> data using this new implementation, it will continue to write to local disk. 
> When a reducer needs to access data from a remote node, it can use HDFS APIs 
> with a path that points to that node’s local namespace instead of having to 
> use an HTTP server to transfer the data across nodes.
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator. 
> So we just need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
> that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific 
> node. It uses the Configuration object to get the user-configured base 
> directory and appends the node hostname to it. Hence the returned paths are 
> within the node local namespace.
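
The three abstractions above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the attached patch: the nested type names, the single-method interface, and the round-robin policy are all hypothetical, and plain strings stand in for Hadoop's Path and Configuration so that the example compiles without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; none of these types are taken from the patch. */
final class LocalDiskSketch {

    /** Hypothetical stand-in for the proposed LocalDiskPathAllocator interface. */
    interface PathAllocator {
        /** Returns a full path under which the caller may write relativePath. */
        String getLocalPathForWrite(String relativePath);
    }

    /**
     * Stand-in for the default, guest-OS-managed case. Simple round-robin is an
     * assumption made for brevity; it is not how LocalDirAllocator picks a directory.
     */
    static final class RoundRobinLocalAllocator implements PathAllocator {
        private final List<String> localDirs;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobinLocalAllocator(List<String> localDirs) {
            if (localDirs.isEmpty()) {
                throw new IllegalArgumentException("at least one local dir is required");
            }
            this.localDirs = new ArrayList<>(localDirs);
        }

        @Override
        public String getLocalPathForWrite(String relativePath) {
            // floorMod keeps the index valid even if the counter overflows.
            int i = Math.floorMod(next.getAndIncrement(), localDirs.size());
            return localDirs.get(i) + "/" + relativePath;
        }
    }

    /**
     * Stand-in for the DFS-managed case: the configured base directory plus the
     * node hostname forms the node local namespace.
     */
    static final class NodeScopedAllocator implements PathAllocator {
        private final String nodeRoot;

        NodeScopedAllocator(String baseDir, String hostname) {
            this.nodeRoot = baseDir + "/" + hostname;
        }

        @Override
        public String getLocalPathForWrite(String relativePath) {
            return nodeRoot + "/" + relativePath;
        }
    }

    public static void main(String[] args) {
        // Hypothetical directories, base path, and hostname.
        PathAllocator local = new RoundRobinLocalAllocator(Arrays.asList("/data1", "/data2"));
        PathAllocator dfs = new NodeScopedAllocator("/apps/local", "node17");
        System.out.println(local.getLocalPathForWrite("job_1/map_0.out"));
        System.out.println(dfs.getLocalPathForWrite("job_1/map_0.out"));
    }
}
```

Because callers depend only on the interface, a mapper would not need to know which allocator it was handed. In the Shuffle use case, a reducer could build the same node-scoped path for the mapper's host and open it through the DFS rather than fetching it over HTTP.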



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
