[jira] [Commented] (YARN-3636) Abstraction for LocalDirAllocator

2015-05-12 Thread Kannan Rajah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540977#comment-14540977
 ] 

Kannan Rajah commented on YARN-3636:


[~vinodkv] I agree that the use case is for shuffle. But since it's a fairly 
simple change to LocalDirAllocator (implementing the interface), it would be 
beneficial to add it here. For now, perhaps only the Shuffle code path will use 
this interface and LocalDiskUtil. But the generalization is about whether a DFS 
can manage node-local disks. Please take a look at the design document attached 
to TEZ-2442.

> Abstraction for LocalDirAllocator
> -
>
> Key: YARN-3636
> URL: https://issues.apache.org/jira/browse/YARN-3636
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.5.2
> Reporter: Kannan Rajah
> Labels: BB2015-05-TBR
> Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
>
>
> There are 2 abstractions used to write data to local disk.
> LocalDirAllocator: Allocate paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Read/write using java.io.* and java.nio.*
> In the current implementation, the local disk is managed by the guest OS and 
> not by HDFS. The proposal is to provide a new abstraction that encapsulates 
> the above 2 abstractions and hides who manages the local disks. This enables 
> us to provide an alternate implementation where a DFS manages the local disks 
> and they can be accessed using HDFS APIs. This means the DFS maintains a 
> namespace for node-local directories and can create paths that are guaranteed 
> to be present on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate 
> data using this new implementation, it will continue to write to local disk. 
> When a reducer needs to access data from a remote node, it can use HDFS APIs 
> with a path that points to that node’s local namespace instead of having to 
> use an HTTP server to transfer the data across nodes.
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator. 
> So we just need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
> that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific 
> node. It uses the Configuration object to get the user-configured base 
> directory and appends the node hostname to it. Hence the returned paths are 
> within the node-local namespace.
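
To make the three abstractions above concrete, here is a minimal sketch of how 
DFSLocalDirAllocator could derive node-scoped paths. It is not taken from the 
attached patch. The single-method interface, the Map used in place of Hadoop's 
Configuration, the String paths, and the key name "example.dfs.local.base.dir" 
are all assumptions made so the example compiles without Hadoop on the 
classpath.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, reduced form of the proposed interface (one method only). */
interface LocalDiskPathAllocator {
    /** Returns a path for writing the given relative file. */
    String getLocalPathForWrite(String relativePath, Map<String, String> conf);
}

/** Sketch of the DFS-backed allocator: one instance per node. */
final class DFSLocalDirAllocator implements LocalDiskPathAllocator {
    // Hypothetical configuration key; the description does not name the real one.
    static final String BASE_DIR_KEY = "example.dfs.local.base.dir";

    private final String hostname;

    DFSLocalDirAllocator(String hostname) {
        this.hostname = hostname;
    }

    @Override
    public String getLocalPathForWrite(String relativePath, Map<String, String> conf) {
        String base = conf.get(BASE_DIR_KEY);
        if (base == null) {
            throw new IllegalStateException("Missing configuration: " + BASE_DIR_KEY);
        }
        // Base directory + node hostname + relative path, so the result
        // always falls inside this node's local namespace.
        return join(join(base, hostname), relativePath);
    }

    /** Joins two path segments with exactly one '/' between them. */
    private static String join(String parent, String child) {
        String p = parent.endsWith("/") ? parent.substring(0, parent.length() - 1) : parent;
        String c = child.startsWith("/") ? child.substring(1) : child;
        return p + "/" + c;
    }
}

/** Demo of the shuffle use case: mapper and reducer agree on the same path. */
class ShuffleDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DFSLocalDirAllocator.BASE_DIR_KEY, "/dfs/local");

        // The mapper running on node1 allocates a path in its own namespace.
        LocalDiskPathAllocator onMapperNode = new DFSLocalDirAllocator("node1");
        String written = onMapperNode.getLocalPathForWrite("shuffle/map_0.out", conf);

        // A reducer on another node builds an allocator for node1 and
        // resolves the same path, which it could then open through the DFS.
        LocalDiskPathAllocator fromReducer = new DFSLocalDirAllocator("node1");
        String toRead = fromReducer.getLocalPathForWrite("shuffle/map_0.out", conf);

        System.out.println(written);
        System.out.println(written.equals(toRead));
    }
}
```

In this sketch the hostname is the only node-specific input, which is what 
lets a reducer reconstruct a mapper's output path without contacting an HTTP 
server. A LocalDiskUtil-style helper would choose between this allocator and 
the existing LocalDirAllocator.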



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3636) Abstraction for LocalDirAllocator

2015-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540962#comment-14540962
 ] 

Hadoop QA commented on YARN-3636:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 44s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  4s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 39s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | common tests |  23m 13s | Tests passed in hadoop-common. |
| | |  60m 21s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731224/0001-Abstraction-for-local-disk-path-allocation.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f24452d |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7904/artifact/patchprocess/testrun_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7904/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7904/console |


This message was automatically generated.



[jira] [Commented] (YARN-3636) Abstraction for LocalDirAllocator

2015-05-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540910#comment-14540910
 ] 

Vinod Kumar Vavilapalli commented on YARN-3636:
---

If this is only for shuffle, then I agree: we need an abstraction in Shuffle 
instead of a generic interface for LocalDirAllocator.
