[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2017-04-28 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989545#comment-15989545
 ] 

Chris Trezzo commented on MAPREDUCE-5951:
-

[~xkrogen]

bq. Is this so that the uploading to SCM can be done by the NM, which is a 
privileged user, to have more secure control over it?

Yes exactly. We wanted to ensure that only trusted entities (i.e. the SCM and 
the node manager) were modifying the shared cached directories in HDFS. 
Additionally, we wanted to make sure that the checksum used when adding a 
resource to the cache was computed by a trusted entity as well.

> Add support for the YARN Shared Cache
> -
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2017-04-28 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989035#comment-15989035
 ] 

Erik Krogen commented on MAPREDUCE-5951:


Ah, excellent point, [~jlowe]... I actually would love to hear the reasoning 
behind the current strategy of  AM downloads 
resource -> AM uploads resource to SCM> rather than the seemingly more 
obvious/simpler . Is this so that the uploading 
to SCM can be done by the NM, which is a privileged user, to have more secure 
control over it?

[~ctrezzo], first off thanks for getting back so quickly! And for the pointer 
to YARN-5727; that's an interesting issue. The public visibility solution is 
certainly simpler from the YARN side and seems pretty reasonable from a point 
of expectation of burden on an application ("you want a publicly shared 
resource? put it somewhere public"). It  doesn't add _too_ much complexity on 
the MR side, though having a separate staging directory just for public 
resources is a bit cumbersome. It also means that other application developers 
will have to build the same type of logic - in general I would lean towards 
more logic pushed into the YARN level so that it is easy for application devs 
to support. I don't have good insight into how difficult your initially 
proposed solution in YARN-5727 would be to implement, though.

> Add support for the YARN Shared Cache
> -
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2017-04-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988744#comment-15988744
 ] 

Jason Lowe commented on MAPREDUCE-5951:
---

I don't think it really matters whether the jar resource uploaded by the client 
is public or private.  In both cases the HDFS path to which the client posts 
the resource will be removed when the job completes.  If any subsequent jobs 
come along and figure out via the SCM that they can avoid uploading their own, 
redundant copy of the same resource then they will receive a resource path 
within the SCM area which is a _different_ path than the one used by the first 
job.  That means the resource is going to get downloaded to the node again 
because it's in a different location than the first job's resource.

Even if the first job's client uploads the resource to a public directory, no 
other job is going to ask for that resource under the same path.  It will be 
uploaded to a public staging directory which is specific to that app and whose 
path exists only as long as the app.  The problem with having jobs try to share 
resources automatically just from the job client is knowing when the resource 
can be removed, otherwise we could yank it just as another app tries to 
localize it or never clean it up.  That's why the SCM does the necessary ref 
counting to know what's being used and when resources can be freed safely.  If 
we want to avoid the double-download of the resource then the job client will 
need to upload the resource to the SCM directly and then submit the job _after_ 
it has received the public resource path from the SCM.


> Add support for the YARN Shared Cache
> -
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org