[jira] [Commented] (YARN-2327) YARN should warn about nodes with poor clock synchronization

2014-09-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120225#comment-14120225
 ] 

Alejandro Abdelnur commented on YARN-2327:
--

I'm not sure we should get into this. I would rely on the assumption that NTP 
is properly configured, including authentication to avoid attacks there.

 YARN should warn about nodes with poor clock synchronization
 

 Key: YARN-2327
 URL: https://issues.apache.org/jira/browse/YARN-2327
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen

 YARN should warn about nodes with poor clock synchronization.
 YARN relies on approximate clock synchronization to report certain elapsed 
 time statistics (see YARN-2251), but we currently don't warn if this 
 assumption is violated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2327) YARN should warn about nodes with poor clock synchronization

2014-09-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14120423#comment-14120423
 ] 

Alejandro Abdelnur commented on YARN-2327:
--

Fair enough, a warning is good. BTW, why is this a YARN thing? Shouldn't this 
apply to HDFS as well (i.e., DTs)?

 YARN should warn about nodes with poor clock synchronization
 

 Key: YARN-2327
 URL: https://issues.apache.org/jira/browse/YARN-2327
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen

 YARN should warn about nodes with poor clock synchronization.
 YARN relies on approximate clock synchronization to report certain elapsed 
 time statistics (see YARN-2251), but we currently don't warn if this 
 assumption is violated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104179#comment-14104179
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

I disagree on YARN-1253 being a breakage. 

Personally, I would never recommend using this in production. Given that, I'm 
OK with the patch if:

* the NM logs print a WARN at startup stating the setting.
* the container stdout/stderr also prints a WARN to alert the user of the 
setting.



 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104257#comment-14104257
 ] 

Alejandro Abdelnur edited comment on YARN-2424 at 8/20/14 6:10 PM:
---

I disagree on me being rude (or very rude) just for disagreeing with something. 
IMO security fixes trump backwards compatibility.

Anyway, I'm -0 on the patch if the WARNs are printed in the RM at startup 
as Owen suggests. I insist that the WARN should be in the stderr/stdout of 
every container. Otherwise this will go completely unnoticed by users running 
apps. It should be obvious to them that they are exposed.



was (Author: tucu00):
I disagree in me being rude (or very rude) just for disagreeing with something. 
IMO security fixes trump backwards compatibility.

Anyway, I'm -0 with the patch if the WARNs are printed in in the RM at startup 
as Owen suggests. I insists that the WARN should be in the stderr/stdout of 
every container. Otherwise this will go completely unnoticed to users running 
apps. It should be obvious to them that they are exposed.


 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104257#comment-14104257
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

I disagree on me being rude (or very rude) just for disagreeing with something. 
IMO security fixes trump backwards compatibility.

Anyway, I'm -0 on the patch if the WARNs are printed in the RM at startup 
as Owen suggests. I insist that the WARN should be in the stderr/stdout of 
every container. Otherwise this will go completely unnoticed by users running 
apps. It should be obvious to them that they are exposed.


 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104314#comment-14104314
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

If you don't have to kinit, it is obvious that security is OFF, no?

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104993#comment-14104993
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

sure, fine, enough cycles spent on this, thx.

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: Y2424-1.patch, YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102357#comment-14102357
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

[~raviprak], allowing sudo to more than one user in unsecure mode doesn't give 
you any extra security. Actually, it may give you a false sense of security.

On using groups in the LCE blacklist/whitelist, I'll comment in YARN-2429.

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2429) LCE should blacklist based upon group

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102367#comment-14102367
 ] 

Alejandro Abdelnur commented on YARN-2429:
--

Unless I'm mistaken, the blacklisting is done in the C code. Currently Hadoop 
uses the {{Groups}} class to fetch group info, and there are multiple plugins for 
it (shell, LDAP, JNI, ...). This means that you'd have to either get all groups 
of the user before calling the LCE and pass them as params, or the LCE would 
have to connect to the same group source as the Java side of things. 
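As an illustration of the first option (not part of any patch here), a minimal sketch 
that resolves the user's groups on the Java side through the configured {{Groups}} 
mapping before invoking the LCE; how the result would be handed to container-executor 
is left hypothetical:

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Groups;

public class UserGroupsForLCE {

  // Resolves groups through whatever mapping plugin is configured
  // (shell, LDAP, JNI, ...), i.e. the same source the Java side already uses.
  public static String groupsAsParam(Configuration conf, String user)
      throws IOException {
    List<String> groups =
        Groups.getUserToGroupsMappingService(conf).getGroups(user);
    // Would then be passed to container-executor as an extra, hypothetical
    // argument, e.g. --user-groups=g1,g2 -- no such flag exists today.
    return String.join(",", groups);
  }
}
{code}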

 LCE should blacklist based upon group
 -

 Key: YARN-2429
 URL: https://issues.apache.org/jira/browse/YARN-2429
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Allen Wittenauer

 It should be possible to list a group to ban, not just individual users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102450#comment-14102450
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

If security is OFF, I can submit a job as ANY user simply by doing 
-Duser.name=ANY. User ANY will be the one used by YARN and HDFS (I'll leave it 
up to the reader to see how to do this).

I really don't like what this JIRA is proposing, and I've indicated what would 
have to be done for me not to -1.



 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102538#comment-14102538
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

[~aw], I think you are missing the point.

I know that in an un-secure cluster you can fake the user name to interact with 
HDFS or YARN from anywhere at any time. 

YARN-1253 is not about protecting HDFS or YARN, it is about protecting the node 
at the OS level by enforcing the use of a least-privileged user.

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102629#comment-14102629
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Ravi, all the config in the container-executor.cfg is EXCLUSIVELY for enforcing 
constraints on the process to be launched; it does not restrict a launched JVM 
process from doing a {{System.setProperty("user.name", "ANY")}} to gain access 
to HDFS as user ANY (if Kerberos is ON, setting the 'user.name' property has no 
effect).

BTW, I'm not OK with making this a valid configuration, it is not.
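To make the point concrete, here is a small sketch (not from this JIRA) of why 
asserted identities are not verified under SIMPLE auth; it uses the standard 
{{UserGroupInformation}} API rather than the {{user.name}} property, and the user 
name and path are just examples:

{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class FakeUserDemo {
  public static void main(String[] args) throws Exception {
    // With SIMPLE auth nothing verifies this name, so a client can claim
    // to be any user when talking to HDFS or YARN.
    UserGroupInformation any = UserGroupInformation.createRemoteUser("ANY");
    any.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.listStatus(new Path("/user/ANY")); // executed as user ANY
        return null;
      }
    });
  }
}
{code}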

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102794#comment-14102794
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Repeating myself from a previous comment: YARN-1253 is not about protecting 
HDFS or YARN, it is about protecting the node at the OS level by enforcing the 
use of a least-privileged user.


 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102941#comment-14102941
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Having more than one 'least privileged' user does not bring you any benefit as 
they can always step on each other by faking the username at job submission.


 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103063#comment-14103063
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

You are saying this is proactive troubleshooting then, not meant for 
production? If so, then, as I said before (a rough sketch follows this list):

* the property has 'use-only-for-troubleshooting' in its name.
* the NM logs print a WARN at startup and on every started container, stating 
the flag and its un-secure nature.
* the container stdout/stderr also print a WARN to alert the user of the 
cluster setup.
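Roughly the kind of warning being asked for, as a sketch; the config key and 
variable names below are made up for illustration and are not existing YARN 
properties:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;

public class NonsecureLceWarning {
  private static final Log LOG = LogFactory.getLog(NonsecureLceWarning.class);

  // Hypothetical key carrying 'use-only-for-troubleshooting' in its name.
  static final String KEY = "yarn.nodemanager.linux-container-executor."
      + "nonsecure-mode.run-as-submitter.use-only-for-troubleshooting";

  // Called at NM startup and again for every launched container.
  static void warnIfEnabled(Configuration conf, StringBuilder launchScript) {
    if (conf.getBoolean(KEY, false)) {
      LOG.warn("LCE runs containers as the submitting user WITHOUT"
          + " authentication (" + KEY + "=true); use only for troubleshooting.");
      // Same warning into the container's stderr so app users see it too.
      launchScript.append("echo 'WARNING: this cluster runs LCE in nonsecure"
          + " per-user mode; identities are not verified.' 1>&2\n");
    }
  }
}
{code}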

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-18 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101533#comment-14101533
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

I really don't like it; it is not my business how you run your clusters, but 
this is dangerous, especially in a multi-tenancy scenario. From Allen's comment 
(the one I highlighted) it is not clear to me that this is meant only for 
setup/troubleshooting usage.

I would not -1 this JIRA if...

* the property has 'use-only-for-troubleshooting' in its name.
* the NM logs print a WARN at startup and on every started container, stating 
the flag and its un-secure nature.
* the container stdout/stderr also print a WARN to alert the user of the 
cluster setup.

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
  Labels: regression
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-17 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100134#comment-14100134
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Please refer to the YARN-1253 comments; it was stated there that the old behavior 
had security issues.

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
  Labels: regression
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-17 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100158#comment-14100158
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

Please go over Todd's comment about the security issues of sudoing as a user 
without secure auth. You definitely don't want to do that in a multi-tenant 
cluster. 

BTW, fixing a security bug is not a regression.  

 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
  Labels: regression
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2014-08-17 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100302#comment-14100302
 ] 

Alejandro Abdelnur commented on YARN-2424:
--

I think I did; if I'm reading correctly, you are stating that it is better for 
troubleshooting, especially in multi-tenant scenarios: 

bq. It's also worth pointing out that one of the key benefits of running tasks 
as the user who submitted them is that it makes troubleshooting much easier. 
When one hops on a node, it is evident as to which user's tasks one is looking 
at it, even if those tasks aren't validated as that user. This is especially 
important in heavy multi-tenant scenarios.


 LCE should support non-cgroups, non-secure mode
 ---

 Key: YARN-2424
 URL: https://issues.apache.org/jira/browse/YARN-2424
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
Reporter: Allen Wittenauer
Priority: Blocker
  Labels: regression
 Attachments: YARN-2424.patch


 After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
 This is a fairly serious regression, as turning on LCE prior to turning on 
 full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-24 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073328#comment-14073328
 ] 

Alejandro Abdelnur commented on YARN-2348:
--

Allen's suggestion of making it selectable from the browser makes sense. 

In Oozie, we are doing this. Because JavaScript does not have built-in 
libraries for TZ handling, what we did is (a rough sketch follows the list):

* have a request parameter that specifies the desired TZ for datetime values; 
the default value is UTC.
* TZ conversion happens on the server side when producing the JSON output, 
using the TZ request parameter.
* have a REST call that returns the list of available TZs.
* have a dropdown in the UI that shows the available TZs (uses the REST call 
from the previous bullet).
* use a cookie to remember the user-selected TZ.
* if the cookie is present, set the TZ request parameter with it.
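A rough sketch of the server-side piece of that scheme; the parameter handling and 
date format are illustrative, not Oozie's actual code:

{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimeZoneParam {

  // Format a timestamp in the TZ requested by the client, defaulting to UTC.
  public static String format(long millis, String tzParam) {
    TimeZone tz = (tzParam == null || tzParam.isEmpty())
        ? TimeZone.getTimeZone("UTC")
        : TimeZone.getTimeZone(tzParam);
    SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z");
    fmt.setTimeZone(tz);
    return fmt.format(new Date(millis));
  }
}
{code}

The UI simply sends the value stored in the TZ cookie as that request parameter on 
every call.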



 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch


 ResourceManager web UI, including the application list and scheduler, displays 
 UTC time by default; this will confuse users who do not use UTC time. The 
 web UI should display the users' local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-21 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068704#comment-14068704
 ] 

Alejandro Abdelnur commented on YARN-796:
-

Wangda, I previously missed the new doc explaining label predicates. Thanks 
for pointing it out.

How about first shooting for the following?

* RM has a list of valid labels (hot-reloadable).
* NMs have a list of labels (hot-reloadable).
* NMs report labels at registration time and on heartbeats when they change.
* label-expressions support && (AND) only (a small matching sketch follows 
below).
* an app is able to specify a label-expression when making a resource request.
* queues AND-augment the label expression with the queue label-expression.

And later we can add (in a backwards compatible way)

* add support for OR and NOT to label-expressions
* add label ACLs
* centralized per NM configuration, REST API for it, etc, etc
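As a concrete illustration of the AND-only semantics mentioned above (just a sketch, 
not proposed code):

{code}
import java.util.Set;

public class LabelExpression {

  // True if the node carries every label named in an expression
  // such as "labelA && labelX".
  public static boolean matches(String expression, Set<String> nodeLabels) {
    if (expression == null || expression.trim().isEmpty()) {
      return true; // no labels requested, any node qualifies
    }
    for (String label : expression.split("&&")) {
      if (!nodeLabels.contains(label.trim())) {
        return false;
      }
    }
    return true;
  }
}
{code}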

Thoughts?


 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068048#comment-14068048
 ] 

Alejandro Abdelnur commented on YARN-796:
-

I agree with Sandy and Allen. 

That said, we currently don't do anything centralized on a per-NodeManager 
basis; if we want to do that, we should think about solving it in a more general 
way than just labels, and I would suggest doing that (if we decide to) in a 
different JIRA. 

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068120#comment-14068120
 ] 

Alejandro Abdelnur commented on YARN-796:
-

Wangda, your use case is throwing overboard the work of the scheduler regarding 
matching nodes with data locality. You can solve it in a much better way using 
the scheduler queue configuration, which can be dynamically adjusted. 

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068144#comment-14068144
 ] 

Alejandro Abdelnur commented on YARN-796:
-

Wangda, I'm afraid I'm lost with your last comment. I thought labels were to 
express desired node affinity based on a label, not to fence off nodes. I don't 
understand how you would achieve fencing off a node with a label unless you have 
a more complex annotation mechanism than just a label (i.e., book this node only 
if label X is present); also, you would have to add ACLs to labels to prevent 
anybody from simply asking for a label. 

Am I missing something?

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-06-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045097#comment-14045097
 ] 

Alejandro Abdelnur commented on YARN-2194:
--

Do we need to have a special container executor, or would a specialized 
{{LCEResourcesHandler}} do the trick?

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan

 In previous versions of RedHat, we could build custom cgroup hierarchies with 
 the cgconfig command from the libcgroup package. As of RedHat 7, the 
 libcgroup package is deprecated and it is not recommended to use it, since it 
 can easily create conflicts with the default cgroup hierarchy. systemd is 
 provided and recommended for cgroup management. We need to add support for 
 this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2194) Add Cgroup support for RedHat 7

2014-06-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045105#comment-14045105
 ] 

Alejandro Abdelnur commented on YARN-2194:
--

I was not meaning autodetection; I meant that a new resources handler may 
be enough to deal with cgroups in RH7, without having to write a new LCE.
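For what a specialized handler might look like, a skeleton is below. The 
{{LCEResourcesHandler}} method names are assumed from the existing 2.x plugin point 
(the one {{CgroupsLCEResourcesHandler}} implements) and the bodies are placeholders, 
so treat this as a sketch to be checked against the real sources rather than a 
working handler:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor;
import org.apache.hadoop.yarn.server.nodemanager.util.LCEResourcesHandler;

// Assumed interface shape: init/preExecute/postExecute/getResourcesOption.
public class SystemdLCEResourcesHandler implements LCEResourcesHandler {
  private Configuration conf;

  @Override public void setConf(Configuration conf) { this.conf = conf; }
  @Override public Configuration getConf() { return conf; }

  @Override
  public void init(LinuxContainerExecutor lce) throws IOException {
    // Create the parent slice/hierarchy through systemd instead of cgconfig.
  }

  @Override
  public void preExecute(ContainerId containerId, Resource resource)
      throws IOException {
    // Create a per-container scope and apply limits derived from resource.
  }

  @Override
  public void postExecute(ContainerId containerId) {
    // Tear down the per-container scope.
  }

  @Override
  public String getResourcesOption(ContainerId containerId) {
    // Whatever container-executor needs to attach the process; placeholder.
    return "cgroups=none";
  }
}
{code}

If the shape holds, it would be wired in through 
{{yarn.nodemanager.linux-container-executor.resources-handler.class}} without 
touching the executor itself.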

 Add Cgroup support for RedHat 7
 ---

 Key: YARN-2194
 URL: https://issues.apache.org/jira/browse/YARN-2194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan

 In previous versions of RedHat, we could build custom cgroup hierarchies with 
 the cgconfig command from the libcgroup package. As of RedHat 7, the 
 libcgroup package is deprecated and it is not recommended to use it, since it 
 can easily create conflicts with the default cgroup hierarchy. systemd is 
 provided and recommended for cgroup management. We need to add support for 
 this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2139) Add support for disk IO isolation/scheduling for containers

2014-06-10 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027379#comment-14027379
 ] 

Alejandro Abdelnur commented on YARN-2139:
--

As Sandy says, short-circuit reads kick in for local HDFS blocks, so those reads 
hit the local FS like any other local FS access and the block IO controller would 
kick in. For remote HDFS reads, the network IO controller would kick in. So 
effectively we can control the resources the container uses.

 Add support for disk IO isolation/scheduling for containers
 ---

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164
 ] 

Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:08 AM:
---

[~jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead of hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/break down tasks. Please do so next time.



was (Author: tucu00):
[~ jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/breakdown tasks. Please do so next time.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164
 ] 

Alejandro Abdelnur edited comment on YARN-1368 at 5/15/14 5:09 AM:
---

[~jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead of hijacking the JIRA, the correct way 
should have been proposing [to the assignee/author of the original patch] the 
changes and offering to contribute/break down tasks. Please do so next time.



was (Author: tucu00):
[~jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/breakdown tasks. Please do so next time.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164
 ] 

Alejandro Abdelnur commented on YARN-1368:
--

[~ jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead of hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/break down tasks. Please do so next time.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-373) Allow an AM to reuse the resources allocated to container for a new container

2014-05-15 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved YARN-373.
-

Resolution: Won't Fix

[doing self-clean up of JIRAs]

 Allow an AM to reuse the resources allocated to container for a new container
 -

 Key: YARN-373
 URL: https://issues.apache.org/jira/browse/YARN-373
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 When a container completes, instead of the corresponding resources being freed 
 up, it should be possible for the AM to reuse the assigned resources for a 
 new container.
 As part of the reallocation, the AM would notify the RM about partial 
 resources being freed up and the RM would make the necessary corrections in 
 the corresponding node.
 With this functionality, an AM can ensure it gets a container in the same 
 node where previous containers run.
 This will allow getting rid of the ShuffleHandler as a service in the NMs and 
 running it as a regular container task of the corresponding AM. In this case, the 
 reallocation would reduce the CPU/MEM obtained for the original container to 
 what is needed for serving the shuffle. Note that in this example the MR 
 AM would only do this reallocation for one of the many tasks that may have 
 run in a particular node (as a single shuffle task could serve all the map 
 outputs from all map tasks run in that node). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997024#comment-13997024
 ] 

Alejandro Abdelnur commented on YARN-1368:
--

[~jianhe], [~vinodkv], 

Unless I'm missing something, Anubhav was working on this JIRA. It is great 
that Jian did the refactoring to have common code for the schedulers and some 
testcases for it, but most of the work has been done by Anubhav and he was 
working actively on it. We should reassign the JIRA back to Anubhav and let him 
drive it to completion, agree?

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.

2014-04-15 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur moved HADOOP-10505 to YARN-1943:
---

  Component/s: (was: security)
   nodemanager
Fix Version/s: (was: 2.3.0)
   2.3.0
Affects Version/s: (was: 2.3.0)
   2.3.0
  Key: YARN-1943  (was: HADOOP-10505)
  Project: Hadoop YARN  (was: Hadoop Common)

 Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
 -

 Key: YARN-1943
 URL: https://issues.apache.org/jira/browse/YARN-1943
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: jay vyas
Priority: Critical
  Labels: linux
 Fix For: 2.3.0


 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser 
 replaces the user who submits a job if security is disabled: 
 {noformat}
  return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser;
 {noformat}
 However, the only way to enable security, is to NOT use SIMPLE authentication 
 mode:
 {noformat}
   public static boolean isSecurityEnabled() {
 return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE);
   }
 {noformat}
  
 Thus, the framework ENFORCES that SIMPLE login security --> nonSecureuser 
 for submission of LinuxContainerExecutor.
 This results in a confusing issue, wherein we submit a job as sally and 
 then get an exception that user nobody is not whitelisted and has UID < 
 MAX_ID.
 My proposed solution is that we should be able to leverage 
 LinuxContainerExecutor regardless of hadoop's view of the security settings on 
 the cluster, i.e. decouple LinuxContainerExecutor logic from the 
 isSecurityEnabled return value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.

2014-04-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969746#comment-13969746
 ] 

Alejandro Abdelnur commented on YARN-1943:
--

[~jayunit100], please refer to YARN-1253 for details on why it is this way. IMO 
this is not a bug.

 Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
 -

 Key: YARN-1943
 URL: https://issues.apache.org/jira/browse/YARN-1943
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: jay vyas
Priority: Critical
  Labels: linux
 Fix For: 2.3.0


 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser 
 replaces the user who submits a job if security is disabled: 
 {noformat}
  return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser;
 {noformat}
 However, the only way to enable security, is to NOT use SIMPLE authentication 
 mode:
 {noformat}
   public static boolean isSecurityEnabled() {
 return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE);
   }
 {noformat}
  
 Thus, the framework ENFORCES that SIMPLE login security --> nonSecureuser 
 for submission of LinuxContainerExecutor.
 This results in a confusing issue, wherein we submit a job as sally and 
 then get an exception that user nobody is not whitelisted and has UID < 
 MAX_ID.
 My proposed solution is that we should be able to leverage 
 LinuxContainerExecutor regardless of hadoop's view of the security settings on 
 the cluster, i.e. decouple LinuxContainerExecutor logic from the 
 isSecurityEnabled return value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.

2014-04-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969821#comment-13969821
 ] 

Alejandro Abdelnur commented on YARN-1943:
--

On the yarn-site.xml of the NMs:

{code}
  <property>
    <description>The UNIX user that containers will run as when Linux-container-executor
    is used in nonsecure mode (a use case for this is using cgroups).</description>
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
    <value>nobody</value>
  </property>
{code}


 Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
 -

 Key: YARN-1943
 URL: https://issues.apache.org/jira/browse/YARN-1943
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: jay vyas
Priority: Critical
  Labels: linux
 Fix For: 2.3.0


 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser 
 replaces the user who submits a job if security is disabled: 
 {noformat}
  return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser;
 {noformat}
 However, the only way to enable security, is to NOT use SIMPLE authentication 
 mode:
 {noformat}
   public static boolean isSecurityEnabled() {
 return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE);
   }
 {noformat}
  
 Thus, the framework ENFORCES that SIMPLE login security --> nonSecureuser 
 for submission of LinuxContainerExecutor.
 This results in a confusing issue, wherein we submit a job as sally and 
 then get an exception that user nobody is not whitelisted and has UID < 
 MAX_ID.
 My proposed solution is that we should be able to leverage 
 LinuxContainerExecutor regardless of hadoop's view of the security settings on 
 the cluster, i.e. decouple LinuxContainerExecutor logic from the 
 isSecurityEnabled return value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.

2014-04-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969973#comment-13969973
 ] 

Alejandro Abdelnur commented on YARN-1943:
--

Not really; refer to the {{container-executor.c}} file and you'll see how things 
work.

 Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
 -

 Key: YARN-1943
 URL: https://issues.apache.org/jira/browse/YARN-1943
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: jay vyas
Priority: Critical
  Labels: linux
 Fix For: 2.3.0


 As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser 
 replaces the user who submits a job if security is disabled: 
 {noformat}
  return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser;
 {noformat}
 However, the only way to enable security, is to NOT use SIMPLE authentication 
 mode:
 {noformat}
   public static boolean isSecurityEnabled() {
 return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE);
   }
 {noformat}
  
 Thus, the framework ENFORCES that SIMPLE login security --> nonSecureuser 
 for submission of LinuxContainerExecutor.
 This results in a confusing issue, wherein we submit a job as sally and 
 then get an exception that user nobody is not whitelisted and has UID < 
 MAX_ID.
 My proposed solution is that we should be able to leverage 
 LinuxContainerExecutor regardless of hadoop's view of the security settings on 
 the cluster, i.e. decouple LinuxContainerExecutor logic from the 
 isSecurityEnabled return value.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1849) NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters

2014-03-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941007#comment-13941007
 ] 

Alejandro Abdelnur commented on YARN-1849:
--

+1 pending jenkins.

 NPE in ResourceTrackerService#registerNodeManager for UAM on secure clusters
 

 Key: YARN-1849
 URL: https://issues.apache.org/jira/browse/YARN-1849
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-1849-1.patch, yarn-1849-2.patch, yarn-1849-2.patch, 
 yarn-1849-3.patch


 While running an UnmanagedAM on secure cluster, ran into an NPE on 
 failover/restart. This is similar to YARN-1821. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-14 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935594#comment-13935594
 ] 

Alejandro Abdelnur commented on YARN-796:
-

Scheduler configurations are refreshed dynamically; if the list of valid labels 
is there, it could be refreshed as well. I would prefer to detect & reject 
typos, from a user experience and troubleshooting point of view.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934293#comment-13934293
 ] 

Alejandro Abdelnur commented on YARN-796:
-

Arun, doing a recap on the config, is this what you mean?

ResourceManager {{yarn-site.xml}} would specify the valid labels system-wide 
(you didn't suggest this, but it prevents label typos from going unnoticed):

{code}
<property>
  <name>yarn.resourcemanager.valid-labels</name>
  <value>labelA, labelB, labelX</value>
</property>
{code}

NodeManagers' yarn-site.xml would specify the labels of the node:

{code}
<property>
  <name>yarn.nodemanager.labels</name>
  <value>labelA, labelX</value>
</property>
{code}

Scheduler configuration, in its queue configuration, would specify what labels 
can be used when requesting allocations in that queue:

{code}
<property>
  <name>yarn.scheduler.capacity.root.A.allowed-labels</name>
  <value>labelA</value>
</property>
{code}
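And a tiny sketch of the RM-side check this enables (the property name is the one 
proposed above, not an existing key):

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

public class LabelValidation {

  // Returns the NM-reported labels that are not in the system-wide list,
  // so typos can be rejected/logged at registration time.
  public static Set<String> unknownLabels(Configuration conf,
      Set<String> nmLabels) {
    Set<String> valid = new HashSet<String>(Arrays.asList(
        conf.getTrimmedStrings("yarn.resourcemanager.valid-labels")));
    Set<String> unknown = new HashSet<String>(nmLabels);
    unknown.removeAll(valid);
    return unknown;
  }
}
{code}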


 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1808) add ability for AM to attach simple state information to containers

2014-03-10 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925866#comment-13925866
 ] 

Alejandro Abdelnur commented on YARN-1808:
--

where this container be stored? RM or NMs? you would need a DT for the 
container to be able to push its state to the 'state store'. also, what is 
simple for you, a string, a long? 

 add ability for AM to attach simple state information to containers
 ---

 Key: YARN-1808
 URL: https://issues.apache.org/jira/browse/YARN-1808
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Steve Loughran
Priority: Minor

 AM restart could be aided if we could add some state information to the 
 containers themselves. This allows the AM to rebuild its state by querying 
 all of its containers for their status. The AM will of course also need to be 
 able to write this.
 This isn't critical: code running in the containers can do the same thing. It 
 just appears to be a common use case



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-1808) add ability for AM to attach simple state information to containers

2014-03-10 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925866#comment-13925866
 ] 

Alejandro Abdelnur edited comment on YARN-1808 at 3/10/14 4:41 PM:
---

Where would this container state be stored? RM or NMs? You would need a DT for 
the container to be able to push its state to the 'state store'. Also, what is 
simple for you: a string, a long? 


was (Author: tucu00):
where this container be stored? RM or NMs? you would need a DT for the 
container to be able to push its state to the 'state store'. also, what is 
simple for you, a string, a long? 

 add ability for AM to attach simple state information to containers
 ---

 Key: YARN-1808
 URL: https://issues.apache.org/jira/browse/YARN-1808
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Steve Loughran
Priority: Minor

 AM restart could be aided if we could add some state information to the 
 containers themselves. This allows the AM to rebuild its state by querying 
 all of its containers for their status. The AM will of course also need to be 
 able to write this.
 This isn't critical: code running in the containers can do the same thing. It 
 just appears to be a common use case



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-02-21 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908479#comment-13908479
 ] 

Alejandro Abdelnur commented on YARN-1702:
--

It seems my comment on the umbrella JIRA has gone unnoticed, so I'm posting it 
here as well.

Before we start coding work on this, it would be great to see how security will 
be handled (authentication, ACLs, tokens, etc).

I'm a bit concerned about introducing a second protocol everywhere. From a 
maintenance and security risk point of view it doubles our development/support 
efforts. Granted, HDFS offers data over RPC and HTTP. But HTTP, when using 
HttpFS (how I recommend doing HTTP access), is a gateway that ends up doing 
RPC to HDFS. Thus the only protocol accessing HDFS is RPC.

Have we considered a C implementation of Hadoop RPC? With the multi-platform 
support of protobufs, that may give us the multi-platform support we are aiming 
for with a single protocol interface.

Thoughts?


 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.2.patch, apache-yarn-1702.3.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1695) Implement the rest (writable APIs) of RM web-services

2014-02-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898052#comment-13898052
 ] 

Alejandro Abdelnur commented on YARN-1695:
--

err, [~vinodkv], isn't this a duplicate of the JIRA you opened back in DEC, 
YARN-1538?

 Implement the rest (writable APIs) of RM web-services
 -

 Key: YARN-1695
 URL: https://issues.apache.org/jira/browse/YARN-1695
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs 
 added there were only focused on obtaining information from the cluster. We 
 need to have the following REST APIs to finish the feature
  - Application submission/termination (Priority): This unblocks easy client 
 interaction with a YARN cluster
  - Application Client protocol: For resource scheduling by apps written in an 
 arbitrary language. Will have to think about throughput concerns
  - ContainerManagement Protocol: Again for arbitrary language apps.
 One important thing to note here is that we already have client libraries on 
 all three protocols that do some heavy lifting. One part of the 
 effort is to figure out if they can be made any thinner and/or how 
 web-services will implement the same functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1695) Implement the rest (writable APIs) of RM web-services

2014-02-11 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898112#comment-13898112
 ] 

Alejandro Abdelnur commented on YARN-1695:
--

Before we start coding work on this, it would be great to see how security will
be handled (authentication, ACLs, tokens, etc).

I'm a bit concerned about introducing a second protocol everywhere. From a
maintenance and security risk point of view it doubles our development/support
efforts. Granted, HDFS offers data over both RPC and HTTP. But HTTP, when using
HttpFS (which is how I recommend providing HTTP access), is a gateway that ends
up doing RPC to HDFS. Thus the only protocol accessing HDFS is RPC.

Have we considered a C implementation of Hadoop RPC? With the multi-platform
support of protocol buffers, that may give us the cross-platform reach we are
aiming for with a single protocol interface.

Thoughts?


 Implement the rest (writable APIs) of RM web-services
 -

 Key: YARN-1695
 URL: https://issues.apache.org/jira/browse/YARN-1695
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs 
 added there were only focused on obtaining information from the cluster. We 
 need to have the following REST APIs to finish the feature
  - Application submission/termination (Priority): This unblocks easy client 
 interaction with a YARN cluster
  - Application Client protocol: For resource scheduling by apps written in an 
 arbitrary language. Will have to think about throughput concerns
  - ContainerManagement Protocol: Again for arbitrary language apps.
 One important thing to note here is that we already have client libraries on 
 all three protocols that do some heavy lifting. One part of the 
 effort is to figure out if they can be made any thinner and/or how 
 web-services will implement the same functionality.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-02-07 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1577:
-

Target Version/s: 2.4.0  (was: 2.3.0)

Moving it out of 2.3 as [~vinodkv] reverted from 2.3 the JIRAs that introduced
the problem.

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Jian He
Priority: Blocker

 Today unmanaged AM client is waiting for app state to be Accepted to launch 
 the AM. This is broken since we changed in YARN-1493 to start the attempt 
 after the application is Accepted. We may need to introduce an attempt state 
 report that client can rely on to query the attempt state and choose to 
 launch the unmanaged AM.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-02-04 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13891853#comment-13891853
 ] 

Alejandro Abdelnur commented on YARN-1577:
--

The problem I'm seeing with YARN-1493 is that when trying to register the UAM
with the scheduler, I get the exception below.

Reverting YARN-1493 and friends makes this problem go away.

The patches to revert, in order, are:

YARN-1566
YARN-1041 
YARN-1166
YARN-1490
YARN-1493

{code}
2014-02-03 10:58:40,403 ERROR UserGroupInformation - PriviledgedActionException 
as:llama (auth:PROXY) via tucu (auth:SIMPLE) 
cause:org.apache.hadoop.security.AccessControlException: SIMPLE authentication 
is not enabled.  Available:[TOKEN]
2014-02-03 10:58:40,407 WARN  LlamaAMServiceImpl - Reserve() error: 
com.cloudera.llama.util.LlamaException: AM_CANNOT_REGISTER - cannot register AM 
'application_1391453743418_0001' for queue 'root.queue1' : 
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not 
enabled.  Available:[TOKEN]
com.cloudera.llama.util.LlamaException: AM_CANNOT_REGISTER - cannot register AM 
'application_1391453743418_0001' for queue 'root.queue1' : 
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not 
enabled.  Available:[TOKEN]
…
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE 
authentication is not enabled.  Available:[TOKEN]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at $Proxy12.registerApplicationMaster(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:196)
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
at 
com.cloudera.llama.am.yarn.YarnRMConnector._initYarnApp(YarnRMConnector.java:270)
at 
com.cloudera.llama.am.yarn.YarnRMConnector.access$200(YarnRMConnector.java:80)
at 
com.cloudera.llama.am.yarn.YarnRMConnector$3.run(YarnRMConnector.java:212)
at 
com.cloudera.llama.am.yarn.YarnRMConnector$3.run(YarnRMConnector.java:209)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
com.cloudera.llama.am.yarn.YarnRMConnector.register(YarnRMConnector.java:209)
... 20 more
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 SIMPLE authentication is not enabled.  Available:[TOKEN]
at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy11.registerApplicationMaster(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
... 37 more
{code}
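
For context, a rough sketch of the UAM registration path that hits this
exception (class names, host/port values, and the proxy-user setup are
illustrative only, not Llama's actual code):

{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class UamRegistrationSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();

    // The UAM registers as a proxy user, matching the
    // 'llama (auth:PROXY) via tucu (auth:SIMPLE)' line in the trace above.
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        "llama", UserGroupInformation.getCurrentUser());

    proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        // registerApplicationMaster() goes over ApplicationMasterProtocol; with
        // the YARN-1493 series applied this call is rejected when the UGI does
        // not carry an AMRMToken, which matches the "SIMPLE authentication is
        // not enabled.  Available:[TOKEN]" error above.
        AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
        amRmClient.init(conf);
        amRmClient.start();
        amRmClient.registerApplicationMaster("localhost", 0, "");
        return null;
      }
    });
  }
}
{code}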



 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Jian He
Priority: Blocker

 Today unmanaged AM client is waiting for app state to be Accepted to launch 
 the AM. This is broken since we changed in YARN-1493 to start the attempt 
 after the application is Accepted. We may need to introduce an attempt state 
 report that 

[jira] [Commented] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-23 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13880466#comment-13880466
 ] 

Alejandro Abdelnur commented on YARN-1629:
--

+1

 IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
 --

 Key: YARN-1629
 URL: https://issues.apache.org/jira/browse/YARN-1629
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch


 This can occur when the second-to-last app in a queue's pending app list is 
 made runnable.  The app is pulled out from under the iterator. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1623) Include queue name in RegisterApplicationMasterResponse

2014-01-22 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879160#comment-13879160
 ] 

Alejandro Abdelnur commented on YARN-1623:
--

+1 pending jenkins

 Include queue name in RegisterApplicationMasterResponse
 ---

 Key: YARN-1623
 URL: https://issues.apache.org/jira/browse/YARN-1623
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1623-1.patch, YARN-1623.patch


 This provides the YARN change necessary to support MAPREDUCE-5732.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1601) 3rd party JARs are missing from hadoop-dist output

2014-01-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872270#comment-13872270
 ] 

Alejandro Abdelnur commented on YARN-1601:
--

Thanks Steve. Thanks Sean, regarding the stax-api JAR, I've opened another JIRA 
to fix that, HADOOP-10235.

 3rd party JARs are missing from hadoop-dist output
 --

 Key: YARN-1601
 URL: https://issues.apache.org/jira/browse/YARN-1601
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-1601.patch


 With the build changes of YARN-888 we are leaving out all 3rd party JARs used 
 directly by YARN under /share/hadoop/yarn/lib/.
 We did not notice this when running minicluster because they all happen to be 
 in the classpath from hadoop-common and hadoop-yarn.
 As 3rd party JARs are not 'public' interfaces we cannot rely on them being 
 provided to yarn by common and hdfs (i.e. if common and hdfs stop using a 3rd 
 party dependency that yarn uses, this would break yarn if yarn does not pull 
 that dependency explicitly).
 Also, this will break bigtop hadoop build when they move to use branch-2 as 
 they expect to find jars in /share/hadoop/yarn/lib/



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1608) LinuxContainerExecutor has a few DEBUG messages at INFO level

2014-01-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872937#comment-13872937
 ] 

Alejandro Abdelnur commented on YARN-1608:
--

+1

 LinuxContainerExecutor has a few DEBUG messages at INFO level
 -

 Key: YARN-1608
 URL: https://issues.apache.org/jira/browse/YARN-1608
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: log
 Attachments: yarn-1608-1.patch


 LCE has a few INFO level log messages meant to be at debug level. In fact, 
 they are logged both at INFO and DEBUG. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1608) LinuxContainerExecutor has a few DEBUG messages at INFO level

2014-01-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872938#comment-13872938
 ] 

Alejandro Abdelnur commented on YARN-1608:
--

+1

 LinuxContainerExecutor has a few DEBUG messages at INFO level
 -

 Key: YARN-1608
 URL: https://issues.apache.org/jira/browse/YARN-1608
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
  Labels: log
 Attachments: yarn-1608-1.patch


 LCE has a few INFO level log messages meant to be at debug level. In fact, 
 they are logged both at INFO and DEBUG. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1601) 3rd party JARs are missing from hadoop-dist output

2014-01-14 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created YARN-1601:


 Summary: 3rd party JARs are missing from hadoop-dist output
 Key: YARN-1601
 URL: https://issues.apache.org/jira/browse/YARN-1601
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


With the build changes of YARN-888 we are leaving out all 3rd party JARs used 
directly by YARN under /share/hadoop/yarn/lib/.

We did not notice this when running minicluster because they all happen to be 
in the classpath from hadoop-common and hadoop-yarn.

As 3rd party JARs are not 'public' interfaces we cannot rely on them being 
provided to yarn by common and hdfs (i.e. if common and hdfs stop using a 3rd 
party dependency that yarn uses, this would break yarn if yarn does not pull 
that dependency explicitly).

Also, this will break bigtop hadoop build when they move to use branch-2 as 
they expect to find jars in /share/hadoop/yarn/lib/



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1601) 3rd party JARs are missing from hadoop-dist output

2014-01-14 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1601:
-

Attachment: YARN-1601.patch

Patch that adds the submodules as dependencies to the hadoop-yarn POM; this is
required to be able to populate the /share/hadoop/lib/ dir with the 3rd party
JARs used by Yarn. The hadoop-yarn POM isn't the parent of any other POM, so
this does not affect existing dependencies. It just makes the assembly for yarn
pick up Yarn's 3rd party JARs when creating the tarball.


 3rd party JARs are missing from hadoop-dist output
 --

 Key: YARN-1601
 URL: https://issues.apache.org/jira/browse/YARN-1601
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-1601.patch


 With the build changes of YARN-888 we are leaving out all 3rd party JARs used 
 directly by YARN under /share/hadoop/yarn/lib/.
 We did not notice this when running minicluster because they all happen to be 
 in the classpath from hadoop-common and hadoop-yarn.
 As 3rd party JARs are not 'public' interfaces we cannot rely on them being 
 provided to yarn by common and hdfs (i.e. if common and hdfs stop using a 3rd 
 party dependency that yarn uses, this would break yarn if yarn does not pull 
 that dependency explicitly).
 Also, this will break bigtop hadoop build when they move to use branch-2 as 
 they expect to find jars in /share/hadoop/yarn/lib/



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869658#comment-13869658
 ] 

Alejandro Abdelnur commented on YARN-888:
-

test failure (timeout) seems unrelated.

[~vinodkv], [~ste...@apache.org], [~kkambatl], [~rvs], over the weekend I've 
updated the patch with comments in the YARN non-leaf POM stating no 
dependencies should be added there. No other changes. Unless I hear further 
comments I'm planning to commit this later today to trunk and branch-2. Thanks 
for your reviews.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (YARN-888) clean up POM dependencies

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869658#comment-13869658
 ] 

Alejandro Abdelnur edited comment on YARN-888 at 1/13/14 5:22 PM:
--

test failure (timeout) seems unrelated.

[~vinodkv], [~ste...@apache.org], [~kkambatl], [~rvs], over the weekend I've 
updated the patch with comments in the YARN non-leaf POMs stating no 
dependencies should be added there. No other changes. Unless I hear further 
comments I'm planning to commit this later today to trunk and branch-2. Thanks 
for your reviews.


was (Author: tucu00):
test failure (timeout) seems unrelated.

[~vinodkv], [~ste...@apache.org], [~kkambatl], [~rvs], over the weekend I've 
updated the patch with comments in the YARN non-leaf POM stating no 
dependencies should be added there. No other changes. Unless I hear further 
comments I'm planning to commit this later today to trunk and branch-2. Thanks 
for your reviews.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869825#comment-13869825
 ] 

Alejandro Abdelnur commented on YARN-888:
-

Thanks [~vinodkv].

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.4.0

 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869843#comment-13869843
 ] 

Alejandro Abdelnur commented on YARN-888:
-

OK, it seems we have our first victim, MAPREDUCE-5722. I don't know why this 
didn't come up before. Taking care of it right now.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.4.0

 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869856#comment-13869856
 ] 

Alejandro Abdelnur commented on YARN-888:
-

False alarm, it seems I had stale POMs in my local Maven cache. Still, we need
to take care of MAPREDUCE-5362 (the equivalent of this JIRA for MR).

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.4.0

 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869940#comment-13869940
 ] 

Alejandro Abdelnur commented on YARN-1399:
--

IMO, we should stick to the current API as Sandy suggests; it is a bit
unfortunate that the default is ALL instead of OWN but, well, what to do? 

 Allow users to annotate an application with multiple tags
 -

 Key: YARN-1399
 URL: https://issues.apache.org/jira/browse/YARN-1399
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen

 Nowadays, when submitting an application, users can fill the applicationType 
 field to facilitate searching it later. IMHO, it's good to accept multiple 
 tags to allow users to describe their applications in multiple aspects, 
 including the application type. Then, searching by tags may be more efficient 
 for users to reach their desired application collection. It's pretty much 
 like the tag system of online photo/video/music sites, etc.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869959#comment-13869959
 ] 

Alejandro Abdelnur commented on YARN-1399:
--

In MR-land, when submitting a job we can specify VIEW/MODIFY ACLs. It seems that
in Yarn-land this is not possible for AMs. If I'm right about this, it seems
like missing functionality that would naturally scope down what is returned by
getApplications. And we could do that in a backwards-compatible way.

 Allow users to annotate an application with multiple tags
 -

 Key: YARN-1399
 URL: https://issues.apache.org/jira/browse/YARN-1399
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen

 Nowadays, when submitting an application, users can fill the applicationType 
 field to facilitate searching it later. IMHO, it's good to accept multiple 
 tags to allow users to describe their applications in multiple aspects, 
 including the application type. Then, searching by tags may be more efficient 
 for users to reach their desired application collection. It's pretty much 
 like the tag system of online photo/video/music sites, etc.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1598) HA-related rmadmin commands don't work on a secure cluster

2014-01-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13870215#comment-13870215
 ] 

Alejandro Abdelnur commented on YARN-1598:
--

LGTM, +1 after jenkins.

 HA-related rmadmin commands don't work on a secure cluster
 --

 Key: YARN-1598
 URL: https://issues.apache.org/jira/browse/YARN-1598
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-1598-1.patch


 The HA-related commands like -getServiceState -checkHealth etc. don't work in 
 a secure cluster.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-888) clean up POM dependencies

2014-01-12 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-888:


Attachment: YARN-888.patch

New patch adding the following comment to the YARN non-leaf POMs:

  <!-- Do not add dependencies here, add them to the POM of the leaf module -->



 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-10 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868316#comment-13868316
 ] 

Alejandro Abdelnur commented on YARN-888:
-

[~vinodkv], any further concerns? 

Also, the latest patch applies cleanly to branch-2 at the moment.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-09 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866845#comment-13866845
 ] 

Alejandro Abdelnur commented on YARN-888:
-

[~vinodkv], 

While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it
drags in non-required dependencies (unless you only put in the non-leaf POMs
the dependencies that are common to all the leaf modules).

Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is
one of the motivations (agreed it is an IntelliJ issue; on the other hand the
change does not affect the project build at all and allows IntelliJ users to
build/debug from the IDE out of the box without doing funny voodoo).

The other motivation, and IMO the more important one, is to clean up the
dependencies that modules like yarn-api and yarn-client have, restricting them
to what is used on the client side. Using the dependency:tree and
dependency:analyze plugins I've reduced the 3rd party JARs required by the
clients significantly. As [~ste...@apache.org] pointed out there is much more
work we should do in this direction; this is a first non-intrusive baby step in
that direction.

To give you an idea, before this patch *hadoop-yarn-api* reports the following
required dependencies by itself:

{code}
 +- org.slf4j:slf4j-api:jar:1.7.5:compile
 +- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
 |  \- log4j:log4j:jar:1.2.17:compile
 +- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
 |  +- tomcat:jasper-compiler:jar:5.5.23:test
 +- com.google.inject.extensions:guice-servlet:jar:3.0:compile
 +- io.netty:netty:jar:3.6.2.Final:compile
 +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
 +- commons-io:commons-io:jar:2.4:compile
 +- com.google.inject:guice:jar:3.0:compile
 |  +- javax.inject:javax.inject:jar:1:compile
 |  \- aopalliance:aopalliance:jar:1.0:compile
 +- com.sun.jersey:jersey-server:jar:1.9:compile
 |  +- asm:asm:jar:3.2:compile
 |  \- com.sun.jersey:jersey-core:jar:1.9:compile
 +- com.sun.jersey:jersey-json:jar:1.9:compile
 |  +- org.codehaus.jettison:jettison:jar:1.1:compile
 |  |  \- stax:stax-api:jar:1.0.1:compile
 |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
 |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
 |  | \- javax.activation:activation:jar:1.1:compile
 |  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
 |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
 |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed 
from 1.8.3)
 |  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 
1.8.3)
 \- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}

With the patch, its own required dependencies are down to:

{code}
 +- commons-lang:commons-lang:jar:2.6:compile
 +- com.google.guava:guava:jar:11.0.2:compile
 |  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
 +- commons-logging:commons-logging:jar:1.1.3:compile
 +- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
 \- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}

Does this address your concerns?

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (YARN-888) clean up POM dependencies

2014-01-09 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866845#comment-13866845
 ] 

Alejandro Abdelnur edited comment on YARN-888 at 1/9/14 5:50 PM:
-

[~vinodkv], 

While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it
drags in non-required dependencies (unless you only put in the non-leaf POMs
the dependencies that are common to all the leaf modules).

Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is
one of the motivations (agreed it is an IntelliJ issue; on the other hand the
change does not affect the project build at all and allows IntelliJ users to
build/debug from the IDE out of the box without doing funny voodoo).

The other motivation, and IMO the more important one, is to clean up the
dependencies that modules like yarn-api and yarn-client have, restricting them
to what is used on the client side. Using the dependency:tree and
dependency:analyze plugins I've reduced the 3rd party JARs required by the
clients significantly. As [~ste...@apache.org] pointed out there is much more
work we should do in this direction; this is a first non-intrusive baby step in
that direction.

To give you an idea, before this patch *hadoop-yarn-api* reports the following
required dependencies by itself:

{code}
 +- org.slf4j:slf4j-api:jar:1.7.5:compile
 +- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
 |  \- log4j:log4j:jar:1.2.17:compile
 +- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
 |  +- tomcat:jasper-compiler:jar:5.5.23:test
 +- com.google.inject.extensions:guice-servlet:jar:3.0:compile
 +- io.netty:netty:jar:3.6.2.Final:compile
 +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
 +- commons-io:commons-io:jar:2.4:compile
 +- com.google.inject:guice:jar:3.0:compile
 |  +- javax.inject:javax.inject:jar:1:compile
 |  \- aopalliance:aopalliance:jar:1.0:compile
 +- com.sun.jersey:jersey-server:jar:1.9:compile
 |  +- asm:asm:jar:3.2:compile
 |  \- com.sun.jersey:jersey-core:jar:1.9:compile
 +- com.sun.jersey:jersey-json:jar:1.9:compile
 |  +- org.codehaus.jettison:jettison:jar:1.1:compile
 |  |  \- stax:stax-api:jar:1.0.1:compile
 |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
 |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
 |  | \- javax.activation:activation:jar:1.1:compile
 |  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
 |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
 |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed 
from 1.8.3)
 |  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 
1.8.3)
 \- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}

With the patch, its own required dependencies are down to:

{code}
 +- commons-lang:commons-lang:jar:2.6:compile
 +- com.google.guava:guava:jar:11.0.2:compile
 |  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
 +- commons-logging:commons-logging:jar:1.1.3:compile
 +- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
 \- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}

Does this address your concerns?


was (Author: tucu00):
[~vinodvk], 

While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it 
drags non-required dependencies (unless you only put in non-leaf POMs 
dependencies that are common to all the leaf modules).

Yes IntelliJ seems to get funny with dependencies in non-leaf modules. That is 
one of the motivations (agree it is an IntelliJ issue, on the other hand the 
change does not affect the project built at all and allows IntelliJ users to 
build/debug from the IDE out of the box without doing funny voodoo).

The other motivation, and IMO is more important, is to clean up the 
dependencies modules like yarn-api and yarn-client have. Restricting them to 
what is used on the client side. Using the dependency:tree and 
dependency:analyze plugins I’ve reduced the 3rd party JARs required by the 
clients significantly. As [~ste...@apache.org] pointed out there is much more 
work we should do in this direction, this is a first non-intrusive baby step in 
that direction.

To give you and idea, before the this patch *hadoop-yarn-api* reports as 
required dependencies by itself:

{code}
 +- org.slf4j:slf4j-api:jar:1.7.5:compile
 +- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
 |  \- log4j:log4j:jar:1.2.17:compile
 +- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
 |  +- tomcat:jasper-compiler:jar:5.5.23:test
 +- com.google.inject.extensions:guice-servlet:jar:3.0:compile
 +- io.netty:netty:jar:3.6.2.Final:compile
 +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
 +- commons-io:commons-io:jar:2.4:compile
 +- com.google.inject:guice:jar:3.0:compile
 |  +- javax.inject:javax.inject:jar:1:compile
 |  \- aopalliance:aopalliance:jar:1.0:compile
 +- 

[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-09 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866945#comment-13866945
 ] 

Alejandro Abdelnur commented on YARN-888:
-

You got it. On the hybrid approach, it is quite cumbersome as you would have to
verify that all child modules actually use a dependency before adding it to the
common parent. IMO, leaving the non-leaf modules slim will be much easier to
handle. Plus we solve the problem for IntelliJ IDE users.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-09 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867488#comment-13867488
 ] 

Alejandro Abdelnur commented on YARN-888:
-

[~vinodkv], thanks for taking the time to play with the patch.

bq. hadoop-yarn-project's pom.xml has some deps ...

This was an oversight on my end: I traversed the parent POMs starting from the
leaves, and the yarn modules do not have hadoop-yarn-project as a parent. This
means the dependencies there were not being used. I'm attaching a new patch
removing the dependencies section from hadoop-yarn-project. Thanks for
catching that.

bq. I guess this was you meant by 

Correct. However, I wouldn't say the plugin is broken, but it has limitations
(it cannot detect usage of classes loaded via reflection, it cannot detect use
of constants of primitive types and Strings, etc.).
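
For example (purely illustrative, not code from this patch), two usage patterns
that a bytecode-based analysis like dependency:analyze cannot see:

{code}
// Illustrative sketch of dependency:analyze blind spots: it inspects bytecode,
// so it misses classes loaded via reflection and constants inlined by javac.
public class AnalyzePluginBlindSpots {

  // javac copies the *value* of a compile-time constant into the using class;
  // if VERSION lived in a 3rd-party JAR, no bytecode reference to that JAR
  // would remain here, so the JAR would look "unused".
  static final String VERSION = "1.0";

  public static void main(String[] args) throws Exception {
    // The class name is only a String in the bytecode; if this named a class
    // from a 3rd-party JAR, dependency:analyze would also report that JAR as
    // unused even though it is needed at runtime.
    Class<?> clazz = Class.forName("java.util.concurrent.ConcurrentHashMap");
    System.out.println(clazz.getName() + " " + VERSION);
  }
}
{code}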

bq. We should set up a single node cluster atleast to ensure that all is well.

The produced TARBALL has the exact same set of JAR files, so I would not expect
this to be an issue.

However, just to be safe, I did a build with the patch, started a minicluster,
and ran a couple of example jobs.


 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, 
 yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-888) clean up POM dependencies

2014-01-08 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-888:


Attachment: YARN-888.patch

New patch adding the missing hadoop-common test JARs in a few modules.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-08 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866017#comment-13866017
 ] 

Alejandro Abdelnur commented on YARN-888:
-

I've run the resourcemanager and nodemanager testcases locally with the patch 
applied and I did not get any failures.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-08 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866162#comment-13866162
 ] 

Alejandro Abdelnur commented on YARN-888:
-

Karthik, thanks for the verification.

Answering Steve, what you suggest makes complete sense, but it is a much
bigger undertaking. This JIRA is just cleaning up/tweaking the current stuff
without affecting the current end result. If you have time, would you give the
latest patch a try?


 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-01-08 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866308#comment-13866308
 ] 

Alejandro Abdelnur edited comment on YARN-1577 at 1/9/14 4:34 AM:
--

this seems a serious regression, shouldn't this be a blocker for 2.4?


was (Author: tucu00):
this seems a serius regression, shouldn't this be a blocker for 2.4?

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He

 Today unmanaged AM client is waiting for app state to be Accepted to launch 
 the AM. This is broken since we changed in YARN-1493 to start the attempt 
 after the application is Accepted. We may need to introduce an attempt state 
 report that client can rely on to query the attempt state and choose to 
 launch the unmanaged AM.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-01-08 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866308#comment-13866308
 ] 

Alejandro Abdelnur commented on YARN-1577:
--

this seems a serious regression, shouldn't this be a blocker for 2.4?

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He

 Today unmanaged AM client is waiting for app state to be Accepted to launch 
 the AM. This is broken since we changed in YARN-1493 to start the attempt 
 after the application is Accepted. We may need to introduce an attempt state 
 report that client can rely on to query the attempt state and choose to 
 launch the unmanaged AM.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-888) clean up POM dependencies

2014-01-07 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864748#comment-13864748
 ] 

Alejandro Abdelnur commented on YARN-888:
-

[~rvs], mind if I take this JIRA from you? (Besides it being a cool JIRA number
to own.) The current dependencies in the Yarn POMs are breaking IntelliJ
integration, which is kind of driving me crazy, so I took a stab at it this
morning and have a working patch.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Roman Shaposhnik

 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-888) clean up POM dependencies

2014-01-07 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur reassigned YARN-888:
---

Assignee: Alejandro Abdelnur  (was: Roman Shaposhnik)

Thanks Roman, I'll be posting the patch momentarily. If you have time to review
it, that would be great.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-888) clean up POM dependencies

2014-01-07 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-888:


Attachment: YARN-888.patch

The patch moves all the dependencies to the leaf projects, declaring explicitly
what each module needs (I used the dependency:analyze plugin to zero in on
that, and commented in the POMs the dependencies the plugin does not detect as
used).

I also did a DIST build and verified the JARs in the DIST are all the same
(with the exception of the yarn-site JAR, which is gone because that project is
now of type 'pom').

I've also verified IntelliJ now works fine compiling and running testcases.

 clean up POM dependencies
 -

 Key: YARN-888
 URL: https://issues.apache.org/jira/browse/YARN-888
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-888.patch


 Intermediate 'pom' modules define dependencies inherited by leaf modules.
 This is causing issues in intellij IDE.
 We should normalize the leaf modules like in common, hdfs and tools where all 
 dependencies are defined in each leaf module and the intermediate 'pom' 
 module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1538) Support web-services for application-submission and AM protocols

2013-12-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857030#comment-13857030
 ] 

Alejandro Abdelnur commented on YARN-1538:
--

Is the idea to fully duplicate all AM-facing RPC interfaces over HTTP? Having
scheduling over HTTP but not the NM interface over HTTP does not buy much, as
you'll still need to use RPC to start the containers.

Also, what about security and token support over HTTP?

 Support web-services for application-submission and AM protocols
 

 Key: YARN-1538
 URL: https://issues.apache.org/jira/browse/YARN-1538
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Vinod Kumar Vavilapalli

 We already have read-only web-services for YARN - MAPREDUCE-2863. It'll be 
 great to support APIs for application submission and the scheduling protocol 
 - as alternatives to RPCs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2013-12-06 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1471:
-

Attachment: SLSCapacityScheduler.java

[~curino], how about the attached alternate impl for
SLSCapacityScheduler.java? It fully leverages the existing wrapper without
duplicating code. You'll need to add a new constructor to the wrapper that
takes a ResourceScheduler as a parameter.

The SchedulerWrapper interface should be named SLSSchedulerWrapper and the
implementation SLSSchedulerWrapperImpl.

HTH
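
To make the suggestion concrete, a rough sketch of the delegation idea (the
class and method names are hypothetical and do not match the actual SLS code;
the real wrapper also has to implement the metrics interfaces the simulator
expects):

{code}
// Hypothetical sketch of wrapping an existing scheduler by delegation instead
// of duplicating its code; names do not correspond to the real SLS classes.
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;

public class SchedulerWrapperSketch {

  // The wrapped scheduler, e.g. a CapacityScheduler instance, so monitors such
  // as ProportionalCapacityPreemptionPolicy still see the real scheduler type.
  private final ResourceScheduler scheduler;

  // The new constructor suggested above: take the scheduler to wrap as a
  // parameter instead of hard-wiring one implementation.
  public SchedulerWrapperSketch(ResourceScheduler scheduler) {
    this.scheduler = scheduler;
  }

  public ResourceScheduler getScheduler() {
    return scheduler;
  }

  // Metric-collecting methods would delegate to 'scheduler' here rather than
  // re-implementing any scheduling logic.
}
{code}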

 The SLS simulator is not running the preemption policy for CapacityScheduler
 

 Key: YARN-1471
 URL: https://issues.apache.org/jira/browse/YARN-1471
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Carlo Curino
Assignee: Carlo Curino
Priority: Minor
 Attachments: SLSCapacityScheduler.java, YARN-1471.patch


 The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
 This is because the policy needs to interact with a CapacityScheduler, and 
 the wrapping done by the simulator breaks this. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840977#comment-13840977
 ] 

Alejandro Abdelnur commented on YARN-1404:
--

[~vinodkv], thanks for summarizing our offline chat.

Regarding *ACLs and an on/off switch*:

IMO they are not necessary for the following reason.

You need an external system installed and running on the node to use the
resources of an unmanaged container. If you have direct access to the node to
start the external system, you are 'trusted'. If you don't have direct access,
you cannot use the resources of an unmanaged container.

I think this is a very strong requirement already and it would avoid adding a 
new ACL and an on/off switch.

Regarding *Liveliness*:

In the case of managed containers we don't have a liveliness 'report' and the
container process could very well be hung. In such a scenario it is the
responsibility of the AM to detect the liveliness of the container process
and react if it is considered hung.

In the case of unmanaged containers, the AM would have the same responsibility. 

The only difference is that in the case of managed containers, if the process
exits, the NM detects that, while in the case of unmanaged containers this
responsibility would fall on the AM.

Because of this I think we could do without a leaseRenewal/liveliness call.

Regarding *NM assume a whole lot of things about containers* 3 bullet items:

For my current use case none of this is needed. It could be relatively easy
to enable such functionality if a use case that needs it arises.

Regarding *Can such trusted application mix and match managed and unmanaged 
containers?*:

In the way I envision this working, when an AM asks for a container and
gets an allocation from the RM, the RM does not know if the AM will start a
managed or an unmanaged container. It is only between the AM and the NM that
this is known, when the ContainerLaunchContext is NULL.

Regarding *YARN-1040 should enabled starting unmanaged containers*:

If YARN-1040 were implemented, yes, it would enable unmanaged containers.
However, the scope of YARN-1040 is much bigger than unmanaged containers.

It should also be possible to implement unmanaged containers as being
discussed here and later implement YARN-1040.

Does this make sense? (A rough sketch of the proposed AM-side convention
follows below.)
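
To make the last point concrete, here is an illustrative sketch of the AM-side
convention being proposed. This is not an existing API contract: passing a null
ContainerLaunchContext to mark a container as unmanaged is the proposal in this
JIRA, and the helper method/class names are made up.

{code}
// Illustrative sketch of the proposal, not working code against current YARN:
// the AM gets a normal allocation from the RM and then, per this proposal,
// tells the NM the container is unmanaged by passing a null launch context.
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;

public class UnmanagedContainerSketch {

  static void startContainer(NMClient nmClient, Container container,
      ContainerLaunchContext clc, boolean unmanaged) throws Exception {
    if (unmanaged) {
      // Proposed convention: no launch context, so the NM books the resources
      // but does not launch (or watch the exit of) any process; the external
      // system and the AM take over enforcement and liveliness.
      nmClient.startContainer(container, null);
    } else {
      // Regular managed container: the NM launches and monitors the process.
      nmClient.startContainer(container, clc);
    }
  }
}
{code}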




 Enable external systems/frameworks to share resources with Hadoop leveraging 
 Yarn resource scheduling
 -

 Key: YARN-1404
 URL: https://issues.apache.org/jira/browse/YARN-1404
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-1404.patch


 Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
 applications run their workload in.
 sharing resources with other Yarn applications while running their workload 
 within long-running processes owned by the external framework (in other 
 words, running their workload outside of the context of a Yarn container 
 process). 
 Because Yarn provides robust and scalable resource management, it is 
 desirable for some external systems to leverage the resource governance 
 capabilities of Yarn (queues, capacities, scheduling, access control) while 
 supplying their own resource enforcement.
 Impala is an example of such system. Impala uses Llama 
 (http://cloudera.github.io/llama/) to request resources from Yarn.
 Impala runs an impalad process in every node of the cluster, when a user 
 submits a query, the processing is broken into 'query fragments' which are 
 run in multiple impalad processes leveraging data locality (similar to 
 Map-Reduce Mappers processing a collocated HDFS block of input data).
 The execution of a 'query fragment' requires an amount of CPU and memory in 
 the impalad. As the impalad shares the host with other services (HDFS 
 DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications 
 (MapReduce tasks).
 To ensure cluster utilization that follow the Yarn scheduler policies and it 
 does not overload the cluster nodes, before running a 'query fragment' in a 
 node, Impala requests the required amount of CPU and memory from Yarn. Once 
 the requested CPU and memory has been allocated, Impala starts running the 
 'query fragment' taking care that the 'query fragment' does not use more 
 resources than the ones that have been allocated. Memory is book kept per 
 'query fragment' and the threads used for the processing of the 'query 
 fragment' are placed under a cgroup to contain CPU utilization.
 Today, for all resources that have been asked to Yarn RM, a (container) 
 process must be 

[jira] [Comment Edited] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-05 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840977#comment-13840977
 ] 

Alejandro Abdelnur edited comment on YARN-1404 at 12/6/13 4:47 AM:
---

[~vinodkv], thanks for summarizing our offline chat.

Regarding *ACLs and an on/off switch*:

IMO they are not necessary for the following reason.

You need an external system installed and running on the node to use the 
resources of an unmanaged container. If you have direct access to the node to 
start the external system, you are 'trusted'. If you don't have direct access 
you cannot use the resources of an unmanaged container.

I think this is already a very strong requirement and it would let us avoid 
adding code to manage a new ACL and an on/off switch.

Regarding *Liveliness*:

In the case of managed containers we don't have a liveliness 'report' and the 
container process could very well be hung. In such a scenario it is the 
responsibility of the AM to detect the liveliness of the container process 
and react if it considers the process hung.

In the case of unmanaged containers, the AM would have the same responsibility. 

The only difference is that in the case of managed containers if the process 
exits the NM detects that, while in the case of unmanaged containers this 
responsibility would fall on the AM.

Because of this I think we could do without a leaseRenewal/liveliness call.

Regarding *NM assume a whole lot of things about containers* 3 bullet items:

For my current use case none of this is needed. It would be relatively easy 
to enable such functionality if a use case that needs it arises.

Regarding *Can such trusted application mix and match managed and unmanaged 
containers?*:

The way I envision this working, when an AM asks for a container and gets an 
allocation from the RM, the RM does not know if the AM will start a managed or 
an unmanaged container. It is only between the AM and the NM that this is 
known, when the ContainerLaunchContext is NULL.

Regarding *YARN-1040 should enable starting unmanaged containers*:

If YARN-1040 were implemented, yes, it would enable unmanaged containers. 
However, the scope of YARN-1040 is much bigger than unmanaged containers. 

It should also be possible to implement unmanaged containers as discussed here 
and later implement YARN-1040.

Does this make sense?





was (Author: tucu00):
[~vinodkv], thanks for summarizing our offline chat.

Regarding *ACLs and an on/off switch*:

IMO they are not necessary for the following reason.

You need an external system installed and running in the node to use the 
resources of an unmanaged container. If you have direct access into the node to 
start the external system, you are 'trusted'. If you don't have direct access 
you cannot use the resources of an unmanaged container.

I think this is a very strong requirement already and it would avoid adding a 
new ACL and an on/off switch.

Regarding *Liveliness*:

In the case of managed containers we don't have a liveliness 'report' and the 
container process could very well be hang. In such scenario is the 
responsibility of the AM to detected the liveliness of the container process 
and react if it is considered hung.

In the case of unmanaged containers, the AM would the same responsibility. 

The only difference is that in the case of managed containers if the process 
exits the NM detects that, while in the case of unmanaged containers this 
responsibility would fall on the AM.

Because of this I think we could do without a leaseRenewal/liveliness call.

Regarding *NM assume a whole lot of things about containers* 3 bullet items:

For the my current use case none if this is needed. It could be relatively easy 
to enable such functionality if a use case that needs it arises.

Regarding *Can such trusted application mix and match managed and unmanaged 
containers?*:

In the way I envision how this will work, when an AM asks for a container and 
gets an allocation for from the RM, the RM does not know if the AM will start a 
managed or an unmanaged container.  It is only between the AM and the NM that 
this is known, when the ContainerLaunchContext is NULL.

Regarding *YARN-1040 should enabled starting unmanaged containers*:

If YARN-1040 would be implemented, yes, it would enable unmanaged containers. 
However the scope of YARN-1040 is much bigger than unmanaged containers. 

It should be also be possible implementing unmanaged containers as being 
discussed and later implement YARN-1040.

Does this make sense?




 Enable external systems/frameworks to share resources with Hadoop leveraging 
 Yarn resource scheduling
 -

 Key: YARN-1404
 URL: https://issues.apache.org/jira/browse/YARN-1404
 

[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags

2013-12-04 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839215#comment-13839215
 ] 

Alejandro Abdelnur commented on YARN-1399:
--

We should HTML encode tags if presenting them on an HTML page.
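
A small hedged example of what that could look like, assuming commons-lang3 is 
on the classpath (any equivalent HTML escaper would do):

{code}
import org.apache.commons.lang3.StringEscapeUtils;

public class TagEscaper {
  // Escape a user-supplied tag before rendering it in an HTML page.
  public static String toHtml(String tag) {
    return StringEscapeUtils.escapeHtml4(tag); // e.g. "<script>" becomes "&lt;script&gt;"
  }
}
{code}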

 Allow users to annotate an application with multiple tags
 -

 Key: YARN-1399
 URL: https://issues.apache.org/jira/browse/YARN-1399
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Nowadays, when submitting an application, users can fill the applicationType 
 field to facilitate searching it later. IMHO, it's good to accept multiple 
 tags to allow users to describe their applications in multiple aspects, 
 including the application type. Then, searching by tags may be more efficient 
 for users to reach their desired application collection. It's pretty much 
 like the tag system of online photo/video/music and etc.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler

2013-12-04 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839299#comment-13839299
 ] 

Alejandro Abdelnur commented on YARN-1403:
--

* AllocationConfiguration.java
** Do we need the 2 constructors? The one receiving the config is a bit 
misleading as it sets all default values except for the placement policy
** Most getter methods that query maps do 2 lookups, containsKey() then 
get(); we could just do a single get() and, if NULL, return the default value, 
i.e.:
{code}
  public int getQueueMaxApps(String queue) {
Integer max = queueMaxApps.get(queue);
return (max != null) ? max : queueMaxAppsDefault;
  }
{code}
** Most getter methods use an if/else block to return default values; 
using '(v != null) ? v : DEFAULT' would be shorter/simpler.
** getMaxResources(String queueName) should use a constant MAX RESOURCE instead 
of creating one over and over
** hasAccess() 'lastPeriodIndex-1', add a space before/after '-'
* AllocationFileLoaderService.java
** the time value constants should indicate in comments the time unit (ms I 
assume)
** in the start() thread's run() loop, what happens if somebody deletes the file 
before 'long lastModified = allocFile.lastModified();'? Shouldn't we check 
exists() before attempting reload detection, and warn if the file is not there 
anymore?
** move the 'ReloadListener' interface to the top of the class. Also, as it is 
an inner interface, it can simply be named 'Listener' and its method 
'onReload()' (a rough sketch follows this list)
* FairSchedulerConfiguration.java
** false change at line 242
* FSQueue.java
** If the FSQueue receives a QueueManager, and the QueueManager provides the 
(updated) AllocationConfiguration, then the changes are much smaller and we 
don't have to iterate over all FSQueue instances to set an updated 
AllocationConfiguration.
* QueueManager.java
** the 'updateQueuesWithReloadedConf()' method should receive the 
AllocationConfiguration as a parameter; that would be more obvious than getting 
it from the scheduler. Also, the method should be named 
'updateConfiguration()'
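
As a rough, hedged sketch of the listener rename suggested above (all names 
here are illustrative, not the actual patch):

{code}
public class AllocationFileLoaderServiceSketch {
  // Inner interface: simply 'Listener', with a single 'onReload()' method.
  public interface Listener {
    void onReload(AllocationConfigurationStub newConf);
  }

  // Stand-in for the reloaded queue configuration.
  public static class AllocationConfigurationStub { }

  private Listener listener;

  public void setListener(Listener listener) {
    this.listener = listener;
  }

  void reloadAllocations(AllocationConfigurationStub newConf) {
    if (listener != null) {
      listener.onReload(newConf); // e.g. the QueueManager updates its queues here
    }
  }
}
{code}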


 Separate out configuration loading from QueueManager in the Fair Scheduler
 --

 Key: YARN-1403
 URL: https://issues.apache.org/jira/browse/YARN-1403
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403-3.patch, 
 YARN-1403.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2013-12-04 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839531#comment-13839531
 ] 

Alejandro Abdelnur commented on YARN-1471:
--

As the ProportionalCapacityPreemptionPolicy is CS specific, why does it not 
live within the CS itself? Shouldn't it be that way?

 The SLS simulator is not running the preemption policy for CapacityScheduler
 

 Key: YARN-1471
 URL: https://issues.apache.org/jira/browse/YARN-1471
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Carlo Curino
Assignee: Wei Yan
Priority: Minor

 The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
 This is because the policy needs to interact with a CapacityScheduler, and 
 the wrapping done by the simulator breaks this. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler

2013-12-04 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839644#comment-13839644
 ] 

Alejandro Abdelnur commented on YARN-1403:
--

* FairScheduler.java
** Given that the AllocationFileLoaderService is a service, shouldn't its 
lifecycle (start/stop) be managed like other services? Here reinitialize() is 
doing the start and there is no stop; this is a bit off, no?
* FairSchedulerQueueInfo.java
** in the case of a reload, the maxApps var here never gets updated; shouldn't 
the getMaxApplications() method be as follows so the max is properly refreshed?

{code}
  public int getMaxApplications() {
return scheduler.getAllocationConfiguration().getQueueMaxApps(queueName);
  }
{code}


 Separate out configuration loading from QueueManager in the Fair Scheduler
 --

 Key: YARN-1403
 URL: https://issues.apache.org/jira/browse/YARN-1403
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403-3.patch, 
 YARN-1403-4.patch, YARN-1403-5.patch, YARN-1403.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler

2013-12-04 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839679#comment-13839679
 ] 

Alejandro Abdelnur commented on YARN-1403:
--

+1

 Separate out configuration loading from QueueManager in the Fair Scheduler
 --

 Key: YARN-1403
 URL: https://issues.apache.org/jira/browse/YARN-1403
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403-3.patch, 
 YARN-1403-4.patch, YARN-1403-5.patch, YARN-1403.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2013-12-03 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838579#comment-13838579
 ] 

Alejandro Abdelnur commented on YARN-1471:
--

[~curino], I'm not familiar with how the ProportionalCapacityPreemptionPolicy 
gets hold of data from the CapacityScheduler. The SLS simply wraps the 
Scheduler implementation using a proxy pattern. Thus, the scheduler API is 
fully exposed even if wrapped. If the ProportionalCapacityPreemptionPolicy 
casts the Scheduler interface to the CapacityScheduler class, I would be 
inclined to say that this is not a SLS issue but a 
ProportionalCapacityPreemptionPolicy issue. Are the methods of the 
CapacityScheduler used by the ProportionalCapacityPreemptionPolicy general 
purpose enough to qualify for the Scheduler API? If not, how about implementing 
a 'safety valve', something like {{public <T> T getComponent(Class<T> klass)}} 
in the Scheduler API, where the contract is to return NULL if such a component 
is not implemented by the scheduler. Would something like this work?
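
A hedged sketch of that safety valve, with illustrative names only (the real 
Scheduler interface has no such method today):

{code}
public interface SchedulerWithComponents {
  // Contract: return null if the scheduler does not implement the requested component.
  <T> T getComponent(Class<T> klass);
}

class PreemptableSchedulerSketch implements SchedulerWithComponents {
  // Stand-in for whatever CS-specific object the preemption policy needs.
  private final Object preemptionSupport = new Object();

  @Override
  @SuppressWarnings("unchecked")
  public <T> T getComponent(Class<T> klass) {
    if (klass.isInstance(preemptionSupport)) {
      return (T) preemptionSupport;
    }
    return null; // component not supported by this scheduler
  }
}
{code}

A wrapper such as the SLS proxy would simply delegate this call, so the policy 
never needs to cast to a concrete scheduler class.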

 The SLS simulator is not running the preemption policy for CapacityScheduler
 

 Key: YARN-1471
 URL: https://issues.apache.org/jira/browse/YARN-1471
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Carlo Curino
Assignee: Wei Yan
Priority: Minor

 The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
 This is because the policy needs to interact with a CapacityScheduler, and 
 the wrapping done by the simulator breaks this. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1390) Provide a way to capture source of an application to be queried through REST or Java Client APIs

2013-12-02 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836723#comment-13836723
 ] 

Alejandro Abdelnur commented on YARN-1390:
--

Agree with Steve, we should limit the length of a tag and the number of tags. 
I'd suggest going hardcoded for now, i.e. 50 chars/10 tags, and going 
configurable later if the need arises.
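
For illustration, a hedged sketch of such hardcoded limits (the names and 
exception type are mine, not an existing YARN class):

{code}
import java.util.Collection;

public class TagLimits {
  static final int MAX_TAG_LENGTH = 50; // hardcoded for now, configurable later if needed
  static final int MAX_TAGS = 10;

  public static void validate(Collection<String> tags) {
    if (tags.size() > MAX_TAGS) {
      throw new IllegalArgumentException("Too many tags: " + tags.size());
    }
    for (String tag : tags) {
      if (tag.length() > MAX_TAG_LENGTH) {
        throw new IllegalArgumentException("Tag too long: " + tag);
      }
    }
  }
}
{code}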

 Provide a way to capture source of an application to be queried through REST 
 or Java Client APIs
 

 Key: YARN-1390
 URL: https://issues.apache.org/jira/browse/YARN-1390
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

 In addition to other fields like application-type (added in YARN-563), it is 
 useful to have an applicationSource field to track the source of an 
 application. The application source can be useful in (1) fetching only those 
 applications a user is interested in, (2) potentially adding source-specific 
 optimizations in the future. 
 Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop 
 etc.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags

2013-12-02 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836980#comment-13836980
 ] 

Alejandro Abdelnur commented on YARN-1399:
--

What is the concern with a tag being any valid Unicode string? If queried via 
the REST API the values would be URL-encoded, thus no harm.

 Allow users to annotate an application with multiple tags
 -

 Key: YARN-1399
 URL: https://issues.apache.org/jira/browse/YARN-1399
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Nowadays, when submitting an application, users can fill the applicationType 
 field to facilitate searching it later. IMHO, it's good to accept multiple 
 tags to allow users to describe their applications in multiple aspects, 
 including the application type. Then, searching by tags may be more efficient 
 for users to reach their desired application collection. It's pretty much 
 like the tag system of online photo/video/music and etc.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags

2013-12-02 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837031#comment-13837031
 ] 

Alejandro Abdelnur commented on YARN-1399:
--

I would stick to exact tag match. Case insensitive seems reasonable, though I 
would implement it by lower- or upper-casing tags on arrival and when 
querying. Then the matching is cheapest.

Regarding symbols, what is the harm in supporting them?

One thing we didn't mention before: on querying I would support only OR; the 
client must do any further filtering if it wants AND.
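
A minimal, hedged sketch of that scheme (illustrative names): tags are 
lower-cased once on submission and once per query, so the match itself is a 
cheap exact lookup, and multiple query tags are combined with OR.

{code}
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TagMatcher {
  // Normalize tags once, on arrival or when a query comes in.
  public static Set<String> normalize(Set<String> tags) {
    Set<String> out = new HashSet<String>();
    for (String t : tags) {
      out.add(t.toLowerCase(Locale.ENGLISH));
    }
    return out;
  }

  /** OR semantics: true if the application has at least one of the query tags. */
  public static boolean matches(Set<String> appTags, Set<String> queryTags) {
    for (String q : normalize(queryTags)) {
      if (appTags.contains(q)) {
        return true;
      }
    }
    return false;
  }
}
{code}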

 Allow users to annotate an application with multiple tags
 -

 Key: YARN-1399
 URL: https://issues.apache.org/jira/browse/YARN-1399
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Nowadays, when submitting an application, users can fill the applicationType 
 field to facilitate searching it later. IMHO, it's good to accept multiple 
 tags to allow users to describe their applications in multiple aspects, 
 including the application type. Then, searching by tags may be more efficient 
 for users to reach their desired application collection. It's pretty much 
 like the tag system of online photo/video/music and etc.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1456) IntelliJ IDEA gets dependencies wrong for hadoop-yarn-server-resourcemanager

2013-12-01 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13836086#comment-13836086
 ] 

Alejandro Abdelnur commented on YARN-1456:
--

This seems a dup of YARN-888 and MAPREDUCE-5362 (same issue for MR modules)

 IntelliJ IDEA gets dependencies wrong for  hadoop-yarn-server-resourcemanager
 -

 Key: YARN-1456
 URL: https://issues.apache.org/jira/browse/YARN-1456
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
 Environment: IntelliJ IDEA 12.x  13.x beta
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: YARN-1456-001.patch


 When IntelliJ IDEA imports the hadoop POMs into the IDE, somehow it fails to 
 pick up all the transitive dependencies of the yarn-client, and so can't 
 resolve commons logging, com.google.* classes and the like.
 While this is probably an IDEA bug, it does stop you building Hadoop from 
 inside the IDE, making debugging significantly harder



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1442) change yarn minicluster base directory via system property

2013-11-26 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832716#comment-13832716
 ] 

Alejandro Abdelnur commented on YARN-1442:
--

Instead of introducing a new system property yarn.minicluster.directory we 
should have a common hadoop.test.dir used by all miniclusters and tests; if 
different components need their own subdir, they can create it under it. 
Otherwise we need to chase down all the properties to set.
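
Something along these lines, as a hedged sketch; the property name and default 
are illustrative, not an existing Hadoop constant:

{code}
import java.io.File;

public class TestDirs {
  // Resolve a per-component subdirectory under a single shared test base dir.
  public static File getTestDir(String component) {
    String base = System.getProperty("hadoop.test.dir", "target/test-dir");
    File dir = new File(base, component);
    dir.mkdirs(); // each component creates its own subdir under the shared base
    return dir;
  }
}
{code}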

 change yarn minicluster base directory via system property
 --

 Key: YARN-1442
 URL: https://issues.apache.org/jira/browse/YARN-1442
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: André Kelpe
Priority: Minor
 Attachments: HADOOP-10122.patch


 The yarn minicluster used for testing uses the target directory by default. 
 We use gradle for building our projects and we would like to see it using a 
 different directory. This patch makes it possible to use a different 
 directory by setting the yarn.minicluster.directory system property.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-951) Add hard minimum resource capabilities for container launching

2013-11-21 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved YARN-951.
-

Resolution: Won't Fix

 Add hard minimum resource capabilities for container launching
 --

 Key: YARN-951
 URL: https://issues.apache.org/jira/browse/YARN-951
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Wei Yan

 This is a follow up of YARN-789, which enabled FairScheduler to handle zero 
 capabilities resource requests in one dimension (either zero CPU or zero 
 memory).
 When resource enforcement is enabled (cgroups for CPU and 
 ProcfsBasedProcessTree for memory) we cannot use zero because the underlying 
 container processes will be killed.
 We need to introduce an absolute or hard minimum:
 * For CPU. Hard enforcement can be done via a cgroup cpu controller. Using an 
 absolute minimum of a few CPU shares (ie 10) in the LinuxContainerExecutor we 
 ensure there is enough CPU cycles to run the sleep process. This absolute 
 minimum would only kick-in if zero is allowed, otherwise will never kick in 
 as the shares for 1 CPU are 1024.
 * For Memory. Hard enforcement is currently done by the 
 ProcfsBasedProcessTree.java, using a minimum absolute of 1 or 2 MBs would 
 take care of zero memory resources. And again, this absolute minimum would 
 only kick-in if zero is allowed, otherwise will never kick in as the 
 increment memory is in several MBs if not 1GB.
 There would be no default for this hard minimum, if not set no correction 
 will be done. If set, then the MAX(hard-minimum, 
 container-resource-capability) will be used. 
 Effectively there will not be any impact unless the hard minimum capabilities 
 are explicitly set.
 And, even if set, unless the scheduler is configured to allow zero 
 capabilities, the hard-minimum value will not kick in unless is set to a 
 value higher than the MIN capabilities for a container.
 Expected values, when set, would be 10 shares for CPU and 2 MB for memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-11-21 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1404:
-

Description: 
Currently Hadoop Yarn expects to manage the lifecycle of the processes in which 
its applications run their workload. External frameworks/systems could benefit 
from sharing resources with other Yarn applications while running their 
workload within long-running processes owned by the external framework (in 
other words, running their workload outside of the context of a Yarn container 
process). 

Because Yarn provides robust and scalable resource management, it is desirable 
for some external systems to leverage the resource governance capabilities of 
Yarn (queues, capacities, scheduling, access control) while supplying their own 
resource enforcement.

Impala is an example of such system. Impala uses Llama 
(http://cloudera.github.io/llama/) to request resources from Yarn.

Impala runs an impalad process on every node of the cluster; when a user 
submits a query, the processing is broken into 'query fragments' which are run 
in multiple impalad processes leveraging data locality (similar to Map-Reduce 
Mappers processing a collocated HDFS block of input data).

The execution of a 'query fragment' requires an amount of CPU and memory in the 
impalad, as the impalad shares the host with other services (HDFS DataNode, 
Yarn NodeManager, Hbase Region Server) and Yarn Applications (MapReduce tasks).

To ensure cluster utilization follows the Yarn scheduler policies and does not 
overload the cluster nodes, before running a 'query fragment' in a node, Impala 
requests the required amount of CPU and memory from Yarn. Once the requested 
CPU and memory have been allocated, Impala starts running the 'query fragment', 
taking care that the 'query fragment' does not use more resources than the ones 
that have been allocated. Memory is accounted per 'query fragment' and the 
threads used for the processing of the 'query fragment' are placed under a 
cgroup to contain CPU utilization.

Today, for all resources that have been requested from the Yarn RM, a 
(container) process must be started via the corresponding NodeManager. Failing 
to do this will result in the cancellation of the container allocation, 
relinquishing the acquired resource capacity back to the pool of available 
resources. To avoid this, Impala starts a dummy container process doing 
'sleep 10y'.

Using a dummy container process has its drawbacks:

* the dummy container process is in a cgroup with a given number of CPU shares 
that are not used and Impala is re-issuing those CPU shares to another cgroup 
for the thread running the 'query fragment'. The cgroup CPU enforcement works 
correctly because of the CPU controller implementation (but the formal 
specified behavior is actually undefined).
* Impala may ask for CPU and memory independently of each other. Some requests 
may be memory only with no CPU or vice versa. Because a container requires a 
process, complete absence of memory or CPU is not possible: even if the dummy 
process is 'sleep', a minimal amount of memory and CPU is required for the 
dummy process.

Because of this it is desirable to be able to have a container without a 
backing process.

  was:
Currently a container allocation requires to start a container process with the 
corresponding NodeManager's node.

For applications that need to use the allocated resources out of band from Yarn 
this means that a dummy container process must be started.

Impala/Llama is an example of such application which is currently starting a 
'sleep 10y' (10 years) process as the container process. And the resource 
capabilities are used out of by and the Impala process collocated in the node. 
The Impala process ensures the processing associated to that resources do not 
exceed the capabilities of the container. Also, if the container is 
lost/preempted/killed, Impala stops using the corresponding resources.

In addition, in the case of Llama, the current requirement of having a 
container process, gets complicates when hard resource enforcement (memory 
-ContainersMonitor- or cpu -via cgroups-) is enabled because Impala/Llama 
request resources with CPU and memory independently of each other. Some 
requests are CPU only and others are memory only. Unmanaged containers solve 
this problem as there is no underlying process with zero CPU or zero memory.



Summary: Enable external systems/frameworks to share resources with 
Hadoop leveraging Yarn resource scheduling  (was: Add support for unmanaged 
containers)

Updated the summary and the description to better describe the use case driving 
this JIRA.

I've closed YARN-951 as won't fix as it is a workaround of the problem this 
JIRA is trying to address.

I don't think there is a need for an umbrella JIRA as this is the only change 
we need.


 Enable external systems/frameworks to share 

[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-11-21 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829640#comment-13829640
 ] 

Alejandro Abdelnur commented on YARN-1404:
--

The proposal to address this JIRA is:

* Allow a NULL ContainerLaunchContext in the startContainer() call; this 
signals that there is no process to be started for the container.
* The ContainerLaunch logic would use a latch to block when there is no 
associated process. The latch would be released on container completion 
(preemption or termination by the AM).

The changes to achieve this are minimal and they do not alter the lifecycle of 
a container at all, neither in the RM nor in the NM.
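
A minimal, hedged sketch of the latch idea, not the actual ContainerLaunch 
code: when there is no ContainerLaunchContext, the launch thread parks on a 
latch that is only released when the container completes.

{code}
import java.util.concurrent.CountDownLatch;

public class ProcessLessLaunch {
  private final CountDownLatch completed = new CountDownLatch(1);

  /** Called instead of forking a process when the launch context is null. */
  public int waitForCompletion() throws InterruptedException {
    completed.await(); // blocks until the container is released or preempted
    return 0;          // report a normal exit so cleanup proceeds as usual
  }

  /** Called from the container completion/cleanup path. */
  public void signalCompletion() {
    completed.countDown();
  }
}
{code}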

As previously mentioned by Bikas, this can be seen as a special case of the 
functionality that YARN-1040 is proposing for managing multiple processes 
within the same container. 

The scope of work of YARN-1040 is significantly larger and requires API 
changes, while this JIRA does not require API changes, and the two changes are 
not incompatible with each other.



 Enable external systems/frameworks to share resources with Hadoop leveraging 
 Yarn resource scheduling
 -

 Key: YARN-1404
 URL: https://issues.apache.org/jira/browse/YARN-1404
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-1404.patch


 Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
 applications run workload in. External frameworks/systems could benefit from 
 sharing resources with other Yarn applications while running their workload 
 within long-running processes owned by the external framework (in other 
 words, running their workload outside of the context of a Yarn container 
 process). 
 Because Yarn provides robust and scalable resource management, it is 
 desirable for some external systems to leverage the resource governance 
 capabilities of Yarn (queues, capacities, scheduling, access control) while 
 supplying their own resource enforcement.
 Impala is an example of such system. Impala uses Llama 
 (http://cloudera.github.io/llama/) to request resources from Yarn.
 Impala runs an impalad process in every node of the cluster, when a user 
 submits a query, the processing is broken into 'query fragments' which are 
 run in multiple impalad processes leveraging data locality (similar to 
 Map-Reduce Mappers processing a collocated HDFS block of input data).
 The execution of a 'query fragment' requires an amount of CPU and memory in 
 the impalad. As the impalad shares the host with other services (HDFS 
 DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications 
 (MapReduce tasks).
 To ensure cluster utilization that follow the Yarn scheduler policies and it 
 does not overload the cluster nodes, before running a 'query fragment' in a 
 node, Impala requests the required amount of CPU and memory from Yarn. Once 
 the requested CPU and memory has been allocated, Impala starts running the 
 'query fragment' taking care that the 'query fragment' does not use more 
 resources than the ones that have been allocated. Memory is book kept per 
 'query fragment' and the threads used for the processing of the 'query 
 fragment' are placed under a cgroup to contain CPU utilization.
 Today, for all resources that have been asked to Yarn RM, a (container) 
 process must be started via the corresponding NodeManager. Failing to do 
 this, will result on the cancelation of the container allocation 
 relinquishing the acquired resource capacity back to the pool of available 
 resources. To avoid this, Impala starts a dummy container process doing 
 'sleep 10y'.
 Using a dummy container process has its drawbacks:
 * the dummy container process is in a cgroup with a given number of CPU 
 shares that are not used and Impala is re-issuing those CPU shares to another 
 cgroup for the thread running the 'query fragment'. The cgroup CPU 
 enforcement works correctly because of the CPU controller implementation (but 
 the formal specified behavior is actually undefined).
 * Impala may ask for CPU and memory independent of each other. Some requests 
 may be only memory with no CPU or viceversa. Because a container requires a 
 process, complete absence of memory or CPU is not possible even if the dummy 
 process is 'sleep', a minimal amount of memory and CPU is required for the 
 dummy process.
 Because of this it is desirable to be able to have a container without a 
 backing process.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1407) RM Web UI and REST APIs should uniformly use YarnApplicationState

2013-11-19 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827164#comment-13827164
 ] 

Alejandro Abdelnur commented on YARN-1407:
--

LGTM, +1

 RM Web UI and REST APIs should uniformly use YarnApplicationState
 -

 Key: YARN-1407
 URL: https://issues.apache.org/jira/browse/YARN-1407
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1407-1.patch, YARN-1407-2.patch, YARN-1407.patch


 RMAppState isn't a public facing enum like YarnApplicationState, so we 
 shouldn't return values or list filters that come from it. However, some 
 Blocks and AppInfo are still using RMAppState.
 It is not 100% clear to me whether or not fixing this would be a 
 backwards-incompatible change.  The change would only reduce the set of 
 possible strings that the API returns, so I think not.  We have also been 
 changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would 
 still be good to fix this ASAP (i.e. for 2.2.1).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler

2013-11-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824268#comment-13824268
 ] 

Alejandro Abdelnur commented on YARN-1403:
--

* AllocationFileLoaderService.java:
** start()/stop() should set a volatile boolean 'running' to true/false and the 
reloadThread should loop while 'running'. stop() should interrupt the thread to 
force a wake up if it is sleeping (see the sketch after this review).
** in the reloadThread's run(), the try block should include the reload, so 
that when interrupted by stop() it skips the reloading on the way out.
** reloadAllocs(): we are not charged by the character, the method name should 
be reloadAllocations()
** if (allocFile == null) return; use {}
** what happens if reloadListener.queueConfigurationReloaded(info); throws an 
exception? In what state do things end up?
** not sure the logic using lastReloadAttemptFailed is correct, in the 
exception handling in the thread's run()
* QueueConfiguration.java
** QueueConfiguration() constructor, shouldn't placementpolicy be the default?
* QueueManager.java
** shouldn't this be a composite service?
** it is starting but not stopping the AllocationFileLoaderService
** the initialize() call setting the reload-listener is too hidden; this should 
be done next to where the AllocationFileLoaderService is created.

Wouldn't it be simpler/cleaner if the QueueManager were a service that 
encapsulates the reloading, queue allocations, ACLs and queue placement, and 
the FS just saw its methods?
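
Going back to the first point about start()/stop(), a rough hedged sketch of 
the pattern I have in mind (names are illustrative, not the actual patch):

{code}
public class ReloadThreadSketch {
  private volatile boolean running;
  private Thread reloadThread;

  public void start() {
    running = true;
    reloadThread = new Thread(new Runnable() {
      @Override
      public void run() {
        while (running) {
          try {
            Thread.sleep(10000); // reload check interval
            // the reloadAllocations() call would go here, inside the try block,
            // so an interrupt during stop() skips the reload on the way out
          } catch (InterruptedException ie) {
            // fall through and re-check 'running'
          }
        }
      }
    });
    reloadThread.start();
  }

  public void stop() {
    running = false;
    if (reloadThread != null) {
      reloadThread.interrupt(); // wake the thread if it is sleeping
    }
  }
}
{code}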


 Separate out configuration loading from QueueManager in the Fair Scheduler
 --

 Key: YARN-1403
 URL: https://issues.apache.org/jira/browse/YARN-1403
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1403-1.patch, YARN-1403.patch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1392) Allow sophisticated app-to-queue placement policies in the Fair Scheduler

2013-11-14 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13822939#comment-13822939
 ] 

Alejandro Abdelnur commented on YARN-1392:
--

LGTM +1. When committing, please remove the 2 false changes in 
FairScheduler.java at lines 696/1173 (empty lines).


 Allow sophisticated app-to-queue placement policies in the Fair Scheduler
 -

 Key: YARN-1392
 URL: https://issues.apache.org/jira/browse/YARN-1392
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: scheduler
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1392-1.patch, YARN-1392-1.patch, YARN-1392-2.patch, 
 YARN-1392-3.patch, YARN-1392.patch


 Currently the Fair Scheduler supports app-to-queue placement by username.  It 
 would be beneficial to allow more sophisticated policies that rely on primary 
 and secondary groups and fallbacks.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2013-11-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821597#comment-13821597
 ] 

Alejandro Abdelnur commented on YARN-1040:
--

[~bikassaha], if I got it right, you suggest:

* {{StartContainerRequest}} would have a new property {{boolean 
multipleProcesses (false)}}
* An additional API {{startProcess(ContainerId, ContainerLaunchContext)}} will 
be used to start multiple processes within the same container.
* In a {{StartContainerRequest}}, if the {{ContainerLaunchContext == null}} and 
{{multipleProcesses = true}}, the container is started with no associated 
process and the container allocation will not time out, as it has been claimed 
by the AM (because of the start container request).

If that is the case, then YARN-1404 would be a special case of this JIRA.

Am I right?
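
For what it's worth, a purely hypothetical sketch of the API shape I'm reading 
into this; none of these types or signatures exist in YARN as written here:

{code}
public interface MultiProcessContainerSketch {
  // launchContext may be null when multipleProcesses is true, in which case the
  // container is claimed but no process is forked.
  void startContainer(String containerId, Object launchContext, boolean multipleProcesses);

  // Subsequent processes are started against the already-running container.
  void startProcess(String containerId, Object launchContext);
}
{code}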

 De-link container life cycle from the process and add ability to execute 
 multiple processes in the same long-lived container
 

 Key: YARN-1040
 URL: https://issues.apache.org/jira/browse/YARN-1040
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Steve Loughran

 The AM should be able to exec 1 process in a container, rather than have the 
 NM automatically release the container when the single process exits.
 This would let an AM restart a process on the same container repeatedly, 
 which for HBase would offer locality on a restarted region server.
 We may also want the ability to exec multiple processes in parallel, so that 
 something could be run in the container while a long-lived process was 
 already running. This can be useful in monitoring and reconfiguring the 
 long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1404) Add support for unmanaged containers

2013-11-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821657#comment-13821657
 ] 

Alejandro Abdelnur commented on YARN-1404:
--

[~ste...@apache.org]

bq. 1. I'd be inclined to treat this as a special case of YARN-1040

I've just commented in YARN-1040 following Bikas' comment on this 
https://issues.apache.org/jira/browse/YARN-1040?focusedCommentId=13821597page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13821597

bq. It's dangerously easy to leak containers here; I know llama keeps an eye on 
things, but I worry about other people's code -though admittedly, any 
long-lived command line app yes could do the same.

We can have NM configs to disable no-process or multi-process containers, but 
you can still work around this by having a dummy process. This is how Llama is 
doing things today, but it is not ideal for several reasons.

IMO, from the Yarn perspective we need to allow AMs to do sophisticated things 
within the Yarn programming model (like you are trying to do with long-lived 
containers or what I'm doing with Llama).

bq. For the multi-process (and that includes processes=0), we really do need 
some kind of lease renewal option to stop containers being retained forever. It 
would become the job of the AM to do the renewal

As I've mentioned above, I don't think we need a special lease for this: 
https://issues.apache.org/jira/browse/YARN-1404?focusedCommentId=13820200page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13820200
 (look for 'The reason I've taken the approach of leaving the container leases 
out of band is:')

[~vinodkv]

bq. -1 for this...

I think you are jumping too fast here.

bq. As I repeated on other JIRAs, please change the title with the problem 
statement instead of solutions.

IMO that makes complete sense for bugs; for improvements/new features a 
description of the change communicates more, as it will be the commit message. 
The shortcomings the JIRA is trying to address should be captured in the 
description.

Take for example the following JIRA summaries: would you change them to 
describe a problem?

* AHS should support application-acls and queue-acls
* AM's tracking URL should be a URL instead of a string
* Add support for zipping/unzipping logs while in transit for the NM logs 
web-service
* YARN should have a ClusterId/ServiceId

bq. I indicated offline about llama with others. I don't think you need 
NodeManagers either to do what you want, forget about containers. All you need 
is use the ResourceManager/scheduler in isolation using MockRM/LightWeightRM 
(YARN-1385) - your need seems to be using the scheduling logic in YARN and not 
use the physical resources.

The whole point of Llama is to allow Impala to share resources in a real Yarn 
cluster running other workloads like Map-Reduce. In other words, Impala/Llama 
and other AMs must share cluster resources. 


 Add support for unmanaged containers
 

 Key: YARN-1404
 URL: https://issues.apache.org/jira/browse/YARN-1404
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.2.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-1404.patch


 Currently a container allocation requires to start a container process with 
 the corresponding NodeManager's node.
 For applications that need to use the allocated resources out of band from 
 Yarn this means that a dummy container process must be started.
 Impala/Llama is an example of such application which is currently starting a 
 'sleep 10y' (10 years) process as the container process. And the resource 
 capabilities are used out of by and the Impala process collocated in the 
 node. The Impala process ensures the processing associated to that resources 
 do not exceed the capabilities of the container. Also, if the container is 
 lost/preempted/killed, Impala stops using the corresponding resources.
 In addition, in the case of Llama, the current requirement of having a 
 container process, gets complicates when hard resource enforcement (memory 
 -ContainersMonitor- or cpu -via cgroups-) is enabled because Impala/Llama 
 request resources with CPU and memory independently of each other. Some 
 requests are CPU only and others are memory only. Unmanaged containers solve 
 this problem as there is no underlying process with zero CPU or zero memory.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1407) apps REST API filters queries by YarnApplicationState, but includes RMAppStates in response

2013-11-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821692#comment-13821692
 ] 

Alejandro Abdelnur commented on YARN-1407:
--

Agree with [~tgraves], +1 after doc additions.

 apps REST API filters queries by YarnApplicationState, but includes 
 RMAppStates in response
 ---

 Key: YARN-1407
 URL: https://issues.apache.org/jira/browse/YARN-1407
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1407.patch


 RMAppState isn't a public facing enum like YarnApplicationState, so we 
 shouldn't return values that come from it.
 It is not 100% clear to me whether or not fixing this would be a 
 backwards-incompatible change.  The change would only reduce the set of 
 possible strings that the API returns, so I think not.  We have also been 
 changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would 
 still be good to fix this ASAP (i.e. for 2.2.1).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1241) In Fair Scheduler maxRunningApps does not work for non-leaf queues

2013-11-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821695#comment-13821695
 ] 

Alejandro Abdelnur commented on YARN-1241:
--

The patch needs to be rebased; it does not apply.

 In Fair Scheduler maxRunningApps does not work for non-leaf queues
 --

 Key: YARN-1241
 URL: https://issues.apache.org/jira/browse/YARN-1241
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: YARN-1241-1.patch, YARN-1241-2.patch, YARN-1241-3.patch, 
 YARN-1241-4.patch, YARN-1241-5.patch, YARN-1241-6.patch, YARN-1241.patch


 Setting the maxRunningApps property on a parent queue should make it that the 
 sum of apps in all subqueues can't exceed it



--
This message was sent by Atlassian JIRA
(v6.1#6144)

