[jira] [Updated] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-11 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-8275:

Priority: Major  (was: Minor)

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average, the NM calls WinUtils 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils performs around *140 IO ops*, of which 130 are DLL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.
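A minimal sketch of what such a JNI surface could look like, replacing the fork of winutils for symlinks and liveness checks with direct native calls; the class, method, and library names below are hypothetical, not existing Hadoop code:

{code}
// Hypothetical JNI wrapper, sketched for illustration only. The native side
// would call Win32 APIs (e.g. CreateSymbolicLinkW) instead of forking
// winutils.exe for every operation.
public final class WindowsNativeIO {
  static {
    // Assumed native library name, shipped alongside the NM on Windows.
    System.loadLibrary("hadoopwinnative");
  }

  private WindowsNativeIO() {}

  /** Create a symlink without spawning a process (replaces "-symlink"). */
  public static native boolean createSymlink(String link, String target);

  /** Check whether a task process is alive (replaces "-task isAlive"). */
  public static native boolean isTaskAlive(String taskId);
}
{code}

Each call above is an in-process native call rather than a process fork, which is where the per-container IO cost in the table comes from.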






[jira] [Resolved] (YARN-8216) Reduce RegistryDNS port ping logging

2018-04-26 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved YARN-8216.
-
Resolution: Duplicate
  Assignee: (was: Eric Yang)

Resolving as duplicate, per issue link.

> Reduce RegistryDNS port ping logging
> 
>
> Key: YARN-8216
> URL: https://issues.apache.org/jira/browse/YARN-8216
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Priority: Major
>
> System monitoring software usually sends a TCP packet to test whether a port is 
> alive. This can cause RegistryDNS to throw a BufferUnderflowException.
> {code}
> 2018-04-26 17:07:55,846 WARN 
> org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when 
> running task in RegistryDNS 3
> 2018-04-26 17:07:55,847 WARN 
> org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread 
> RegistryDNS 3:
> java.nio.BufferUnderflowException
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:771)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This is perfectly normal, but it would be nice to suppress this error message to 
> reduce verbose logging on port pings.
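One way to quiet this, sketched under the assumption that the underflow happens while reading the two-byte DNS-over-TCP length prefix in {{RegistryDNS#nioTCPClient}}; the surrounding names are illustrative:

{code}
// Illustrative handling inside the TCP read path; not the actual
// RegistryDNS code.
try {
  // DNS-over-TCP messages begin with a two-byte length prefix (RFC 1035).
  int messageLength = byteBuffer.getShort() & 0xFFFF;
  // ... read and dispatch the DNS message ...
} catch (java.nio.BufferUnderflowException e) {
  // Port-liveness probes send fewer bytes than a DNS message; log quietly
  // at DEBUG instead of a WARN with a full stack trace.
  LOG.debug("Ignoring short read, likely a port ping", e);
}
{code}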






[jira] [Reopened] (YARN-8216) Reduce RegistryDNS port ping logging

2018-04-26 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas reopened YARN-8216:
-

> Reduce RegistryDNS port ping logging
> 
>
> Key: YARN-8216
> URL: https://issues.apache.org/jira/browse/YARN-8216
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>
> System monitoring software usually sends a TCP packet to test whether a port is 
> alive. This can cause RegistryDNS to throw a BufferUnderflowException.
> {code}
> 2018-04-26 17:07:55,846 WARN 
> org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when 
> running task in RegistryDNS 3
> 2018-04-26 17:07:55,847 WARN 
> org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread 
> RegistryDNS 3:
> java.nio.BufferUnderflowException
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:771)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This is perfectly normal, but it would be nice to suppress this error message to 
> reduce verbose logging on port pings.






[jira] [Commented] (YARN-8216) Reduce RegistryDNS port ping logging

2018-04-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16455593#comment-16455593
 ] 

Chris Douglas commented on YARN-8216:
-

Why was this issue created and resolved as fixed, with no patch, within 3 
minutes?

> Reduce RegistryDNS port ping logging
> 
>
> Key: YARN-8216
> URL: https://issues.apache.org/jira/browse/YARN-8216
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>
> System monitoring software usually sends a TCP packet to test whether a port is 
> alive. This can cause RegistryDNS to throw a BufferUnderflowException.
> {code}
> 2018-04-26 17:07:55,846 WARN 
> org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when 
> running task in RegistryDNS 3
> 2018-04-26 17:07:55,847 WARN 
> org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread 
> RegistryDNS 3:
> java.nio.BufferUnderflowException
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:771)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846)
> at 
> org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> This is perfectly normal, but it would be nice to suppress this error message to 
> reduce verbose logging on port pings.






[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449023#comment-16449023
 ] 

Chris Douglas commented on YARN-8200:
-

bq. I would suggest to try using 3.x instead of backporting this to 2.x so 
everybody is on the same codebase and improving it. To me, the effort of 
backporting YARN-3926 + YARN-6223 will be comparable to upgrading to a 3.x release 
and fixing (incompatible) issues
From [~jhung]'s analysis, the backports were relatively straightforward 
(mostly new code). Keeping it in sync with fixes/improvements in 3.x will 
require ongoing maintenance, which is unfortunate. Are there specific areas 
where you suspect the backport could become difficult to maintain?

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To avoid supporting too many 
> very different Hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU-specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2018-03-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396339#comment-16396339
 ] 

Chris Douglas commented on YARN-3409:
-

bq. Sunil G and I tried to delete it, but we didn't have the permissions, so we 
were trying to get that done with Jian He and others, and in the meanwhile you 
helped us out. Couldn't a branch be deleted by anyone?
I don't/shouldn't have any special privileges. Probably there was a change to the 
set of protected branches between when you tried and today.

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, client, RM
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (in other words, partitioning a 
> cluster) is a way to determine how the resources of a specific set of nodes 
> can be shared by a group of entities (like teams, departments, etc.). 
> Partitions of a cluster have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the market team has a 40% 
> minimum capacity and the dev team has a 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software just for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (Windows, Linux, etc.)
> With this, an application can ask for resources that satisfy (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).






[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2018-03-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395767#comment-16395767
 ] 

Chris Douglas commented on YARN-3409:
-

Deleted the {{yarn-3409}} branch, because it collides with {{YARN-3409}} on 
case-insensitive systems. The former looked like an accidental push.

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, client, RM
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (in other words, partitioning a 
> cluster) is a way to determine how the resources of a specific set of nodes 
> can be shared by a group of entities (like teams, departments, etc.). 
> Partitions of a cluster have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the market team has a 40% 
> minimum capacity and the dev team has a 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software just for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (Windows, Linux, etc.)
> With this, an application can ask for resources that satisfy (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).






[jira] [Commented] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2018-02-05 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352774#comment-16352774
 ] 

Chris Douglas commented on YARN-6868:
-

Verified that the zookeeper and curator test jars aren't part of the package 
after backporting; pushed.

> Add test scope to certain entries in hadoop-yarn-server-resourcemanager 
> pom.xml
> ---
>
> Key: YARN-6868
> URL: https://issues.apache.org/jira/browse/YARN-6868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Major
> Fix For: 2.9.1
>
> Attachments: YARN-6868.001.patch
>
>
> The tag
> {noformat}
> <scope>test</scope>
> {noformat}
> is missing from a few entries in the pom.xml for 
> hadoop-yarn-server-resourcemanager.
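For context, a sketch of the kind of entry involved; the coordinates are assumed from the zookeeper test jar verified above, not copied from the patch:

{code}
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <type>test-jar</type>
  <!-- Without test scope, the test jar leaks into the runtime package. -->
  <scope>test</scope>
</dependency>
{code}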






[jira] [Resolved] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2018-02-05 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved YARN-6868.
-
   Resolution: Fixed
Fix Version/s: (was: 3.0.0-beta1)
   2.9.1

> Add test scope to certain entries in hadoop-yarn-server-resourcemanager 
> pom.xml
> ---
>
> Key: YARN-6868
> URL: https://issues.apache.org/jira/browse/YARN-6868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Major
> Fix For: 2.9.1
>
> Attachments: YARN-6868.001.patch
>
>
> The tag
> {noformat}
> <scope>test</scope>
> {noformat}
> is missing from a few entries in the pom.xml for 
> hadoop-yarn-server-resourcemanager.






[jira] [Reopened] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2018-02-05 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas reopened YARN-6868:
-

Sure. Reopening to cherry-pick this to branch-2 and branch-2.9.

> Add test scope to certain entries in hadoop-yarn-server-resourcemanager 
> pom.xml
> ---
>
> Key: YARN-6868
> URL: https://issues.apache.org/jira/browse/YARN-6868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Major
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6868.001.patch
>
>
> The tag
> {noformat}
> <scope>test</scope>
> {noformat}
> is missing from a few entries in the pom.xml for 
> hadoop-yarn-server-resourcemanager.






[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files

2018-01-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325464#comment-16325464
 ] 

Chris Douglas commented on YARN-7712:
-

bq. I need to use webhdfs to get the timestamp and set it for each localized file 
every time I launch something. This is cumbersome and not necessary in the case 
of my app
Perhaps, but YARN doesn't have anything else for correctness. If you're 
convinced this is necessary, please ensure that the NM verifies that the 
timestamp for a cached dependency matches the remote before it returns it to 
the client (so if it's changed, the app gets the new version, never the cached 
version). To be consistent, you may also want to add similar semantics for size.
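A minimal sketch of that verification, assuming a hypothetical helper in the NM's localization path (not existing code):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical check before serving a cached dependency.
static boolean cachedCopyIsCurrent(FileSystem fs, Path remote,
    long cachedModTime, long cachedSize) throws IOException {
  FileStatus status = fs.getFileStatus(remote);
  // If the timestamp or size changed, the cache entry is stale and the
  // app must get the new version, never the cached one.
  return status.getModificationTime() == cachedModTime
      && status.getLen() == cachedSize;
}
{code}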

> Add ability to ignore timestamps in localized files
> ---
>
> Key: YARN-7712
> URL: https://issues.apache.org/jira/browse/YARN-7712
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>
> YARN currently requires and checks the timestamp of localized files and 
> fails, if the file on HDFS does not match to the one requested. This jira 
> adds the ability to ignore the timestamp based on the request of the client.






[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files

2018-01-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324620#comment-16324620
 ] 

Chris Douglas commented on YARN-7712:
-

Got it. The purpose of the timestamp is not security, but correctness. It does 
not support applications that might specify a region of a dependency (e.g., 
download a segment of a log file being appended to) or a dependency that 
does not exist during submission. It is sufficient for static dependencies 
(e.g., jars) that are uploaded prior to submission, and to avoid the NM linking 
a stale version of a resource for a new container. The only security guarantees 
come from the {{FileSystem}}.

You mentioned the REST APIs a couple times. Why are those problematic?

If this is purely for testing, one could use a {{FilterFileSystem}} that 
returns a constant for the modification time, rather than modifying YARN...
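A minimal sketch of that test-only wrapper; the class name is hypothetical, and the {{FileStatus}} re-wrap keeps everything except the modification time:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

/** Test-only wrapper reporting a constant modification time for all files. */
public class ConstantMtimeFileSystem extends FilterFileSystem {
  private static final long FIXED_MTIME = 0L;

  public ConstantMtimeFileSystem(FileSystem fs) {
    super(fs);
  }

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    FileStatus s = super.getFileStatus(f);
    // Copy every field except the modification time.
    return new FileStatus(s.getLen(), s.isDirectory(), s.getReplication(),
        s.getBlockSize(), FIXED_MTIME, f);
  }
}
{code}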

> Add ability to ignore timestamps in localized files
> ---
>
> Key: YARN-7712
> URL: https://issues.apache.org/jira/browse/YARN-7712
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>
> YARN currently requires and checks the timestamp of localized files and 
> fails, if the file on HDFS does not match to the one requested. This jira 
> adds the ability to ignore the timestamp based on the request of the client.






[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files

2018-01-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319359#comment-16319359
 ] 

Chris Douglas commented on YARN-7712:
-

bq. I would like to keep this jira simple; all it is about is ignoring the 
timestamp check on downloads
Is the intent to accommodate a) modifications to files, or b) completely 
different files, or files that don't exist during submission, as dependencies? 
What problem is this solving?

Ignoring the timestamp makes localization non-deterministic. A reexecution of a 
task could download and use a different dependency. Speculatively executed 
tasks could use different dependencies, depending on which machine they run on. 
It's a rare user who can safely disable this check in YARN, but can't work 
around the timestamp check...

> Add ability to ignore timestamps in localized files
> ---
>
> Key: YARN-7712
> URL: https://issues.apache.org/jira/browse/YARN-7712
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>
> YARN currently requires and checks the timestamp of localized files and 
> fails, if the file on HDFS does not match to the one requested. This jira 
> adds the ability to ignore the timestamp based on the request of the client.






[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files

2018-01-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317154#comment-16317154
 ] 

Chris Douglas commented on YARN-7712:
-

As [~ste...@apache.org] 
[suggested|https://issues.apache.org/jira/browse/HDFS-7878?focusedCommentId=15512866=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15512866],
 we could also use the {{PathHandle}} API for YARN dependencies.
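A rough sketch of how that could look for a localized dependency, assuming a 3.x {{FileSystem}} that supports handles; {{fs}} is an already-initialized filesystem, and the path and buffer size are illustrative:

{code}
import org.apache.hadoop.fs.*;

// At submission time: capture a handle that pins the exact file.
FileStatus stat = fs.getFileStatus(new Path("/apps/lib/job.jar"));
PathHandle handle = fs.getPathHandle(stat, Options.HandleOpt.exact());

// At localization time: open by handle. If the file was replaced or moved
// in the meantime, this fails instead of silently localizing different bytes.
try (FSDataInputStream in = fs.open(handle, 4096)) {
  // ... copy the dependency into the local cache ...
}
{code}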

> Add ability to ignore timestamps in localized files
> ---
>
> Key: YARN-7712
> URL: https://issues.apache.org/jira/browse/YARN-7712
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>
> YARN currently requires and checks the timestamp of localized files and 
> fails, if the file on HDFS does not match to the one requested. This jira 
> adds the ability to ignore the timestamp based on the request of the client.






[jira] [Commented] (YARN-7221) Add security check for privileged docker container

2017-09-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173864#comment-16173864
 ] 

Chris Douglas commented on YARN-7221:
-

Is this a duplicate of YARN-6623? Or is it an extension to permit privileged 
containers after passing additional security checks?

> Add security check for privileged docker container
> --
>
> Key: YARN-7221
> URL: https://issues.apache.org/jira/browse/YARN-7221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>
> When a Docker container runs with privileges, the majority use case is to have 
> some program start as root and then drop privileges to another user, e.g., 
> httpd starts privileged to bind to port 80, then drops privileges to the www 
> user.  
> # We should add a security check for submitting users, to verify they have 
> "sudo" access to run a privileged container.  
> # We should remove --user=uid:gid for privileged containers.  
>  
> Docker can be launched with the --privileged=true and --user=uid:gid flags. 
> With this parameter combination, the user will not have access to become root.  
> All docker exec commands will drop to the uid:gid user instead of granting 
> privileges. A user can gain root privileges if the container file system 
> contains files that give the user extra power, but this type of image is 
> considered dangerous. A non-privileged user can launch a container with 
> special bits to acquire the same level of root power. Hence, we lose control 
> of which images should be run with --privileged, and who has sudo rights to 
> use privileged container images. As a result, we should check for sudo access, 
> then decide to parameterize --privileged=true OR --user=uid:gid. This will 
> avoid leading developers down the wrong path.






[jira] [Commented] (YARN-6622) Document Docker work as experimental

2017-09-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162173#comment-16162173
 ] 

Chris Douglas commented on YARN-6622:
-

Sure, it's better than nothing. Thanks, [~templedf].

> Document Docker work as experimental
> 
>
> Key: YARN-6622
> URL: https://issues.apache.org/jira/browse/YARN-6622
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6622.001.patch
>
>
> We should update the Docker support documentation calling out the Docker work 
> as experimental.






[jira] [Commented] (YARN-6622) Document Docker work as experimental

2017-09-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162090#comment-16162090
 ] 

Chris Douglas commented on YARN-6622:
-

Then let's backport YARN-5258. Enabling docker support in branch-2 effectively 
gives any user the capability to run processes as root on cluster machines.

> Document Docker work as experimental
> 
>
> Key: YARN-6622
> URL: https://issues.apache.org/jira/browse/YARN-6622
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6622.001.patch
>
>
> We should update the Docker support documentation calling out the Docker work 
> as experimental.






[jira] [Commented] (YARN-6622) Document Docker work as experimental

2017-09-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161795#comment-16161795
 ] 

Chris Douglas commented on YARN-6622:
-

bq. docker container is an alpha feature, which is generally known.
It's generally known to developers, but not to users. They're the target for 
this documentation. Unless they're familiar with both Docker and Hadoop, 
they're unlikely to understand the consequences of enabling this feature.

> Document Docker work as experimental
> 
>
> Key: YARN-6622
> URL: https://issues.apache.org/jira/browse/YARN-6622
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6622.001.patch
>
>
> We should update the Docker support documentation calling out the Docker work 
> as experimental.






[jira] [Commented] (YARN-6721) container-executor should have stack checking

2017-08-31 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149932#comment-16149932
 ] 

Chris Douglas commented on YARN-6721:
-

Cool, ship it. +1

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Attachments: YARN-6721.00.patch, YARN-6721.01.patch, 
> YARN-6721.02.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)






[jira] [Commented] (YARN-6721) container-executor should have stack checking

2017-08-31 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149333#comment-16149333
 ] 

Chris Douglas commented on YARN-6721:
-

Bravo for figuring out what is going on with clang. I looked for supporting 
documentation on OSX, and found mostly confusion.

+1

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Attachments: YARN-6721.00.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)






[jira] [Commented] (YARN-6944) The comment about ResourceManager#createPolicyMonitors lies

2017-08-03 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113699#comment-16113699
 ] 

Chris Douglas commented on YARN-6944:
-

bq. Monitors don't handle preemption.

[They|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java]
 
[do|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java#L82].

> The comment about ResourceManager#createPolicyMonitors lies
> ---
>
> Key: YARN-6944
> URL: https://issues.apache.org/jira/browse/YARN-6944
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Yufei Gu
>Priority: Trivial
>
> {code} 
>  // creating monitors that handle preemption
>   createPolicyMonitors();
> {code} 
> Monitors don't handle preemption. 






[jira] [Commented] (YARN-6593) [API] Introduce Placement Constraint object

2017-08-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109976#comment-16109976
 ] 

Chris Douglas commented on YARN-6593:
-

bq. The only thing remaining in this jira is the example class for how to use 
the APIs - whether it's worth doing or not?
Examples are essential, but can that be part of a followup JIRA? Particularly 
since the implementation(s) may affect the API.

> [API] Introduce Placement Constraint object
> ---
>
> Key: YARN-6593
> URL: https://issues.apache.org/jira/browse/YARN-6593
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-6593.001.patch, YARN-6593.002.patch, 
> YARN-6593.003.patch, YARN-6593.004.patch, YARN-6593.005.patch, 
> YARN-6593.006.patch, YARN-6593.007.patch, YARN-6593.008.patch
>
>
> Just removed the fix version and moved it to the target version, as we set the 
> fix version only after the patch is committed.






[jira] [Commented] (YARN-6726) Fix issues with docker commands executed by container-executor

2017-07-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095227#comment-16095227
 ] 

Chris Douglas commented on YARN-6726:
-

Not sure if I'll have cycles to review the patch in detail, but quickly:

bq. No user input is used, so this should be safe.
We also need to prevent the {{yarn}} user from becoming root, so we can't trust 
input to the CE even if it's filled in by the NM during container launch.

> Fix issues with docker commands executed by container-executor
> --
>
> Key: YARN-6726
> URL: https://issues.apache.org/jira/browse/YARN-6726
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-6726.001.patch
>
>
> docker inspect, rm, stop, etc are issued through container-executor. Commands 
> other than docker run are not functioning properly.






[jira] [Commented] (YARN-6593) [API] Introduce Placement Constraint object

2017-07-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095217#comment-16095217
 ] 

Chris Douglas commented on YARN-6593:
-

bq. I found it is very important, otherwise we cannot support complex placement 
requests which need to be updated.
Do you have specific use cases in mind? If there's a restricted form we could 
use to support them (e.g., adjusting cardinality, as in your example), that 
would be easier for us to support and for users to reason about. Since we don't 
have applications that use placement constraints yet, it may be difficult for 
us to predict where they need to change during execution (if at all).

bq. Regarding semantics, I prefer to apply to all containers placed 
subsequently; this is also the closest to the behavior of existing YARN. We 
just need to verify the updated placement request is still valid; probably we 
don't need to restrict it to some parameters.
I don't have a clear definition of validity across placement requests, 
particularly preserving it across a sequence of updates to the constraints. We 
could support relaxations of existing constraints, probably. Still, updates 
also require the LRA scheduler to maintain lineage for all its internal 
structures. A likely implementation will convert users' expressions to some 
normal form, combine those with admin constraints, forecast future allocations, 
inject requests into the scheduler, etc. Even if we could offer well-defined 
semantics for updates, the implementation and maintenance cost could outweigh 
the marginal benefit to users. If the workarounds (like submitting a new 
application or a new set of constraints) are easier to understand, that's 
probably what users will prefer, anyway.

Placement constraint updates also compound the {{ResourceRequest}} problem you 
cited in YARN-6594. Which epoch of the placement constraints applied to a 
container returned by the RM, and for which RR? If a user's application isn't 
getting containers, how is that debugged? If someone wants to reason about a 
group of constraints for a production cluster while applications change clauses 
programmatically at runtime, then that analysis goes from difficult to 
intractable.

You guys are implementing it, but I'd push this to future work.

> [API] Introduce Placement Constraint object
> ---
>
> Key: YARN-6593
> URL: https://issues.apache.org/jira/browse/YARN-6593
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6593.001.patch, YARN-6593.002.patch, 
> YARN-6593.003.patch, YARN-6593.004.patch
>
>
> This JIRA introduces an object for defining placement constraints.






[jira] [Commented] (YARN-6223) [Umbrella] Natively support GPU configuration/discovery/scheduling/isolation on YARN

2017-07-19 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094074#comment-16094074
 ] 

Chris Douglas commented on YARN-6223:
-

[~leftnoteasy], could you summarize the implementation a bit? What would an 
example cfg look like and how is it interpreted?

> [Umbrella] Natively support GPU configuration/discovery/scheduling/isolation 
> on YARN
> 
>
> Key: YARN-6223
> URL: https://issues.apache.org/jira/browse/YARN-6223
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6223.Natively-support-GPU-on-YARN-v1.pdf, 
> YARN-6223.wip.1.patch, YARN-6223.wip.2.patch, YARN-6223.wip.3.patch
>
>
> A variety of workloads are moving to YARN, including machine learning / 
> deep learning, which can be sped up by leveraging GPU computation power. 
> Workloads should be able to request GPUs from YARN as simply as CPU and memory.
> *To make a complete GPU story, we should support the following pieces:*
> 1) GPU discovery/configuration: admins can either configure GPU resources and 
> architectures on each node, or, more advanced, the NodeManager can 
> automatically discover GPU resources and architectures and report them to the 
> ResourceManager. 
> 2) GPU scheduling: the YARN scheduler should account for GPU as a resource 
> type just like CPU and memory.
> 3) GPU isolation/monitoring: once a task is launched with GPU resources, the 
> NodeManager should properly isolate and monitor the task's resource usage.
> For #2, YARN-3926 can support it natively. For #3, YARN-3611 has introduced 
> an extensible framework to support isolation for different resource types and 
> different runtimes.
> *Related JIRAs:*
> There are a couple of JIRAs (YARN-4122/YARN-5517) filed with similar goals but 
> different solutions:
> For scheduling:
> - YARN-4122/YARN-5517 both add a new GPU resource type to the Resource 
> protocol instead of leveraging YARN-3926.
> For isolation:
> - YARN-4122 proposed using CGroups for isolation, which cannot solve the 
> problems listed at 
> https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation#challenges, such as 
> minor device number mapping, loading the nvidia_uvm module, mismatch of 
> CUDA/driver versions, etc.






[jira] [Commented] (YARN-6593) [API] Introduce Placement Constraint object

2017-07-19 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094052#comment-16094052
 ] 

Chris Douglas commented on YARN-6593:
-

I like the {{T accept(Visitor visitor)}} pattern and composing expressions 
with {{PlacementConstraints}}; these are well polished. I agree with 
[~leftnoteasy] on {{PlacementConstraints}} being the primary {{\@Public}} API. 
Do we want to support users adding new transforms? If not, some of the 
implementation details could be package-private.
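For readers unfamiliar with the pattern, a minimal generic sketch of {{T accept(Visitor)}}; the class names are illustrative, not the actual YARN-6593 types:

{code}
// Illustrative only; the real constraint and visitor types live in the patch.
interface Visitor<T> {
  T visit(SingleConstraint constraint);
  T visit(CompositeConstraint constraint);
}

abstract class AbstractConstraint {
  // Each concrete constraint dispatches to the matching visit() overload,
  // so transforms (e.g., proto converters) need no instanceof chains.
  abstract <T> T accept(Visitor<T> visitor);
}

class SingleConstraint extends AbstractConstraint {
  @Override
  <T> T accept(Visitor<T> visitor) { return visitor.visit(this); }
}

class CompositeConstraint extends AbstractConstraint {
  @Override
  <T> T accept(Visitor<T> visitor) { return visitor.visit(this); }
}
{code}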

[~leftnoteasy]: what are the semantics of updated constraints? Do they apply to 
all containers placed subsequently, or could it cause a reconfiguration of 
allocated containers? Or are updates restricted to (some?) parameters of the 
expression? This isn't covered in the design doc on YARN-6592.

Minor:
* {{convert}} methods in {{PlacementConstraintFromProtoConverter}} should fail 
if composite constraints have no children? Or would these invariants be checked 
by a validator after construction?
* {{PlacementConstraints#timedMillisConstraint}} could accept a 
[TimeUnit|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/TimeUnit.html]
 and convert to ms
* In this expression:
{noformat}
+  if (constraint.getOp() == TargetOperator.IN) {
+newConstraint = new SingleConstraint(constraint.getScope(), 1,
+Integer.MAX_VALUE, constraint.getTargetExpressions());
+  } else {
{noformat}
Might operator types be extended in the future, where this is not correct?
* All the constraints derive from the inner, {{AbstractConstraint}} type. This 
avoids having {{PlacementConstraint}} accept a Visitor?
* A unit test exercising the PB serialization/deserialization would 
demonstrate the converter classes.

> [API] Introduce Placement Constraint object
> ---
>
> Key: YARN-6593
> URL: https://issues.apache.org/jira/browse/YARN-6593
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6593.001.patch, YARN-6593.002.patch, 
> YARN-6593.003.patch, YARN-6593.004.patch
>
>
> This JIRA introduces an object for defining placement constraints.






[jira] [Resolved] (YARN-650) User guide for preemption

2017-06-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved YARN-650.

Resolution: Won't Fix

Documentation was added in YARN-4492.

> User guide for preemption
> -
>
> Key: YARN-650
> URL: https://issues.apache.org/jira/browse/YARN-650
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: Y650-0.patch
>
>
> YARN-45 added a protocol for the RM to ask back resources. The docs on 
> writing YARN applications should include a section on how to interpret this 
> message.






[jira] [Commented] (YARN-6698) Backport YARN-5121 to branch-2.7

2017-06-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043467#comment-16043467
 ] 

Chris Douglas commented on YARN-6698:
-

I just skimmed the backport and compared with YARN-5121, but lgtm. +1

> Backport YARN-5121 to branch-2.7
> 
>
> Key: YARN-6698
> URL: https://issues.apache.org/jira/browse/YARN-6698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Blocker
> Attachments: YARN-6698-branch-2.7-01.patch, 
> YARN-6698-branch-2.7-test.patch
>
>







[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2017-05-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023876#comment-16023876
 ] 

Chris Douglas commented on YARN-1471:
-

[~curino] is this contained in YARN-6608? If so, maybe we should look into 
backporting that, instead of individual SLS patches.

> The SLS simulator is not running the preemption policy for CapacityScheduler
> 
>
> Key: YARN-1471
> URL: https://issues.apache.org/jira/browse/YARN-1471
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Minor
>  Labels: release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, 
> YARN-1471-branch-2.7.4.patch, YARN-1471.patch, YARN-1471.patch
>
>
> The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
> This is because the policy needs to interact with a CapacityScheduler, and 
> the wrapping done by the simulator breaks this. 






[jira] [Commented] (YARN-6622) Document Docker work as experimental

2017-05-19 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017799#comment-16017799
 ] 

Chris Douglas commented on YARN-6622:
-

Including an explanation of the risks and/or pointers to 
[references|https://docs.docker.com/engine/security/security] would help users 
make an informed decision. Without that, they'll likely gloss over this 
disclaimer.

> Document Docker work as experimental
> 
>
> Key: YARN-6622
> URL: https://issues.apache.org/jira/browse/YARN-6622
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-6622.001.patch
>
>
> We should update the Docker support documentation calling out the Docker work 
> as experimental.






[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2017-05-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: (was: YARN-4476.005.patch)

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>  Labels: oct16-medium
> Attachments: YARN-4476.003.patch, YARN-4476.004.patch, 
> YARN-4476.005.patch, YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.






[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2017-05-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476.005.patch

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>  Labels: oct16-medium
> Attachments: YARN-4476.003.patch, YARN-4476.004.patch, 
> YARN-4476.005.patch, YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.






[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2017-05-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476.005.patch

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>  Labels: oct16-medium
> Attachments: YARN-4476.003.patch, YARN-4476.004.patch, 
> YARN-4476.005.patch, YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.






[jira] [Updated] (YARN-6577) Remove unused ContainerLocalization classes

2017-05-17 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-6577:

Summary: Remove unused ContainerLocalization classes  (was: Useless 
interface and implementation class)

> Remove unused ContainerLocalization classes
> ---
>
> Key: YARN-6577
> URL: https://issues.apache.org/jira/browse/YARN-6577
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3, 3.0.0-alpha2
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-6577.001.patch
>
>
> As of 2.7.3 and 3.0.0-alpha2, the ContainerLocalization interface and the 
> ContainerLocalizationImpl implementation class are of no use, and I recommend 
> removing them.






[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2017-05-09 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476.004.patch

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>  Labels: oct16-medium
> Attachments: YARN-4476.003.patch, YARN-4476.004.patch, 
> YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.






[jira] [Updated] (YARN-6451) Add RM monitor validating metrics invariants

2017-04-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-6451:

Issue Type: New Feature  (was: Bug)

> Add RM monitor validating metrics invariants
> 
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, 
> YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc..)






[jira] [Updated] (YARN-6451) Add RM monitor validating metrics invariants

2017-04-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-6451:

Summary: Add RM monitor validating metrics invariants  (was: Create a 
monitor to check whether we maintain RM (scheduling) invariants)

> Add RM monitor validating metrics invariants
> 
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, 
> YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc..)






[jira] [Commented] (YARN-6451) Create a monitor to check whether we maintain RM (scheduling) invariants

2017-04-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971709#comment-15971709
 ] 

Chris Douglas commented on YARN-6451:
-

bq. When invariants are violated the log line is harder to read if combined, 
but perf is much better. In the current example of invariants.txt I will leave 
this with one invariant per line, so slower but easier to understand. Works?

This could evaluate the combined expression, and only if it detects some 
violation, iterate over the set of expressions to print specific error 
messages. Though shaving fractions of a millisecond off the validation check is 
probably not significant.
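A sketch of that two-pass idea with {{javax.script}}, assuming the invariants are already compiled and the metric values are exposed as a map; the variable names are illustrative:

{code}
// Fast path: evaluate one combined expression; iterate only on violation.
Bindings bindings = engine.createBindings();
metrics.forEach(bindings::put);

if (!(Boolean) combinedInvariant.eval(bindings)) {
  for (Map.Entry<String, CompiledScript> e : invariants.entrySet()) {
    if (!(Boolean) e.getValue().eval(bindings)) {
      LOG.error("Invariant violated: " + e.getKey());
    }
  }
}
{code}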

+1 overall. For future versions:
* The invariant checker might want to use bindings across contexts; this would 
be hard to express as subtypes of {{InvariantsChecker}}. For example, if one 
wanted to check some invariant using values from the scheduler and the metrics, 
there isn't a good way to compose the two with inheritance. That said, in the 
current RM it's hard to correlate values collected from multiple components 
without reasoning about their mutual consistency in a brittle, ad hoc way. How 
invariants are loaded and how errors are handled could also be abstracted, but 
(IMHO) that'd be premature. This is approachable as-is.
* The unit test is kind of light
* This could print a warning when it starts up, since it's mostly for testing. 
If it's accidentally deployed in a production setting, it should show up in the 
log. The RM refuses to start if {{invariants.txt}} is missing?

> Create a monitor to check whether we maintain RM (scheduling) invariants
> 
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, 
> YARN-6451.v2.patch, YARN-6451.v3.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6451) Create a monitor to check whether we maintain RM (scheduling) invariants

2017-04-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966500#comment-15966500
 ] 

Chris Douglas commented on YARN-6451:
-

Cool, I hadn't seen the {{javax.script}} package before. Throwing a bespoke 
exception can also be configured to halt the JVM and call back to a debugger, 
which is a nice touch for the SLS case.
* The invariants can be precompiled, to avoid the parsing/compilation overhead 
for each iteration (see the sketch after this list).
* If not invoking a debugger, then it'd be nice to know the bindings when the 
invariant doesn't hold.
* The invariant check could be part of the {{metrics2.MetricsCollector}}, 
particularly if it's possible to filter the metrics it gathers based on the 
configured invariants.
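
For the precompilation bullet above, a hedged sketch of what that could look 
like with {{javax.script}}, assuming the engine implements {{Compilable}} (the 
class below is illustrative, not the patch's API):

{code}
// Sketch only: compile each invariant expression once, then re-evaluate the
// CompiledScript on every monitoring interval with fresh metric bindings.
import java.util.ArrayList;
import java.util.List;
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class PrecompiledInvariants {
  private final List<CompiledScript> compiled = new ArrayList<>();

  public PrecompiledInvariants(List<String> expressions)
      throws ScriptException {
    ScriptEngine engine =
        new ScriptEngineManager().getEngineByName("JavaScript");
    Compilable compiler = (Compilable) engine; // Nashorn/Rhino support this
    for (String expr : expressions) {
      compiled.add(compiler.compile(expr)); // parse/compile once, up front
    }
  }

  public boolean holds(Bindings metrics) throws ScriptException {
    for (CompiledScript script : compiled) {
      if (!(Boolean) script.eval(metrics)) {
        return false;
      }
    }
    return true;
  }
}
{code}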

> Create a monitor to check whether we maintain RM (scheduling) invariants
> 
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, 
> YARN-6451.v2.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6336) Jenkins report YARN new UI build failure

2017-03-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925452#comment-15925452
 ] 

Chris Douglas commented on YARN-6336:
-

Also HDFS-6984

> Jenkins report YARN new UI build failure 
> -
>
> Key: YARN-6336
> URL: https://issues.apache.org/jira/browse/YARN-6336
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Priority: Blocker
>
> In Jenkins report of YARN-6313 
> (https://builds.apache.org/job/PreCommit-YARN-Build/15260/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn.txt),
>  we found following build failure due to YARN new UI:
> {noformat}
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node_modules/ember-cli-htmlbars/node_modules/broccoli-persistent-filter/node_modules/async-disk-cache/node_modules/username/index.js:2
> const os = require('os');
> ^
> Use of const in strict mode.
> SyntaxError: Use of const in strict mode.
> at Module._compile (module.js:439:25)
> at Object.Module._extensions..js (module.js:474:10)
> at Module.load (module.js:356:32)
> at Function.Module._load (module.js:312:12)
> at Module.require (module.js:364:17)
> at require (module.js:380:17)
> at Object.<anonymous> 
> (/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node_modules/ember-cli-htmlbars/node_modules/broccoli-persistent-filter/node_modules/async-disk-cache/index.js:24:16)
> at Module._compile (module.js:456:26)
> at Object.Module._extensions..js (module.js:474:10)
> at Module.load (module.js:356:32)
> DEPRECATION: Node v0.10.25 is no longer supported by Ember CLI. Please update 
> to a more recent version of Node
> undefined
> version: 1.13.15
> Could not find watchman, falling back to NodeWatcher for file system events.
> Visit http://www.ember-cli.com/user-guide/#watchman for more info.
> Building[INFO] 
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6191) CapacityScheduler preemption by container priority can be problematic for MapReduce

2017-02-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876567#comment-15876567
 ] 

Chris Douglas commented on YARN-6191:
-

bq. However there's still an issue because the preemption message is too 
general. For example, if the message says "going to preempt 60GB of resources" 
and the AM kills 10 reducers that are 6GB each on 6 different nodes, the RM can 
still kill the maps because the RM needed 60GB of contiguous resources.

I haven't followed the modifications to the preemption policy, so I don't know 
if the AM will be selected as a victim again even after satisfying the contract 
(it should not). The preemption message should be expressive enough to encode 
this, if that's the current behavior. If the RM will only accept 60GB of 
resources from a single node, then that can be encoded in a ResourceRequest in 
the preemption message.
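
As an illustration of that encoding, a sketch of a node-specific request that 
could be carried in the preemption contract (the host name, priority, and sizes 
are made up; this shows the stock {{ResourceRequest}} API, not the RM's actual 
preemption-policy code):

{code}
// Sketch only: a request pinned to a single node, so the AM knows 60GB must
// come from one host rather than, say, 6GB from each of ten hosts.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class PreemptionDemand {
  public static ResourceRequest sixtyGbOnOneNode() {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(0),            // illustrative priority
        "host-017.example.com",             // a specific node, not ANY
        Resource.newInstance(60 * 1024, 8), // 60GB in MB, 8 vcores
        1);                                 // one contiguous container
    req.setRelaxLocality(false);            // no fallback to rack/any
    return req;
  }
}
{code}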

Even if everything behaves badly, killing the reducers is still correct, right? 
If the job is still entitled to resources, then it should reschedule the map 
tasks before the reducers. There are still interleavings of requests that could 
result in the same behavior described in this JIRA, but they'd be stunningly 
unlucky.

bq. I still wonder about the logic of preferring lower container priorities 
regardless of how long they've been running. I'm not sure container priority 
always translates well to how important a container is to the application, and 
we might be better served by preferring to minimize total lost work regardless 
of container priority.

All of the options [~sunilg] suggests are fine heuristics, but the application 
has the best view of the tradeoffs. For example, a long-running container might 
be amortizing the cost of scheduling short-lived tasks, and might actually be 
cheap to kill. If the preemption message is not accurately reporting the 
contract the RM is enforcing, then we should absolutely fix that. But I think 
this is a MapReduce problem, ultimately.

> CapacityScheduler preemption by container priority can be problematic for 
> MapReduce
> ---
>
> Key: YARN-6191
> URL: https://issues.apache.org/jira/browse/YARN-6191
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Jason Lowe
>
> A MapReduce job with thousands of reducers and just a couple of maps left to 
> go was running in a preemptable queue.  Periodically other queues would get 
> busy and the RM would preempt some resources from the job, but it _always_ 
> picked the job's map tasks first because they use the lowest priority 
> containers.  Even though the reducers had a shorter running time, most were 
> spared but the maps were always shot.  Since the map tasks ran for a longer 
> time than the preemption period, the job was in a perpetual preemption loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6191) CapacityScheduler preemption by container priority can be problematic for MapReduce

2017-02-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866914#comment-15866914
 ] 

Chris Douglas commented on YARN-6191:
-

This is related to a 
[discussion|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201702.mbox/%3CCACO5Y4wVm-9_3uES+qVvi2ypzsGTvu9jbEgVfTb79unPH-E=t...@mail.gmail.com%3E]
 on mapreduce-dev@ on the incomplete, work-conserving preemption logic. The MR 
AM should react by killing reducers when it gets a preemption message 
(checkpointing their state, if possible).

> CapacityScheduler preemption by container priority can be problematic for 
> MapReduce
> ---
>
> Key: YARN-6191
> URL: https://issues.apache.org/jira/browse/YARN-6191
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Jason Lowe
>
> A MapReduce job with thousands of reducers and just a couple of maps left to 
> go was running in a preemptable queue.  Periodically other queues would get 
> busy and the RM would preempt some resources from the job, but it _always_ 
> picked the job's map tasks first because they use the lowest priority 
> containers.  Even though the reducers had a shorter running time, most were 
> spared but the maps were always shot.  Since the map tasks ran for a longer 
> time than the preemption period, the job was in a perpetual preemption loop.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor

2016-12-07 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730846#comment-15730846
 ] 

Chris Douglas commented on YARN-5719:
-

Does someone have cycles to take a look at this?  [~vvasudev], [~aw], 
[~sidharta-s]?

> Enforce a C standard for native container-executor
> --
>
> Key: YARN-5719
> URL: https://issues.apache.org/jira/browse/YARN-5719
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Chris Douglas
> Attachments: YARN-5719.000.patch
>
>
> The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3460) TestSecureRMRegistryOperations fails with IBM_JAVA

2016-10-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3460:

Summary: TestSecureRMRegistryOperations fails with IBM_JAVA  (was: Test 
TestSecureRMRegistryOperations failed with IBM_JAVA JVM)

> TestSecureRMRegistryOperations fails with IBM_JAVA
> --
>
> Key: YARN-3460
> URL: https://issues.apache.org/jira/browse/YARN-3460
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: $ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T11:37:52-06:00)
> Maven home: /opt/apache-maven-3.2.1
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", 
> family: "unix"
>Reporter: pascal oliva
>Assignee: pascal oliva
> Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, 
> YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, 
> YARN-3460.005.patch, YARN-3460.006.patch
>
>
> TestSecureRMRegistryOperations failed with the IBM JAVA JVM
> mvn test -X 
> -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations
> Module                 Total  Failure  Error  Skipped
> -----------------------------------------------------
> hadoop-yarn-registry      12        0     12        0
> -----------------------------------------------------
> Total                     12        0     12        0
> With 
> javax.security.auth.login.LoginException: Bad JAAS configuration: 
> unrecognized option: isInitiator
> and 
> Bad JAAS configuration: unrecognized option: storeKey



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM

2016-10-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3460:

Assignee: pascal oliva

> Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
> 
>
> Key: YARN-3460
> URL: https://issues.apache.org/jira/browse/YARN-3460
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: $ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T11:37:52-06:00)
> Maven home: /opt/apache-maven-3.2.1
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", 
> family: "unix"
>Reporter: pascal oliva
>Assignee: pascal oliva
> Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, 
> YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, 
> YARN-3460.005.patch, YARN-3460.006.patch
>
>
> TestSecureRMRegistryOperations failed with the IBM JAVA JVM
> mvn test -X 
> -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations
> Module                 Total  Failure  Error  Skipped
> -----------------------------------------------------
> hadoop-yarn-registry      12        0     12        0
> -----------------------------------------------------
> Total                     12        0     12        0
> With 
> javax.security.auth.login.LoginException: Bad JAAS configuration: 
> unrecognized option: isInitiator
> and 
> Bad JAAS configuration: unrecognized option: storeKey



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2016-10-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476.003.patch

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>  Labels: oct16-medium
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch, 
> YARN-4476.003.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM

2016-10-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3460:

Attachment: YARN-3460.006.patch

> Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
> 
>
> Key: YARN-3460
> URL: https://issues.apache.org/jira/browse/YARN-3460
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: $ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T11:37:52-06:00)
> Maven home: /opt/apache-maven-3.2.1
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", 
> family: "unix"
>Reporter: pascal oliva
> Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, 
> YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, 
> YARN-3460.005.patch, YARN-3460.006.patch
>
>
> TestSecureRMRegistryOperations failed with the IBM JAVA JVM
> mvn test -X 
> -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations
> Module                 Total  Failure  Error  Skipped
> -----------------------------------------------------
> hadoop-yarn-registry      12        0     12        0
> -----------------------------------------------------
> Total                     12        0     12        0
> With 
> javax.security.auth.login.LoginException: Bad JAAS configuration: 
> unrecognized option: isInitiator
> and 
> Bad JAAS configuration: unrecognized option: storeKey



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM

2016-10-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3460:

Attachment: YARN-3460.005.patch

ASF license warnings are unrelated:
{noformat}
Lines that start with ? in the ASF License report indicate files that do 
not have an Apache license header:
 !? 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-1.9.4/js/jquery.dataTables.min.js
 !? 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery/jquery-1.8.2.min.js
 !? 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery/jquery-ui-1.9.1.custom.min.js
 !? 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jt/jquery.jstree.js
{noformat}

Fixed some checkstyle warnings.

> Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
> 
>
> Key: YARN-3460
> URL: https://issues.apache.org/jira/browse/YARN-3460
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: $ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T11:37:52-06:00)
> Maven home: /opt/apache-maven-3.2.1
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", 
> family: "unix"
>Reporter: pascal oliva
> Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, 
> YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, YARN-3460.005.patch
>
>
> TestSecureRMRegistryOperations failed with the IBM JAVA JVM
> mvn test -X 
> -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations
> Module                 Total  Failure  Error  Skipped
> -----------------------------------------------------
> hadoop-yarn-registry      12        0     12        0
> -----------------------------------------------------
> Total                     12        0     12        0
> With 
> javax.security.auth.login.LoginException: Bad JAAS configuration: 
> unrecognized option: isInitiator
> and 
> Bad JAAS configuration: unrecognized option: storeKey



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM

2016-10-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3460:

Attachment: YARN-3460.004.patch

This change invokes the "login" method, when "commit" is intended:
{noformat}
-boolean commitOk = krb5LoginModule.commit();
+Method methodCommit = kerb5LoginObject.getClass().getMethod("commit");
+boolean commitOk = (Boolean) methodLogin.invoke(kerb5LoginObject);
{noformat}

Updated patch. Can someone test this on an IBM jdk?
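
For reference, the intended reflective call looks roughly like this (a sketch; 
{{kerb5LoginObject}} stands in for the dynamically loaded Kerberos login module 
instance the patch creates):

{code}
// Sketch only: invoke "commit" (not "login") on the reflectively loaded
// login module, matching the correction described above.
import java.lang.reflect.Method;

public class ReflectiveCommit {
  public static boolean commit(Object kerb5LoginObject) throws Exception {
    Method methodCommit = kerb5LoginObject.getClass().getMethod("commit");
    return (Boolean) methodCommit.invoke(kerb5LoginObject);
  }
}
{code}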

> Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
> 
>
> Key: YARN-3460
> URL: https://issues.apache.org/jira/browse/YARN-3460
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.6.0, 3.0.0-alpha1
> Environment: $ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T11:37:52-06:00)
> Maven home: /opt/apache-maven-3.2.1
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", 
> family: "unix"
>Reporter: pascal oliva
> Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, 
> YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch
>
>
> TestSecureRMRegistryOperations failed with the IBM JAVA JVM
> mvn test -X 
> -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations
> Module                 Total  Failure  Error  Skipped
> -----------------------------------------------------
> hadoop-yarn-registry      12        0     12        0
> -----------------------------------------------------
> Total                     12        0     12        0
> With 
> javax.security.auth.login.LoginException: Bad JAAS configuration: 
> unrecognized option: isInitiator
> and 
> Bad JAAS configuration: unrecognized option: storeKey



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2571) RM to support YARN registry

2016-10-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-2571:

Labels: oct16-hard  (was: )

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: oct16-hard
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, 
> YARN-2571-012.patch, YARN-2571-013.patch, YARN-2571-015.patch, 
> YARN-2571-016.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)

2016-10-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-2828:

Labels: oct16-easy  (was: BB2015-05-TBR)

> Enable auto refresh of web pages (using http parameter)
> ---
>
> Key: YARN-2828
> URL: https://issues.apache.org/jira/browse/YARN-2828
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Tim Robertson
>Assignee: Vijay Bhat
>Priority: Minor
>  Labels: oct16-easy
> Attachments: YARN-2828.001.patch, YARN-2828.002.patch, 
> YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch, 
> YARN-2828.006.patch
>
>
> The MR1 Job Tracker had a useful HTTP parameter, e.g. "refresh=3", that 
> could be appended to URLs to enable automatic page reloads.  This was very 
> useful when developing mapreduce jobs, especially to watch counters changing.  
> This is lost in the Yarn interface.
> Could be implemented as a page element (e.g. drop down or so), but I'd 
> recommend that the page not be more cluttered, and simply bring back the 
> optional "refresh" HTTP param.  It worked really nicely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS

2016-10-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3432:

Labels: oct16-easy  (was: )

> Cluster metrics have wrong Total Memory when there is reserved memory on CS
> ---
>
> Key: YARN-3432
> URL: https://issues.apache.org/jira/browse/YARN-3432
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Brahma Reddy Battula
>  Labels: oct16-easy
> Attachments: YARN-3432-002.patch, YARN-3432-003.patch, YARN-3432.patch
>
>
> I noticed that when reservations happen when using the Capacity Scheduler, 
> the UI and web services report the wrong total memory.
> For example.  I have a 300GB of total memory in my cluster.  I allocate 50 
> and I reserve 10.  The cluster metrics for total memory get reported as 290GB.
> This was broken by https://issues.apache.org/jira/browse/YARN-656 so perhaps 
> there is a difference between fair scheduler and capacity scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions

2016-10-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3477:

Labels: oct16-easy  (was: )

> TimelineClientImpl swallows exceptions
> --
>
> Key: YARN-3477
> URL: https://issues.apache.org/jira/browse/YARN-3477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: oct16-easy
> Attachments: YARN-3477-001.patch, YARN-3477-002.patch, 
> YARN-3477-trunk.003.patch, YARN-3477-trunk.004.patch
>
>
> If timeline client fails more than the retry count, the original exception is 
> not thrown. Instead some runtime exception is raised saying "retries run out"
> # the failing exception should be rethrown, ideally via 
> NetUtils.wrapException to include URL of the failing endpoing
> # Otherwise, the raised RTE should (a) state that URL and (b) set the 
> original fault as the inner cause



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2016-10-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3514:

Labels: oct16-easy  (was: BB2015-05-TBR)

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Priority: Minor
>  Labels: oct16-easy
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escaping converts the \ to %5C, or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3538) TimelineServer doesn't catch/translate all exceptions raised

2016-10-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-3538:

Labels: oct16-easy  (was: BB2015-05-TBR)

> TimelineServer doesn't catch/translate all exceptions raised
> 
>
> Key: YARN-3538
> URL: https://issues.apache.org/jira/browse/YARN-3538
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: oct16-easy
> Attachments: YARN-3538-001.patch
>
>
> Not all exceptions in TimelineServer are uprated to web exceptions; only IOEs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5704) Provide config knobs to control enabling/disabling new/work in progress features in container-executor

2016-10-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576609#comment-15576609
 ] 

Chris Douglas commented on YARN-5704:
-

[~vvasudev] would you mind taking a look at YARN-5719 so we can enforce C99 (or 
whatever) for CE?

> Provide config knobs to control enabling/disabling new/work in progress 
> features in container-executor
> --
>
> Key: YARN-5704
> URL: https://issues.apache.org/jira/browse/YARN-5704
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5704-branch-2.8.001.patch, YARN-5704.001.patch
>
>
> Provide a mechanism to enable/disable Docker and TC (Traffic Control) 
> functionality at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor

2016-10-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573147#comment-15573147
 ] 

Chris Douglas commented on YARN-5719:
-

[~aw] would you mind taking a look?

> Enforce a C standard for native container-executor
> --
>
> Key: YARN-5719
> URL: https://issues.apache.org/jira/browse/YARN-5719
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Chris Douglas
> Attachments: YARN-5719.000.patch
>
>
> The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (YARN-5719) Enforce a C standard for native container-executor

2016-10-13 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-5719:

Comment: was deleted

(was: [~aw] would you mind taking a look?)

> Enforce a C standard for native container-executor
> --
>
> Key: YARN-5719
> URL: https://issues.apache.org/jira/browse/YARN-5719
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Chris Douglas
> Attachments: YARN-5719.000.patch
>
>
> The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor

2016-10-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573148#comment-15573148
 ] 

Chris Douglas commented on YARN-5719:
-

[~aw] would you mind taking a look?

> Enforce a C standard for native container-executor
> --
>
> Key: YARN-5719
> URL: https://issues.apache.org/jira/browse/YARN-5719
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Chris Douglas
> Attachments: YARN-5719.000.patch
>
>
> The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5719) Enforce a C standard for native container-executor

2016-10-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-5719:

Assignee: (was: Chris Douglas)

> Enforce a C standard for native container-executor
> --
>
> Key: YARN-5719
> URL: https://issues.apache.org/jira/browse/YARN-5719
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Chris Douglas
> Attachments: YARN-5719.000.patch
>
>
> The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5704) Provide config knobs to control enabling/disabling new/work in progress features in container-executor

2016-10-10 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563755#comment-15563755
 ] 

Chris Douglas commented on YARN-5704:
-

bq.  If we want to declare this code base as being C99 then we need to tell 
cmake to make sure we're using a C99 compiler. Until we do that, this code is 
defaulting to non-C99.

OK, got it. I don't suppose NoC99 has the same cachet as NoSQL? Let's pick a 
standard. Filed YARN-5719

bq. telling cmake that we're doing C99 is sort of a mine field, depending upon 
which version of cmake is in use.

Took a look at this and... yikes.

> Provide config knobs to control enabling/disabling new/work in progress 
> features in container-executor
> --
>
> Key: YARN-5704
> URL: https://issues.apache.org/jira/browse/YARN-5704
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-5704-branch-2.8.001.patch, YARN-5704.001.patch
>
>
> Provide a mechanism to enable/disable Docker and TC (Traffic Control) 
> functionality at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor

2016-10-10 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563749#comment-15563749
 ] 

Chris Douglas commented on YARN-5719:
-

There's a convenient 
[option|https://cmake.org/cmake/help/v3.1/prop_tgt/C_STANDARD.html] in recent 
versions of cmake to set the C standard in a portable way, but this is 
unavailable in the minimum version of cmake we require (2.6). v000 uses a set 
of switches based on a subset of [compiler 
ids|https://cmake.org/cmake/help/v3.0/variable/CMAKE_LANG_COMPILER_ID.html] 
we're likely(?) to support. The options themselves I pulled from cursory 
searches; I haven't tested with anything but gcc 4.8.4.

The LCE doesn't compile with ANSI C ({{-std=c89}}), but it required almost no 
changes to build with C99. Building with {{-pedantic-errors}} required only some 
minor tweaks to {{get_user_info}}.

> Enforce a C standard for native container-executor
> --
>
> Key: YARN-5719
> URL: https://issues.apache.org/jira/browse/YARN-5719
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-5719.000.patch
>
>
> The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5719) Enforce a C standard for native container-executor

2016-10-10 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-5719:

Attachment: YARN-5719.000.patch

> Enforce a C standard for native container-executor
> --
>
> Key: YARN-5719
> URL: https://issues.apache.org/jira/browse/YARN-5719
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-5719.000.patch
>
>
> The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5719) Enforce a C standard for native container-executor

2016-10-10 Thread Chris Douglas (JIRA)
Chris Douglas created YARN-5719:
---

 Summary: Enforce a C standard for native container-executor
 Key: YARN-5719
 URL: https://issues.apache.org/jira/browse/YARN-5719
 Project: Hadoop YARN
  Issue Type: Task
  Components: nodemanager
Reporter: Chris Douglas
Assignee: Chris Douglas


The {{container-executor}} build should declare the C standard it uses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5704) Provide config knobs to control enabling/disabling new/work in progress features in container-executor

2016-10-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561309#comment-15561309
 ] 

Chris Douglas commented on YARN-5704:
-

Thanks for working on this, [~sidharta-s].

As part of the followup patch, please also avoid using {{strcat}} when printing 
the usage; it can be separated into multiple statements, which avoids allocating 
a buffer we need to track for overflow. Sorry, I hadn't noticed that earlier.

bq. If we take that to it's logical conclusion we just declare all of our 
utility functions as static and remove all the unit tests.

That takes this heuristic well past its logical conclusion, but it'll be 
addressed in YARN-5717.

bq. [Variable declaration in the middle] Just because the old code follows bad 
practices doesn't mean that new code should. c-e not being ANSI C compliant is 
a problem, BTW.

If this creates portability problems that makes sense, though VS is the only C 
compiler I know of that (until recently?) doesn't support most of C99. 
Initializing variables when they're declared can avoid accidents, particularly 
over long LCE methods. Are any platforms this could target restricted to ANSI C 
compilers?

Requiring that new patches use ANSI C, without making the rest of LCE 
compliant, adds a touchy manual step for committers and helps no users. If 
there are restrictions on the subset of C this should use, the compiler needs 
to enforce them.

> Provide config knobs to control enabling/disabling new/work in progress 
> features in container-executor
> --
>
> Key: YARN-5704
> URL: https://issues.apache.org/jira/browse/YARN-5704
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-5704-branch-2.8.001.patch, YARN-5704.001.patch
>
>
> Provide a mechanism to enable/disable Docker and TC (Traffic Control) 
> functionality at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5702) Refactor TestPBImplRecords so that we can reuse for testing protocol records in other YARN modules

2016-10-05 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-5702:

Fix Version/s: 2.9.0

> Refactor TestPBImplRecords so that we can reuse for testing protocol records 
> in other YARN modules
> --
>
> Key: YARN-5702
> URL: https://issues.apache.org/jira/browse/YARN-5702
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5702-v1.patch, YARN-5702-v2.patch
>
>
> The {{TestPBImplRecords}} has generic helper methods to validate YARN api 
> records. This JIRA proposes to refactor the generic helper methods into a 
> base class that can then be reused by other YARN modules for testing internal 
> API protocol records like in yarn-server-common for Federation (YARN-2915). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources

2016-09-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524603#comment-15524603
 ] 

Chris Douglas commented on YARN-5621:
-

That summary of work seems about right, thanks for putting it together.

You raise excellent points about error handling. Your sketch includes a channel 
communicating which resources were (un)successfully linked. The script-driven 
approach handles this in v05 by writing a separate bash script and invoking the 
CE for each symlink (which, to be fair, isn't exactly "lightweight" when 
compared to extending {{ContainerLocalizer}}). In v05, a failure affects only 
one resource, but to take your earlier example linking a batch of resources in 
the script: how would one handle partial failures? What's the state of the 
container and resources when the script invocation fails?

On the CL proposal: either the CI initiates the symlink request to the 
{{ResourceLocalizationService}} after download, or the two operations are 
contained within that service. The complexity is comparable. The 2-phase 
protocol you sketch (CI initiates download, then link) adds a gap when the CL 
could be shut down before it receives the {{LINK}} commands (causing two CL 
launches), but even a short timeout would likely cover that.

A single message annotating the resource (download+symlink) could add states to 
{{LocalizedResource}} if it were to notify starting containers directly 
(current code) or handoff to the RLS for symlink. In this case, the protocol to 
the {{ContainerImpl}} is simpler (resending/retry is idempotent b/c it doesn't 
care if the download or symlink failed). Both {{FetchSuccessTransition}} and 
{{LocalizedResourceTransition}} would need to send 
{{LocalizerResourceRequestEvent}} for running containers to symlink. A failed 
symlink would look like a failed download to the CI. Start container is 
unaffected.

For the CL itself... sure, {{ResourceLocalizationSpec}} needs another field 
for symlinks. This side is pretty straightforward, right?

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, 
> YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources

2016-09-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510866#comment-15510866
 ] 

Chris Douglas commented on YARN-5621:
-

bq. I think I understand your approach now, basically, [...]

Yes, that's the gist of it. The {{ContainerLocalizer}} manages the private 
cache as the user, with that user's cluster credentials. Running containers 
start a CL to download private resources and/or create symlinks.

bq. Is it starting both instances now? Not sure if I read the code wrong... It 
seems not the case. Based on the code, if it's an already existing resource, it 
will NOT start the ContainerLocalizer.

[container start] For different applications? It should. For container start, I 
don't remember offhand if the {{ContainerLocalizer}} spawn is delayed until at 
least one dependent resource is not claimed, but IIRC it starts if at least one 
resource is not downloaded. Either way, CLs could start in race for a resource 
_R_, and only one would (successfully) download it. Resources aren't claimed 
when the CL launches, only when it heartbeats in.

[CL proposal] For running container localization and for rollback, the CL will 
download the resource (again) and/or create the symlink to the running 
container. If multiple containers/applications request the same resource, it 
doesn't matter if it's a mix of new/running containers requesting a resource 
_R_. Only running/rollback containers will send symlink commands to their CL.

bq. This approach may not be easily worked for the new containers without 
structural change, when localizer is started, the work-dirs are not setup yet.

Again, container start is unaffected; new containers will not send {{LINK}} 
commands to the CL. Only _running_ containers will start a CL that receives 
{{LINK}} commands, after the work dirs are created and the container has 
started.

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, 
> YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources

2016-09-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507320#comment-15507320
 ] 

Chris Douglas commented on YARN-5621:
-

I think I see where the CL proposal was unclear.

It is an alternative to CE changes; container start remains as-is. The proposal 
was scoped only to localizing resources for running containers. The CE is 
agnostic to new/running containers for an application- it may be used by both, 
concurrently. By adding a new command {{LINK}} to its protocol, the NM can 
instruct the {{ContainerLocalizer}} to create a symlink to a resource for a 
running container. Again, these commands could be grouped.

{quote}
> a case that already exists for containers on the same node requesting the 
> same resource
Do you mean this is an existing implemented functionality or this is an 
existing use-case?
{quote}

Neither. The case where running containers (c ~1x~, c ~2y~) for different 
applications (a ~1~, a ~2~) request the same resource _R_ exists. Both will 
start {{ContainerLocalizer}} instances, but only one will download the resource 
to the private cache. In the CL proposal, this is the same as rollback, where 
the CL starts, heartbeats, then receives a command to LINK an existing resource 
without downloading anything. By "a case that already exists", I meant it's a 
case the CL proposal handles implicitly.

bq. yeah, I feel it's inefficient to start a localizer process to only create 
symlinks..

No question. But if localizing a new resource takes a few seconds, and the 
services this targets upgrade over minutes or hours, then saving a few hundred 
milliseconds is not worth adding {{RUN_SCRIPT}} to the CE.

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, 
> YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources

2016-09-19 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504180#comment-15504180
 ] 

Chris Douglas commented on YARN-5621:
-

bq. this approach will not work in rollback scenario, as in that case no 
resources need to be localized - hence, no need to start the localizer 
processes. We only need to update the symlinks to old resources.

Sorry, I'm missing something. If the {{ContainerLocalizer}} supports a command 
to create symlinks to localized resources (a case that already exists for 
containers on the same node requesting the same resource), then how is that case 
distinguished from rollback? The container does need to start a 
{{ContainerLocalizer}} just to write some symlinks for the running container, 
which is inefficient. On the other hand, all symlinks for all containers from 
an application could be updated in the same invocation. When you say it does 
not work, are you noting the inefficiency of this flow, or is there a 
correctness problem?

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, 
> YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources

2016-09-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491452#comment-15491452
 ] 

Chris Douglas commented on YARN-5621:
-

bq. This may be a viable approach, we need to change the localizer heartbeat to 
send the symlink path.
The heartbeat already carries a payload with commands to the localizer. 
Including actions to symlink resources already fetched isn't that dire a change 
to either the ContainerLocalizer or the resource state machine, is it? The 
transition needs to send a LINK request to all localizers that were waiting in 
case the download failed.

bq. But if we want to create all symlinks in one go, this approach will not 
work.
This isn't going to be a transaction on the FS regardless, but can you explain 
this requirement? If symlink-on-download is disqualifying, then the container 
could still coordinate grouped symlinks by grouping LINK requests to a 
localizer. It rearranges the event flows awkwardly, but it's supportable...
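
A hedged sketch of the grouped variant, assuming the container batches its 
LINK requests and the localizer applies them only once every resource in the 
group is present; not a transaction on the FS, and the names are illustrative:
{noformat}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Illustrative only: no link is written unless the whole group has been
// downloaded. links maps localized resource -> desired symlink.
class GroupedLinker {
  static void linkGroup(Map<Path, Path> links) throws IOException {
    for (Path resource : links.keySet()) {
      if (!Files.exists(resource)) {
        throw new IOException("Not yet localized: " + resource);
      }
    }
    for (Map.Entry<Path, Path> e : links.entrySet()) {
      Files.createSymbolicLink(e.getValue(), e.getKey());
    }
  }
}
{noformat}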

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, 
> YARN-5621.4.patch, YARN-5621.5.patch
>
>
> When new resources are localized, a new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys

2016-09-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487715#comment-15487715
 ] 

Chris Douglas commented on YARN-5547:
-

bq. Skipping the container entirely would be very bad. The NM would not recover 
it, so it would then stop reporting it in heartbeats and the RM would then 
think it is dead/lost, but the container is actually still running, unmonitored 
and unkillable by YARN.

Agreed. What we were discussing was making container recovery independent, so 
containers using unknown features are not recovered, but failed and killed. The 
base case should recover nothing (all containers should be killed and cleaned 
up), but the NM should always start. I'm not sure every feature is neatly 
classified in the mandatory/optional taxonomy, particularly since many will 
depend on the versions of the client and RM. It seems simpler (and safer) to 
always kill/clean up containers using features the NM doesn't understand.
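
A sketch of that policy under simplified assumptions about the state-store 
layout; the key names and the {{RecoveredContainerState}} shape here are 
illustrative, not the actual NM schema:
{noformat}
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the NM's per-container recovery record.
class RecoveredContainerState {
  boolean recoverable = true;
  final Map<String, byte[]> known = new LinkedHashMap<>();
}

class TolerantRecovery {
  // Unknown keys never abort NM startup; a container that carries one is
  // failed and cleaned up rather than silently recovered.
  static RecoveredContainerState load(Map<String, byte[]> containerKeys) {
    RecoveredContainerState rcs = new RecoveredContainerState();
    for (Map.Entry<String, byte[]> e : containerKeys.entrySet()) {
      switch (e.getKey()) {
        case "starttime":      // illustrative known keys
        case "diagnostics":
          rcs.known.put(e.getKey(), e.getValue());
          break;
        default:
          rcs.recoverable = false;
      }
    }
    return rcs;
  }
}
{noformat}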

> NMLeveldbStateStore should be more tolerant of unknown keys
> ---
>
> Key: YARN-5547
> URL: https://issues.apache.org/jira/browse/YARN-5547
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Ajith S
> Attachments: YARN-5547.01.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources

2016-09-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478543#comment-15478543
 ] 

Chris Douglas commented on YARN-5621:
-

bq. FWIW, I'd love to see us drop the container launch script. I haven't tried 
it, but I suspect we can do lots of fun things with the env vars.

For all containers, we have (1) NM constants and (2) some user args we verify 
(e.g., the container ID matches the token, is correctly formatted, etc.), used 
as args to the CE (which should validate that each of these args conforms to a 
schema); these are the args used to build paths. All other args (3) that the 
user can control should be written to the container launch script, which is 
executed with the same permissions the container would have. The intent was to 
have all quoting games happen after we've switched to the user's context, and 
after we've discarded the NM environment. The implementation may have gaps, but 
is there a problem with the concept?

This JIRA follows a similar pattern, but without validation of args in the CE. 
If it were restricted such that the source had a fixed format in {{nmPrivate}} 
and the destination was derived from a formatted {{ContainerID}}, it could have 
guarantees comparable to the container start.
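
To make "conforms to a schema" concrete, a sketch of the kind of check meant 
here; the regex is an assumption for illustration, not the actual 
{{ContainerID}} validation in YARN:
{noformat}
import java.util.regex.Pattern;

// Args used to build paths must match a fixed format before the NM
// hands them to the container-executor.
class CEArgCheck {
  private static final Pattern CONTAINER_ID =
      Pattern.compile("container(_e\\d+)?_\\d+_\\d+_\\d+_\\d+");

  static String requireContainerId(String arg) {
    if (!CONTAINER_ID.matcher(arg).matches()) {
      throw new IllegalArgumentException("Malformed container ID: " + arg);
    }
    return arg;
  }
}
{noformat}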

Unless the resource is public, could this avoid modifying the CE by moving the 
symlink to the {{ContainerLocalizer}}? It could receive a symlink command on a 
heartbeat, it's already running as the user, it may already be running to 
download the resource...

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch
>
>
> When new resources are localized, a new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources

2016-09-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477639#comment-15477639
 ] 

Chris Douglas commented on YARN-5621:
-

bq. Because the passed in symlink path is an absolute path

Yes, obviously. :) I'm asking why this is an absolute path, if (per the design 
doc) the symlink is still relative to the container's working dir.

bq. later on we need to create multiple symlinks in a single operation as done 
in current container_launch script, because if there is a large number of local 
Resources to be localized, we don't want to invoke the binary for each of them. 

Invoking the binary for each resource isn't so dire. Linking a group of 
resources only if they're all successfully localized could be useful for 
services/upgrades, though.

bq. I guess the question is why the original container_launch script is not 
done in this way?

I think Allen's point is that the TC/CE binaries have avoided abstraction and 
other conventional good taste to reduce the attack surface. If the CE can only 
run scripts that were written by the NM to a specific, restricted directory, it 
can only run them as the user in a destination following the NM schema, etc. 
that makes it harder to involve the CE in an attack. If the CE can invoke one 
stage without preconditions guaranteed by the previous stage, as 
{{--run-script}} may allow, that's substantively different from the existing 
behavior.

> Support LinuxContainerExecutor to create symlinks for continuously localized 
> resources
> --
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch
>
>
> When new resources are localized, a new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks

2016-09-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475198#comment-15475198
 ] 

Chris Douglas commented on YARN-5621:
-

Is the patch intended for another JIRA, or is the title too narrowly phrased? I 
haven't gone through the patch in detail, but a RUN_SCRIPT action is a very 
general mechanism for a specific function (LCE already supports symlink, 
right?).

Why relax this constraint?
{noformat}
-  if (dst.isAbsolute()) {
-throw new IOException("Destination must be relative");
-  }
{noformat}
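
For reference, a sketch of what that constraint buys when the destination is 
resolved against the container work dir; the names are illustrative, not the 
actual implementation:
{noformat}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RelativeLinkCheck {
  // A relative destination can be confined to the container work dir;
  // an absolute one can point anywhere the process may write.
  static void link(Path containerWorkDir, Path dst, Path localized)
      throws IOException {
    if (dst.isAbsolute()) {
      throw new IOException("Destination must be relative");
    }
    Path resolved = containerWorkDir.resolve(dst).normalize();
    if (!resolved.startsWith(containerWorkDir)) {
      throw new IOException("Destination escapes container work dir");
    }
    Files.createSymbolicLink(resolved, localized);
  }
}
{noformat}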

> Support LinuxContainerExecutor to create symlinks
> -
>
> Key: YARN-5621
> URL: https://issues.apache.org/jira/browse/YARN-5621
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch
>
>
> When new resources are localized, a new symlink needs to be created for the 
> localized resource. This is the change for the LinuxContainerExecutor to 
> create the symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5121) fix some container-executor portability issues

2016-07-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400276#comment-15400276
 ] 

Chris Douglas commented on YARN-5121:
-

+1 from me. Thanks, Allen for the patch and ChrisN for review.

bq. I did remove some other debugging code, but that one I thought was useful 
due to aggressive use of ternary operators
I haven't looked at the context, but if {{ret}} can never be null in that case 
({{real_fname}} is never null?), then the ternary operator is redundant. If it 
can be null, then the new debug stmt can cause a segfault before it prints? 
Nit-picking in any case.

> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, 
> YARN-5121.06.patch, YARN-5121.07.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5164) Use plan RLE to improve CapacityOverTimePolicy efficiency

2016-07-25 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-5164:

Summary: Use plan RLE to improve CapacityOverTimePolicy efficiency  (was: 
CapacityOvertimePolicy does not take advantaged of plan RLE)

> Use plan RLE to improve CapacityOverTimePolicy efficiency
> -
>
> Key: YARN-5164
> URL: https://issues.apache.org/jira/browse/YARN-5164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5164-example.pdf, YARN-5164-inclusive.4.patch, 
> YARN-5164-inclusive.5.patch, YARN-5164.1.patch, YARN-5164.2.patch, 
> YARN-5164.5.patch, YARN-5164.6.patch, YARN-5164.7.patch, YARN-5164.8.patch
>
>
> As a consequence, small time granularities (e.g., 1 sec) and a long time 
> horizon for a reservation (e.g., months) run rather slowly (10 sec). 
> The proposed resolution is to switch to interval math in checking, similar to 
> how YARN-4359 does for agents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5121) fix some container-executor portability issues

2016-07-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383451#comment-15383451
 ] 

Chris Douglas commented on YARN-5121:
-

+1 overall, though I haven't tested it on multiple platforms. Thanks for also 
updating the L

Minor:
* Leftover debug stmt in {{configuration.c}}?
{noformat}
+fprintf(stderr, "fn=%s\n",file_name);
 strncpy(strrchr(buffer, '/') + 1, file_name, EXECUTOR_PATH_MAX);
 real_fname = buffer;
+fprintf(stderr, "real_fname=%s\n",real_fname);
{noformat}
* In {{container-executor.c}}, should "Error signalling process group %d with 
signal %d - %s\n" go to LOGFILE instead of stderr?
* -0 on the whitespace fixes... I'd prefer to keep the history, but the patch 
touches enough code that it may be worthwhile.

> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5164) CapacityOvertimePolicy does not take advantaged of plan RLE

2016-07-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375971#comment-15375971
 ] 

Chris Douglas commented on YARN-5164:
-

Only minor nits, otherwise +1:
{{CapacityOverTimePolicy}}
- Avoid importing java.util.\*
- Where the intermediate points are added, the code would be more readable if 
the key were assigned to a named variable (instead of multiple calls to 
{{e.getKey()}}). Same with the point-wise integral computation (a sketch 
follows at the end of these notes)
- checkstyle (spacing): {{+  if(e.getValue()!=null) {}}
- A comment briefly sketching the algorithm would help future maintainers

{{NoOverCommitPolicy}}
- The exception message should be reformatted (some redundant string concats) 
and omit references to the time it no longer reports
- Should the {{PlanningException}} be added as a cause, rather than 
concatenated with the ReservationID?
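
The named-variable nit above, sketched with hypothetical names over an RLE map 
of time to capacity; this is not the actual {{CapacityOverTimePolicy}} code:
{noformat}
import java.util.Map;
import java.util.NavigableMap;

class IntegralSketch {
  // Point-wise integral over an RLE representation: bind the key once
  // instead of calling e.getKey() repeatedly.
  static long integrate(NavigableMap<Long, Long> rle, long horizon) {
    long total = 0;
    for (Map.Entry<Long, Long> e : rle.entrySet()) {
      long start = e.getKey();
      Long next = rle.higherKey(start);
      long end = (next == null) ? horizon : next;
      if (e.getValue() != null) {
        total += e.getValue() * (end - start); // capacity x duration
      }
    }
    return total;
  }
}
{noformat}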

> CapacityOvertimePolicy does not take advantaged of plan RLE
> ---
>
> Key: YARN-5164
> URL: https://issues.apache.org/jira/browse/YARN-5164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5164-example.pdf, YARN-5164-inclusive.4.patch, 
> YARN-5164-inclusive.5.patch, YARN-5164.1.patch, YARN-5164.2.patch, 
> YARN-5164.5.patch, YARN-5164.6.patch
>
>
> As a consequence, small time granularities (e.g., 1 sec) and a long time 
> horizon for a reservation (e.g., months) run rather slowly (10 sec). 
> The proposed resolution is to switch to interval math in checking, similar to 
> how YARN-4359 does for agents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5132) Exclude generated protobuf sources from YARN Javadoc build

2016-05-25 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300822#comment-15300822
 ] 

Chris Douglas commented on YARN-5132:
-

From the bug [~subru] cited, it looks like there's no solution for the javadoc 
warnings in 2.5, a version we're unlikely to change and Google is unlikely to 
fix.

[~aw], I think your point is that Jenkins should stop complaining about (new) 
javadoc warnings in generated code, rather than giving up generating javadoc 
entirely. The protobuf classes are public APIs, but they're not user-facing in 
our Java APIs... I'm pretty ambivalent about keeping javadoc for them; 
including it may mislead someone into writing against them, rather than the API 
classes. Since (IIRC) we exclude other \@Private APIs from the generated 
javadoc, this seems like a good change, overall. Unless there's a better way to 
effect it?

> Exclude generated protobuf sources from YARN Javadoc build
> --
>
> Key: YARN-5132
> URL: https://issues.apache.org/jira/browse/YARN-5132
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
>Priority: Critical
> Attachments: YARN-5132-v1.patch
>
>
> Currently the YARN build includes Javadoc from generated protobuf sources, 
> which is causing CI to fail. This JIRA proposes to exclude generated protobuf 
> sources from the YARN Javadoc build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-03-15 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196751#comment-15196751
 ] 

Chris Douglas commented on YARN-2883:
-

Concerning v004:
* The use of {{getRemoteUgi}} in 
{{QueuingContainerManagerImpl::stopContainerInternalIfNotQueued}} may be 
unnecessary, and it will not work as expected: the check for user credentials 
will likely use the UGI from the {{EventDispatcher}}, not the RPC call that 
initiated it (in {{stopContainerInternalIfNotQueued}}). Setting the cause to 
{{KILLED_BY_APPMASTER}} may be inappropriate if queued containers could be 
killed for other reasons.
* If an application completes, its queued containers should be cleared.
* In {{getContainerStatusInternal}}, if the {{ConcurrentMap}} is necessary, 
then it should call {{get()}} once on the instance rather than 
{{containsKey()}}/{{get()}}
* Rather than adding null checks for a disabled queuing context, this could 
support a null context that effectively disables the queuing logic (as in 
{{NodeStatusUpdaterImpl}})
* It seems the queuing is not fair. New containers are started immediately, 
without checking whether the queue is empty; if the queue contains any entries, 
they should have been started from {{onStopMonitoringContainer}}. With a large 
container at the front of the queue, smaller queued containers will not get a 
chance to run while new, small containers will.
* The queue should be bounded in some way.

Minor
* {{NMContext}} can set the queuing context as final, rather than a separate 
{{setQueuingContext}}, which is not threadsafe as written.
* I didn't look through the test code in detail, but the {{DeletionService}} 
sleeping for 10s seems odd
* New loggers should use slf4j, and the {{LOG.level("Text {}", arg)}} syntax 
rather than {{isLevelEnabled()}} (a sketch follows at the end of these notes)
* The default case of {{QueuingContainerManagerImpl::handle}} should throw
* {{0.f}} is a valid literal?
* {{killOpportContainers}} may want to log a warning if killing opportunistic 
containers is insufficient to satisfy the contract (after the loop). This would 
be helpful when debugging.
* Do {{queuedGuarRequests}} and {{queuedOpportRequests}} need to be 
synchronized? Or is the handler sufficient?
* {{QueuingContainersMonitorImpl::AllocatedContainerInfo}} could define 
equals/hashcode and use {{Collection::remove}} instead of defining 
{{removeContainerFromQueue}}
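
The slf4j point above, for reference: parameterized messages defer string 
construction, so the explicit guard is unnecessary for cheap arguments. Class 
and method names here are illustrative:
{noformat}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class QueueLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(QueueLogging.class);

  void onQueued(String containerId, int position) {
    LOG.debug("Queued container {} at position {}", containerId, position);
    // rather than:
    // if (LOG.isDebugEnabled()) {
    //   LOG.debug("Queued container " + containerId + " at " + position);
    // }
  }
}
{noformat}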

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-trunk.004.patch, 
> YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, 
> YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-03-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173427#comment-15173427
 ] 

Chris Douglas commented on YARN-4734:
-

bq. For merge it at the top level, did you mean LICENSE.txt and BUILDING.txt? 
Are there any other files I need to change?

{{NOTICE.txt}} may also need to be updated. No worries on the WIP, we can do a 
pass on the docs when it's ready to merge.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173255#comment-15173255
 ] 

Chris Douglas commented on YARN-4734:
-

{{LICENSE.txt}} looks like it is based on, or copied from Apache Tez. Could you 
double-check the set of modules to ensure it's correct for Hadoop? We'll also 
need to merge it at the top level.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-01-16 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103350#comment-15103350
 ] 

Chris Douglas edited comment on YARN-4597 at 1/16/16 6:52 PM:
--

Thanks, Arun. Please feel free to take this over. It's only justified in the 
context of these other changes.


was (Author: chris.douglas):
Thanks, Arun. Please feel free

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Douglas
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-01-16 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103350#comment-15103350
 ] 

Chris Douglas commented on YARN-4597:
-

Thanks, Arun. Please feel free

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Douglas
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-01-14 Thread Chris Douglas (JIRA)
Chris Douglas created YARN-4597:
---

 Summary: Add SCHEDULE to NM container lifecycle
 Key: YARN-4597
 URL: https://issues.apache.org/jira/browse/YARN-4597
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Douglas


Currently, the NM immediately launches containers after resource localization. 
Several features could be more cleanly implemented if the NM included a 
separate stage for reserving resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-01-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101149#comment-15101149
 ] 

Chris Douglas commented on YARN-4597:
-

The {{ContainerLaunchContext}} (CLC) specifies the prerequisites for starting a 
container on a node. These include setting up user/application directories and 
downloading dependencies to the NM cache (localization). The NM assumes that an 
authenticated {{startContainer}} request has not overbooked resources on the 
node, so resources are only reserved/enforced during the container launch and 
execution.

This JIRA proposes to add a phase between localization and container launch to 
manage a collection of runnable containers. Similar to the localizer stage, a 
container will launch only after all the resources from its CLC are assigned by 
a _local scheduler_. The local scheduler will select containers to run based on 
priority, declared requirements, and by monitoring utilization on the node 
(YARN-1011).

A few future and in-progress features motivate this change.

*Preemption*. Instead of sending a kill when the RM selects a victim container, 
it could instead convert it from a {{GUARANTEED}} to an {{OPTIMISTIC}} 
container (YARN-4335). This has two benefits. First, the downgraded container 
can continue to run until a guaranteed container arrives _and_ finishes 
localizing its dependencies, so the downgraded container has an opportunity to 
complete or checkpoint. When the guaranteed container moves from {{LOCALIZED}} 
to {{SCHEDULING}}, the local scheduler may select the victim (formerly 
guaranteed) container to be killed. \[1\] Second, the NM may elect to kill the 
victim container to run _different_ optimistic containers, particularly 
short-running tasks.

*Optimistic scheduling and overprovisioning*. To support distributed scheduling 
(YARN-2877) and resource-aware scheduling (YARN-1011), the NM needs a component 
to select containers that are ready to run. The local scheduler can not only 
select tasks to run based on monitoring, it can also make offers to running 
containers using durations attached to leases \[2\]. Based on recent 
observations, it may start containers that oversubscribe the node, or delay 
starting containers if a lease is close to expiring (i.e., the container is 
likely to complete).

*Long-running services*. Note that by separating the local scheduler, both that 
module _and_ the localizer could be opened up as services provided by the NM. 
The localizer could also be extended to prioritize downloads among 
{{OPTIMISTIC}} containers (possibly preemptable by {{GUARANTEED}}), and to 
group containers based on their dependencies (e.g., avoid downloading a large 
dep for fewer than N optimistic containers). By exposing these services, the 
NM can 
assist with the following:

# Resource spikes. If a service container needs to spike temporarily, it may 
not need guaranteed resources (YARN-1197). Containers requiring low-latency 
elasticity could request optimistic resources instead of peak provisioning, 
resizing, or using workarounds like [Llama|http://cloudera.github.io/llama/]. 
If the local scheduler is addressable by local containers, then the lease could 
be logical (i.e., not start a process). Resources assigned to a {{RUNNING}} 
container could be published rather than triggering a launch. One could also 
imagine service workers marking some resources as unused, while retaining the 
authority to spike into them ("subleasing" them to opportunistic containers) by 
reclaiming them through the local scheduler.
# Upgrades. If the container needs to pull new dependencies, it could use the 
NM Localizer rather than coordinating the download itself.
# Maintenance tasks. Services often need to clean up, compact, scrub, and 
checkpoint local data. Right now, each service needs to independently monitor 
resource utilization to back off saturated resources (particularly disks). 
Coordination between services is difficult. In contrast, one could schedule 
tasks like block scrubbing as optimistic tasks in the NM to avoid interrupting 
services that are spiking. This is similar in spirit to distributed scheduling 
insofar as it does not involve the RM and targets a single host (i.e., the host 
the container is running on).

\[1\] Though it was selected as a victim by the RM, the local scheduler may 
decide to kill a different {{OPTIMISTIC}} container when the guaranteed 
container requests resources. For example, if a container completes on the node 
after the RM selected the victim, then the NM may elect to kill a smaller 
optimistic process if it is sufficient to satisfy the guarantee.
\[2\] Discussion on duration in YARN-1039 was part of a broader conversation on 
support for long-running services (YARN-896).
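
A sketch of where the proposed stage sits in the lifecycle; the state names 
and the transition guard are illustrative, not the actual NM container state 
machine:
{noformat}
// Proposed: a container launches only after the local scheduler assigns
// all resources declared in its CLC.
enum SketchContainerState {
  NEW, LOCALIZING, LOCALIZED, SCHEDULING, RUNNING, DONE;

  static SketchContainerState advance(SketchContainerState s,
      boolean resourcesAssigned) {
    if (s == LOCALIZED) {
      return SCHEDULING;                 // new stage between the two
    }
    if (s == SCHEDULING && resourcesAssigned) {
      return RUNNING;                    // launch once resources assigned
    }
    return s;
  }
}
{noformat}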


> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: 

[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2016-01-07 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476-2.patch

Fixed more checkstyle warnings. Diminishing returns on the remainder...

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065210#comment-15065210
 ] 

Chris Douglas edited comment on YARN-4476 at 12/19/15 3:40 AM:
---

bq. do you think is it better to place this module to 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) 
for better organization?

I thought about it, but:
# It's a (potential) internal detail of the node label implementation, with 
other classes in the package
# The {{nodelabels}} package is sparse right now
# None of these classes are user-facing, so they're easy to move

So I put it in the {{nodelabels}} package, but don't have a strong opinion.


was (Author: chris.douglas):
bq. do you think is it better to place this module to 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) 
for better organization?

I thought about it, but:
# It's a (potential) internal detail of the node label implementation, with 
other classes in the package
# The {{nodelabel}} package is sparse right now
# None of these classes are user-facing, so they're easy to move

So I put it in the {{nodelabels}} package, but don't have a strong opinion.

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065210#comment-15065210
 ] 

Chris Douglas commented on YARN-4476:
-

bq. do you think is it better to place this module to 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) 
for better organization?

I thought about it, but:
# It's a (potential) internal detail of the node label implementation, with 
other classes in the package
# The {{nodelabel}} package is sparse right now
# None of these classes are user-facing, so they're easy to move

So I put it in the {{nodelabels}} package, but don't have a strong opinion.

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2015-12-18 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476-1.patch

Add ASF license headers, fix findbugs warnings, address some of the checkstyle 
issues.

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch, YARN-4476-1.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4476) Matcher for complex node label expressions

2015-12-17 Thread Chris Douglas (JIRA)
Chris Douglas created YARN-4476:
---

 Summary: Matcher for complex node label expressions
 Key: YARN-4476
 URL: https://issues.apache.org/jira/browse/YARN-4476
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Chris Douglas
Assignee: Chris Douglas


Implementation of a matcher for complex node label expressions based on a 
[paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4476) Matcher for complex node label expressions

2015-12-17 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4476:

Attachment: YARN-4476-0.patch

> Matcher for complex node label expressions
> -
>
> Key: YARN-4476
> URL: https://issues.apache.org/jira/browse/YARN-4476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: YARN-4476-0.patch
>
>
> Implementation of a matcher for complex node label expressions based on a 
> [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063361#comment-15063361
 ] 

Chris Douglas commented on YARN-4195:
-

Posted a draft to YARN-4476

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063282#comment-15063282
 ] 

Chris Douglas commented on YARN-4195:
-

bq. a better version of this, which uses a cool algorithm which skips the 
conversion to DNF

The impl is based on a SIGMOD 2010 
[paper|http://dl.acm.org/citation.cfm?id=1807171] that converts boolean 
expressions to intervals. I'll adapt it for Hadoop and post a patch

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent

2015-12-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049364#comment-15049364
 ] 

Chris Douglas edited comment on YARN-4358 at 12/9/15 9:15 PM:
--

[~asuresh], you need not update the Javadoc of {{getReservationById}}. The 
problem is caused because we are specifying *Set* inside {{\{@ link\}}}, so the 
fix should just be to update the Javadoc of the return parameter of 
{{getReservations}} to:
{{@return set of active \{\@link ReservationAllocation\}s for the specified 
user at the requested time}}


was (Author: subru):
[~asuresh], you need not update the Javadoc of _getReservationById_. The 
problem is caused because we are specifying *Set* inside _{@ link}_ so the fix 
should just be to update the Javadoc of the return parameter of 
_getReservations_ to:
bq @return set of active {@link ReservationAllocation}s for the specified user 
at the requested time

> Improve relationship between SharingPolicy and ReservationAgent
> ---
>
> Key: YARN-4358
> URL: https://issues.apache.org/jira/browse/YARN-4358
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, 
> YARN-4358.addendum.patch, YARN-4358.patch
>
>
> At the moment an agent places based on available resources, but has no 
> visibility to extra constraints imposed by the SharingPolicy. While not all 
> constraints are easily represented some (e.g., max-instantaneous resources) 
> are easily represented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-4248:

Attachment: YARN-4248-asflicense.patch

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047395#comment-15047395
 ] 

Chris Douglas commented on YARN-4248:
-

Pushed to trunk, branch-2, branch-2.8. Sorry to have missed these in review. 
Not sure why it wasn't flagged by test-patch.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047576#comment-15047576
 ] 

Chris Douglas commented on YARN-4248:
-

Thanks, Chris.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 2.8.0
>
> Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, 
> YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations

2015-12-07 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045438#comment-15045438
 ] 

Chris Douglas commented on YARN-4248:
-

+1 lgtm

If it's appropriate for this to go into 2.8, set the target version and post a 
notification on the release thread.

> REST API for submit/update/delete Reservations
> --
>
> Key: YARN-4248
> URL: https://issues.apache.org/jira/browse/YARN-4248
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4248.2.patch, YARN-4248.3.patch, YARN-4248.5.patch, 
> YARN-4248.6.patch, YARN-4248.patch
>
>
> This JIRA tracks work to extend the RMWebService to support REST APIs to 
> submit/update/delete reservations. This will ease integration with external 
> tools that are not java-based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

