[jira] [Updated] (YARN-8275) Create a JNI interface to interact with Windows
[ https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-8275: Priority: Major (was: Minor) > Create a JNI interface to interact with Windows > --- > > Key: YARN-8275 > URL: https://issues.apache.org/jira/browse/YARN-8275 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: WinUtils.CSV > > > I did a quick investigation of the performance of WinUtils in YARN. On > average, the NM calls WinUtils 4.76 times per second and 65.51 times per container. > > | |Requests|Requests/sec|Requests/min|Requests/container| > |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*| > |[WinUtils] Execute -help|4148|0.145|8.769|2.007| > |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37| > |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43| > |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37| > |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05| > Interval: 7 hours, 53 minutes and 48 seconds > Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops. > This means *666.58* IO ops/second due to WinUtils. > We should start considering removing WinUtils from Hadoop and creating a JNI > interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
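The headline rates above follow directly from the measurement interval; a quick sanity check of the arithmetic (plain Java, not YARN code — the 140 IO ops/execution figure is taken from the description):

```java
// Sanity check of the WinUtils rates reported in YARN-8275.
class WinUtilsRates {
    // Requests per second over the measurement interval.
    static double perSecond(long requests, long intervalSec) {
        return (double) requests / intervalSec;
    }

    public static void main(String[] args) {
        long intervalSec = 7 * 3600 + 53 * 60 + 48;   // 7h 53m 48s = 28,428 s
        double rps = perSecond(135_354, intervalSec); // total WinUtils executions
        // ~140 IO ops per execution gives the IO ops/second figure.
        System.out.printf("%.3f WinUtils calls/s, %.2f IO ops/s%n", rps, rps * 140);
    }
}
```

This reproduces the 4.761 requests/sec and 666.58 IO ops/sec figures from the table.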
[jira] [Resolved] (YARN-8216) Reduce RegistryDNS port ping logging
[ https://issues.apache.org/jira/browse/YARN-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved YARN-8216. - Resolution: Duplicate Assignee: (was: Eric Yang) Resolving as duplicate, per issue link. > Reduce RegistryDNS port ping logging > > > Key: YARN-8216 > URL: https://issues.apache.org/jira/browse/YARN-8216 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Eric Yang >Priority: Major > > System monitoring software usually sends a TCP packet to test whether a port is > alive. This can cause RegistryDNS to throw BufferUnderflowException. > {code} > 2018-04-26 17:07:55,846 WARN > org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when > running task in RegistryDNS 3 > 2018-04-26 17:07:55,847 WARN > org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread > RegistryDNS 3: > java.nio.BufferUnderflowException > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151) > at > org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:771) > at > org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846) > at > org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > This is perfectly normal, but it would be nice to hide this error message to > reduce verbose logging on port pings.
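The underflow above is easy to reproduce with plain {{java.nio}}: a health probe that connects and sends fewer than the two length-prefix bytes a DNS-over-TCP reader expects leaves the buffer short, and an unguarded read throws. A minimal, dependency-free sketch of the guard (not the RegistryDNS code; {{readLengthPrefix}} is a hypothetical name):

```java
import java.nio.ByteBuffer;

class ProbeDemo {
    // Reads the 2-byte DNS-over-TCP length prefix. For a short probe
    // (e.g. a monitoring port ping that sends nothing), returns -1
    // instead of letting BufferUnderflowException escape to WARN logs.
    static int readLengthPrefix(ByteBuffer buf) {
        if (buf.remaining() < 2) {
            return -1; // nothing to parse: treat as a port ping
        }
        return buf.getShort() & 0xFFFF; // unsigned 16-bit message length
    }
}
```

With this check, an empty probe yields -1 and can be dropped silently, while a real query's length prefix is parsed as before.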
[jira] [Reopened] (YARN-8216) Reduce RegistryDNS port ping logging
[ https://issues.apache.org/jira/browse/YARN-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas reopened YARN-8216: - > Reduce RegistryDNS port ping logging > > > Key: YARN-8216 > URL: https://issues.apache.org/jira/browse/YARN-8216 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > System monitoring software usually sends a TCP packet to test whether a port is > alive. This can cause RegistryDNS to throw BufferUnderflowException. > {code} > 2018-04-26 17:07:55,846 WARN > org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when > running task in RegistryDNS 3 > 2018-04-26 17:07:55,847 WARN > org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread > RegistryDNS 3: > java.nio.BufferUnderflowException > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151) > at > org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:771) > at > org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846) > at > org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > This is perfectly normal, but it would be nice to hide this error message to > reduce verbose logging on port pings.
[jira] [Commented] (YARN-8216) Reduce RegistryDNS port ping logging
[ https://issues.apache.org/jira/browse/YARN-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16455593#comment-16455593 ] Chris Douglas commented on YARN-8216: - Why was this issue created and resolved as fixed, with no patch, within 3 minutes? > Reduce RegistryDNS port ping logging > > > Key: YARN-8216 > URL: https://issues.apache.org/jira/browse/YARN-8216 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > System monitoring software usually sends a TCP packet to test whether a port is > alive. This can cause RegistryDNS to throw BufferUnderflowException. > {code} > 2018-04-26 17:07:55,846 WARN > org.apache.hadoop.util.concurrent.ExecutorHelper: Execution exception when > running task in RegistryDNS 3 > 2018-04-26 17:07:55,847 WARN > org.apache.hadoop.util.concurrent.ExecutorHelper: Caught exception in thread > RegistryDNS 3: > java.nio.BufferUnderflowException > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151) > at > org.apache.hadoop.registry.server.dns.RegistryDNS.nioTCPClient(RegistryDNS.java:771) > at > org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:846) > at > org.apache.hadoop.registry.server.dns.RegistryDNS$3.call(RegistryDNS.java:843) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > This is perfectly normal, but it would be nice to hide this error message to > reduce verbose logging on port pings.
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449023#comment-16449023 ] Chris Douglas commented on YARN-8200: - bq. I would suggest to try use 3.x instead back porting this to 2.x so everybody is on the same codebase and improvement it. To me, the effort of backporting YARN-3926 + YARN-6223 will be comparable to upgrading a 3.x release and fixing (incompatible) issues From [~jhung]'s analysis, the backports were relatively straightforward (mostly new code). Keeping it in sync with fixes/improvements in 3.x will require ongoing maintenance, which is unfortunate. Are there specific areas where you suspect the backport could become difficult to maintain? > Backport resource types/GPU features to branch-2 > > > Key: YARN-8200 > URL: https://issues.apache.org/jira/browse/YARN-8200 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > > Currently we have a need for GPU scheduling on our YARN clusters to support > deep learning workloads. However, our main production clusters are running > older versions of branch-2 (2.7 in our case). To prevent supporting too many > very different Hadoop versions across multiple clusters, we would like to > backport the resource types/resource profiles feature to branch-2, as well as > the GPU-specific support. > > We have done a trial backport of YARN-3926 and some miscellaneous patches in > YARN-7069 based on issues we uncovered, and the backport was fairly smooth. > We also did a trial backport of most of YARN-6223 (sans docker support). > > Regarding the backports, perhaps we can do the development in a feature > branch and then merge to branch-2 when ready.
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396339#comment-16396339 ] Chris Douglas commented on YARN-3409: - bq. Me and Sunil G tried to delete it but permissions were not there so were trying to get that done with Jian he and Others and in the mean while you helped us out. Delete of a branch could not be done by all ? I don't/shouldn't have any special privileges. Probably a change to the set of protected branches between when you tried and today. > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, client, RM >Reporter: Wangda Tan >Assignee: Naganarasimha G R >Priority: Major > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specifying only one label for each node (IAW, partitioning a cluster) is a way to > determine how resources of a special set of nodes can be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > have the following characteristics: > - The cluster is divided into several disjoint sub-clusters. > - ACL/priority can apply on a partition (Only the market team has > priority to use the partition). > - Percentage of capacities can apply on a partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partitions; they describe features of a node's > hardware/software just for affinity. Some examples of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, an application can ask for resources that have (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64).
[jira] [Commented] (YARN-3409) Support Node Attribute functionality
[ https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395767#comment-16395767 ] Chris Douglas commented on YARN-3409: - Deleted the {{yarn-3409}} branch, because it collides with {{YARN-3409}} on case-insensitive systems. The former looked like an accidental push. > Support Node Attribute functionality > > > Key: YARN-3409 > URL: https://issues.apache.org/jira/browse/YARN-3409 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, client, RM >Reporter: Wangda Tan >Assignee: Naganarasimha G R >Priority: Major > Attachments: 3409-apiChanges_v2.pdf (4).pdf, > Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch > > > Specifying only one label for each node (IAW, partitioning a cluster) is a way to > determine how resources of a special set of nodes can be shared by a > group of entities (like teams, departments, etc.). Partitions of a cluster > have the following characteristics: > - The cluster is divided into several disjoint sub-clusters. > - ACL/priority can apply on a partition (Only the market team has > priority to use the partition). > - Percentage of capacities can apply on a partition (Market team has 40% > minimum capacity and Dev team has 60% of minimum capacity of the partition). > Attributes are orthogonal to partitions; they describe features of a node's > hardware/software just for affinity. Some examples of attributes: > - glibc version > - JDK version > - Type of CPU (x86_64/i686) > - Type of OS (windows, linux, etc.) > With this, an application can ask for resources that have (glibc.version >= > 2.20 && JDK.version >= 8u20 && x86_64).
[jira] [Commented] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml
[ https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352774#comment-16352774 ] Chris Douglas commented on YARN-6868: - Verified that the zookeeper and curator test jars aren't part of the package after backporting, pushed. > Add test scope to certain entries in hadoop-yarn-server-resourcemanager > pom.xml > --- > > Key: YARN-6868 > URL: https://issues.apache.org/jira/browse/YARN-6868 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0-beta1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Major > Fix For: 2.9.1 > > Attachments: YARN-6868.001.patch > > > The tag > {noformat} > test > {noformat} > is missing from a few entries in the pom.xml for > hadoop-yarn-server-resourcemanager.
[jira] [Resolved] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml
[ https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved YARN-6868. - Resolution: Fixed Fix Version/s: (was: 3.0.0-beta1) 2.9.1 > Add test scope to certain entries in hadoop-yarn-server-resourcemanager > pom.xml > --- > > Key: YARN-6868 > URL: https://issues.apache.org/jira/browse/YARN-6868 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0-beta1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Major > Fix For: 2.9.1 > > Attachments: YARN-6868.001.patch > > > The tag > {noformat} > test > {noformat} > is missing from a few entries in the pom.xml for > hadoop-yarn-server-resourcemanager.
[jira] [Reopened] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml
[ https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas reopened YARN-6868: - Sure. Reopening to cherry-pick this to branch-2 and branch-2.9 > Add test scope to certain entries in hadoop-yarn-server-resourcemanager > pom.xml > --- > > Key: YARN-6868 > URL: https://issues.apache.org/jira/browse/YARN-6868 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0-beta1 >Reporter: Ray Chiang >Assignee: Ray Chiang >Priority: Major > Fix For: 3.0.0-beta1 > > Attachments: YARN-6868.001.patch > > > The tag > {noformat} > test > {noformat} > is missing from a few entries in the pom.xml for > hadoop-yarn-server-resourcemanager.
[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files
[ https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325464#comment-16325464 ] Chris Douglas commented on YARN-7712: - bq. I need to use webhdfs to get the timestamp and set for each localized file every time I launch something. This is cumbersome and not necessary in case of my app Perhaps, but YARN doesn't have anything else for correctness. If you're convinced this is necessary, please ensure that the NM verifies that the timestamp for a cached dependency matches the remote, before it returns it to the client (so if it's changed, the app gets the new version, never the cached version). To be consistent, you may also want to add similar semantics for size. > Add ability to ignore timestamps in localized files > --- > > Key: YARN-7712 > URL: https://issues.apache.org/jira/browse/YARN-7712 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi > > YARN currently requires and checks the timestamp of localized files and > fails, if the file on HDFS does not match to the one requested. This jira > adds the ability to ignore the timestamp based on the request of the client. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
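The verification asked for above can be sketched as follows; {{LocalCache}} and all names here are hypothetical illustrations, not NodeManager code. On every lookup, the cached entry's recorded remote timestamp is compared with the current remote one, and a mismatch forces a re-fetch, so the app never gets the stale cached version:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a timestamp-validated localization cache.
class LocalCache {
    record Entry(byte[] data, long remoteTimestamp) {}

    private final Map<String, Entry> cache = new HashMap<>();

    // Returns the cached copy only if its recorded timestamp still matches
    // the remote one; otherwise downloads again and replaces the entry.
    byte[] get(String path, long remoteTimestamp, Function<String, byte[]> download) {
        Entry e = cache.get(path);
        if (e == null || e.remoteTimestamp() != remoteTimestamp) {
            e = new Entry(download.apply(path), remoteTimestamp);
            cache.put(path, e);
        }
        return e.data();
    }
}
```

Extending the same check to file size would follow the identical pattern: record it with the entry, compare on each lookup.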
[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files
[ https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324620#comment-16324620 ] Chris Douglas commented on YARN-7712: - Got it. The purpose of the timestamp is not security, but correctness. It does not support applications that might specify a region of a dependency (e.g., download a segment of a log file being appended to) or on a dependency that does not exist during submission. It is sufficient for static dependencies (e.g., jars) that are uploaded prior to submission, and to avoid the NM linking a stale version of a resource for a new container. The only security guarantees come from the {{FileSystem}}. You mentioned the REST APIs a couple times. Why are those problematic? If this is purely for testing, one could use a {{FilterFileSystem}} that returns a constant for the modification time, rather than modifying YARN...
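The {{FilterFileSystem}} trick can be illustrated without Hadoop on the classpath; {{StatSource}} and {{FileStat}} below are hypothetical, dependency-free stand-ins for Hadoop's {{FileSystem}} and {{FileStatus}}, and a real {{FilterFileSystem}} subclass would override {{getFileStatus}} in the same way:

```java
// Hypothetical stand-in for Hadoop's FileStatus: just length + mtime.
record FileStat(long length, long modificationTime) {}

// Hypothetical stand-in for the FileSystem call localization relies on.
interface StatSource {
    FileStat getFileStatus(String path);
}

// Analogue of a FilterFileSystem that pins the modification time to a
// constant, so the localizer's timestamp check always sees the value
// the test client sent, without modifying YARN itself.
class ConstantTimeSource implements StatSource {
    private final StatSource inner;
    private final long fixedTime;

    ConstantTimeSource(StatSource inner, long fixedTime) {
        this.inner = inner;
        this.fixedTime = fixedTime;
    }

    @Override
    public FileStat getFileStatus(String path) {
        FileStat real = inner.getFileStatus(path);
        return new FileStat(real.length(), fixedTime); // pin only the timestamp
    }
}
```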
[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files
[ https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319359#comment-16319359 ] Chris Douglas commented on YARN-7712: - bq. I would like to keep this jira simple, all it is about is to ignore the timestamp check on downloads Is the intent to accommodate a) modifications to files or b) completely different files- or files that don't exist during submission- as dependencies? What problem is this solving? Ignoring the timestamp makes localization non-deterministic. A re-execution of a task could download and use a different dependency. Speculatively executed tasks could use different dependencies, depending on which machine they run on. It's a rare user who can safely disable this check in YARN, but can't work around the timestamp check...
[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files
[ https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317154#comment-16317154 ] Chris Douglas commented on YARN-7712: - As [~ste...@apache.org] [suggested|https://issues.apache.org/jira/browse/HDFS-7878?focusedCommentId=15512866=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15512866], we could also use the {{PathHandle}} API for YARN dependencies.
[jira] [Commented] (YARN-7221) Add security check for privileged docker container
[ https://issues.apache.org/jira/browse/YARN-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173864#comment-16173864 ] Chris Douglas commented on YARN-7221: - Is this a duplicate of YARN-6623? Or is it an extension to permit privileged containers after passing additional security checks? > Add security check for privileged docker container > -- > > Key: YARN-7221 > URL: https://issues.apache.org/jira/browse/YARN-7221 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang > > When a Docker container runs with privileges, the majority use case is to have > some program start as root and then drop privileges to another user, e.g. > httpd starts privileged to bind to port 80, then drops privileges to the > www user. > # We should add a security check for submitting users, to verify they have > "sudo" access to run a privileged container. > # We should remove --user=uid:gid for privileged containers. > > Docker can be launched with the --privileged=true and --user=uid:gid flags. With > this parameter combination, the user will not be able to become root. > All docker exec commands will drop to the uid:gid user instead of > granting privileges. A user can gain root privileges if the container file system > contains files that give the user extra power, but this type of image is > considered dangerous. A non-privileged user can launch a container with > special bits to acquire the same level of root power. Hence, we lose control of > which images should be run with --privileged, and who has sudo rights to use > privileged container images. As a result, we should check for sudo access and > then decide to parameterize --privileged=true OR --user=uid:gid. This will > avoid leading developers down the wrong path.
[jira] [Commented] (YARN-6622) Document Docker work as experimental
[ https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162173#comment-16162173 ] Chris Douglas commented on YARN-6622: - Sure, it's better than nothing. Thanks, [~templedf]. > Document Docker work as experimental > > > Key: YARN-6622 > URL: https://issues.apache.org/jira/browse/YARN-6622 > Project: Hadoop YARN > Issue Type: Task > Components: documentation >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: YARN-6622.001.patch > > > We should update the Docker support documentation calling out the Docker work > as experimental.
[jira] [Commented] (YARN-6622) Document Docker work as experimental
[ https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162090#comment-16162090 ] Chris Douglas commented on YARN-6622: - Then let's backport YARN-5258. Enabling docker support in branch-2 effectively gives any user the capability to run processes as root on cluster machines. > Document Docker work as experimental > > > Key: YARN-6622 > URL: https://issues.apache.org/jira/browse/YARN-6622 > Project: Hadoop YARN > Issue Type: Task > Components: documentation >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: YARN-6622.001.patch > > > We should update the Docker support documentation calling out the Docker work > as experimental.
[jira] [Commented] (YARN-6622) Document Docker work as experimental
[ https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161795#comment-16161795 ] Chris Douglas commented on YARN-6622: - bq. docker container is alpha feature which is generally known. It's generally known to developers, but not to users. They're the target for this documentation. Unless they're familiar with both Docker and Hadoop, they're unlikely to understand the consequences of enabling this feature. > Document Docker work as experimental > > > Key: YARN-6622 > URL: https://issues.apache.org/jira/browse/YARN-6622 > Project: Hadoop YARN > Issue Type: Task > Components: documentation >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Attachments: YARN-6622.001.patch > > > We should update the Docker support documentation calling out the Docker work > as experimental.
[jira] [Commented] (YARN-6721) container-executor should have stack checking
[ https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149932#comment-16149932 ] Chris Douglas commented on YARN-6721: - Cool, ship it. +1 > container-executor should have stack checking > - > > Key: YARN-6721 > URL: https://issues.apache.org/jira/browse/YARN-6721 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, security >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Critical > Labels: security > Attachments: YARN-6721.00.patch, YARN-6721.01.patch, > YARN-6721.02.patch > > > As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and > given that container-executor is setuid, it should be compiled with stack > checking if the compiler supports such features. (-fstack-check on gcc, > -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", > others as we find them, ...)
[jira] [Commented] (YARN-6721) container-executor should have stack checking
[ https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149333#comment-16149333 ] Chris Douglas commented on YARN-6721: - Bravo on figuring out what is going on with clang. I looked for supporting documentation on OSX, and found mostly confusion. +1 > container-executor should have stack checking > - > > Key: YARN-6721 > URL: https://issues.apache.org/jira/browse/YARN-6721 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, security >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Critical > Labels: security > Attachments: YARN-6721.00.patch > > > As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and > given that container-executor is setuid, it should be compiled with stack > checking if the compiler supports such features. (-fstack-check on gcc, > -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", > others as we find them, ...)
[jira] [Commented] (YARN-6944) The comment about ResourceManager#createPolicyMonitors lies
[ https://issues.apache.org/jira/browse/YARN-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113699#comment-16113699 ] Chris Douglas commented on YARN-6944: - bq. Monitors don't handle preemption. [They|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java] [do|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java#L82]. > The comment about ResourceManager#createPolicyMonitors lies > --- > > Key: YARN-6944 > URL: https://issues.apache.org/jira/browse/YARN-6944 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.1, 3.0.0-alpha3 >Reporter: Yufei Gu >Priority: Trivial > > {code} > // creating monitors that handle preemption > createPolicyMonitors(); > {code} > Monitors don't handle preemption.
[jira] [Commented] (YARN-6593) [API] Introduce Placement Constraint object
[ https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109976#comment-16109976 ] Chris Douglas commented on YARN-6593: - bq. The only thing remaining in this jira is the example class for how to use the APIs - whether it's worth to do or not ? Examples are essential, but can that be part of a followup JIRA? Particularly since the implementation(s) may affect the API. > [API] Introduce Placement Constraint object > --- > > Key: YARN-6593 > URL: https://issues.apache.org/jira/browse/YARN-6593 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-6593.001.patch, YARN-6593.002.patch, > YARN-6593.003.patch, YARN-6593.004.patch, YARN-6593.005.patch, > YARN-6593.006.patch, YARN-6593.007.patch, YARN-6593.008.patch > > > Just removed Fixed version and moved it to target version as we set fix > version only after patch is committed.
[jira] [Commented] (YARN-6726) Fix issues with docker commands executed by container-executor
[ https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095227#comment-16095227 ] Chris Douglas commented on YARN-6726: - Not sure if I'll have cycles to review the patch in detail, but quickly: bq. No user input is used, so this should be safe. We also need to prevent the {{yarn}} user from becoming root, so we can't trust input to the CE even if it's filled in by the NM during container launch > Fix issues with docker commands executed by container-executor > -- > > Key: YARN-6726 > URL: https://issues.apache.org/jira/browse/YARN-6726 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Shane Kumpf >Assignee: Shane Kumpf > Attachments: YARN-6726.001.patch > > > docker inspect, rm, stop, etc are issued through container-executor. Commands > other than docker run are not functioning properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6593) [API] Introduce Placement Constraint object
[ https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16095217#comment-16095217 ] Chris Douglas commented on YARN-6593: - bq. I found it is very important otherwise we cannot support complex placement request which need to be updated. Do you have specific use cases in mind? If there's a restricted form we could use to support them (e.g., adjusting cardinality, as in your example), that would be easier for us to support and for users to reason about. Since we don't have applications that use placement constraints yet, it may be difficult for us to predict where they need to change during execution (if at all). bq. Regarding to semantics, I prefer to apply to all containers placed subsequently, this is also the closest behavior of existing YARN. We just need to verify updated placement request is still valid, probably we don't need to restricted to some parameters. I don't have a clear definition of validity across placement requests, particularly preserving it across a sequence of updates to the constraints. We could support relaxations of existing constraints, probably. Still, updates also require the LRA scheduler to maintain lineage for all its internal structures. A likely implementation will convert users' expressions to some normal form, combine those with admin constraints, forecast future allocations, inject requests into the scheduler, etc. Even if we could offer well-defined semantics for updates, the implementation and maintenance cost could outweigh the marginal benefit to users. If the workarounds (like submitting a new application or a new set of constraints) are easier to understand, that's probably what users will prefer, anyway. Placement constraint updates also compound the {{ResourceRequest}} problem you cited in YARN-6594. Which epoch of the placement constraints applied to a container returned by the RM, and for which RR? 
If a user's application isn't getting containers, how is that debugged? If someone wants to reason about a group of constraints for a production cluster while applications change clauses programmatically at runtime, then that analysis goes from difficult to intractable. You guys are implementing it, but I'd push this to future work. > [API] Introduce Placement Constraint object > --- > > Key: YARN-6593 > URL: https://issues.apache.org/jira/browse/YARN-6593 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 3.0.0-alpha3 > > Attachments: YARN-6593.001.patch, YARN-6593.002.patch, > YARN-6593.003.patch, YARN-6593.004.patch > > > This JIRA introduces an object for defining placement constraints. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
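The "support relaxations of existing constraints" idea floated in the comment above can be made concrete for the cardinality case. A minimal sketch of the validity rule (the helper name is illustrative, not part of any committed YARN API): an update is a pure relaxation only if the new [min, max] interval contains the old one, so the scheduler never has to revisit already-placed containers.

```java
// Sketch: an update to a cardinality constraint is a pure relaxation iff the
// new [min, max] interval contains the old one. Anything narrower would force
// the scheduler to reason about containers it has already placed.
public final class CardinalityUpdateCheck {
    public static boolean isRelaxation(int oldMin, int oldMax, int newMin, int newMax) {
        return newMin <= oldMin && newMax >= oldMax;
    }

    public static void main(String[] args) {
        // widening the range is a relaxation; raising the minimum is not
        System.out.println(isRelaxation(2, 4, 1, 8)); // true
        System.out.println(isRelaxation(2, 4, 3, 4)); // false
    }
}
```

Restricting updates to this form sidesteps the validity-across-epochs problem discussed above, at the cost of disallowing genuine reconfiguration.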
[jira] [Commented] (YARN-6223) [Umbrella] Natively support GPU configuration/discovery/scheduling/isolation on YARN
[ https://issues.apache.org/jira/browse/YARN-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094074#comment-16094074 ] Chris Douglas commented on YARN-6223: - [~leftnoteasy], could you summarize the implementation a bit? What would an example cfg look like and how is it interpreted? > [Umbrella] Natively support GPU configuration/discovery/scheduling/isolation > on YARN > > > Key: YARN-6223 > URL: https://issues.apache.org/jira/browse/YARN-6223 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6223.Natively-support-GPU-on-YARN-v1.pdf, > YARN-6223.wip.1.patch, YARN-6223.wip.2.patch, YARN-6223.wip.3.patch > > > As varieties of workloads are moving to YARN, including machine learning / > deep learning which can speed up by leveraging GPU computation power. > Workloads should be able to request GPU from YARN as simple as CPU and memory. > *To make a complete GPU story, we should support following pieces:* > 1) GPU discovery/configuration: Admin can either config GPU resources and > architectures on each node, or more advanced, NodeManager can automatically > discover GPU resources and architectures and report to ResourceManager > 2) GPU scheduling: YARN scheduler should account GPU as a resource type just > like CPU and memory. > 3) GPU isolation/monitoring: once launch a task with GPU resources, > NodeManager should properly isolate and monitor task's resource usage. > For #2, YARN-3926 can support it natively. For #3, YARN-3611 has introduced > an extensible framework to support isolation for different resource types and > different runtimes. > *Related JIRAs:* > There're a couple of JIRAs (YARN-4122/YARN-5517) filed with similar goals but > different solutions: > For scheduling: > - YARN-4122/YARN-5517 are all adding a new GPU resource type to Resource > protocol instead of leveraging YARN-3926. 
> For isolation: > - And YARN-4122 proposed to use CGroups to do isolation which cannot solve > the problem listed at > https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation#challenges such as > minor device number mapping; load nvidia_uvm module; mismatch of CUDA/driver > versions, etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6593) [API] Introduce Placement Constraint object
[ https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094052#comment-16094052 ] Chris Douglas commented on YARN-6593: - I like the {{T accept(Visitor visitor)}} pattern and composing expressions with {{PlacementConstraints}}; these are well polished. I agree with [~leftnoteasy] on {{PlacementConstraints}} being the primary {{\@Public}} API. Do we want to support users adding new transforms? If not, some of the implementation details could be package-private. [~leftnoteasy]: what are the semantics of updated constraints? Do they apply to all containers placed subsequently, or could it cause a reconfiguration of allocated containers? Or are updates restricted to (some?) parameters of the expression? This isn't covered in the design doc on YARN-6592. Minor: * {{convert}} methods in {{PlacementConstraintFromProtoConverter}} should fail if composite constraints have no children? Or would these invariants be checked by a validator after construction? * {{PlacementConstraints#timedMillisConstraint}} could accept a [TimeUnit|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/TimeUnit.html] and convert to ms * In this expression: {noformat} + if (constraint.getOp() == TargetOperator.IN) { +newConstraint = new SingleConstraint(constraint.getScope(), 1, +Integer.MAX_VALUE, constraint.getTargetExpressions()); + } else { {noformat} Might operator types be extended in the future, where this is not correct? * All the constraints derive from the inner, {{AbstractConstraint}} type. This avoids having {{PlacementConstraint}} accept a Visitor? * A unit test demonstrating the PB serialization/deserialization would demonstrate the converter classes. 
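The TimeUnit suggestion above is mechanical; a sketch of what such an overload could do internally (hypothetical method name — the committed signature may differ):

```java
import java.util.concurrent.TimeUnit;

// Sketch: accept a duration in any TimeUnit and normalize it to the
// milliseconds a timedMillisConstraint-style method expects, so callers
// never hard-code unit conversions.
public final class TimedConstraintSketch {
    public static long normalizeToMillis(long duration, TimeUnit unit) {
        return unit.toMillis(duration);
    }

    public static void main(String[] args) {
        System.out.println(normalizeToMillis(30, TimeUnit.SECONDS)); // 30000
    }
}
```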
> [API] Introduce Placement Constraint object > --- > > Key: YARN-6593 > URL: https://issues.apache.org/jira/browse/YARN-6593 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Fix For: 3.0.0-alpha3 > > Attachments: YARN-6593.001.patch, YARN-6593.002.patch, > YARN-6593.003.patch, YARN-6593.004.patch > > > This JIRA introduces an object for defining placement constraints. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved YARN-650. Resolution: Won't Fix Documentation was added in YARN-4492 > User guide for preemption > - > > Key: YARN-650 > URL: https://issues.apache.org/jira/browse/YARN-650 > Project: Hadoop YARN > Issue Type: Sub-task > Components: documentation >Reporter: Chris Douglas >Priority: Minor > Attachments: Y650-0.patch > > > YARN-45 added a protocol for the RM to ask back resources. The docs on > writing YARN applications should include a section on how to interpret this > message. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6698) Backport YARN-5121 to branch-2.7
[ https://issues.apache.org/jira/browse/YARN-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043467#comment-16043467 ] Chris Douglas commented on YARN-6698: - I just skimmed the backport and compared with YARN-5121, but lgtm. +1 > Backport YARN-5121 to branch-2.7 > > > Key: YARN-6698 > URL: https://issues.apache.org/jira/browse/YARN-6698 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Blocker > Attachments: YARN-6698-branch-2.7-01.patch, > YARN-6698-branch-2.7-test.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023876#comment-16023876 ] Chris Douglas commented on YARN-1471: - [~curino] is this contained in YARN-6608? If so, maybe we should look into backporting that, instead of individual SLS patches. > The SLS simulator is not running the preemption policy for CapacityScheduler > > > Key: YARN-1471 > URL: https://issues.apache.org/jira/browse/YARN-1471 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Minor > Labels: release-blocker > Fix For: 3.0.0-alpha1 > > Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, > YARN-1471-branch-2.7.4.patch, YARN-1471.patch, YARN-1471.patch > > > The simulator does not run the ProportionalCapacityPreemptionPolicy monitor. > This is because the policy needs to interact with a CapacityScheduler, and > the wrapping done by the simulator breaks this. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6622) Document Docker work as experimental
[ https://issues.apache.org/jira/browse/YARN-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16017799#comment-16017799 ] Chris Douglas commented on YARN-6622: - Including an explanation of the risks and/or pointers to [references|https://docs.docker.com/engine/security/security], would help users make an informed decision. Without that, they'll likely gloss over this disclaimer. > Document Docker work as experimental > > > Key: YARN-6622 > URL: https://issues.apache.org/jira/browse/YARN-6622 > Project: Hadoop YARN > Issue Type: Task > Components: documentation >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-6622.001.patch > > > We should update the Docker support documentation calling out the Docker work > as experimental. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: (was: YARN-4476.005.patch) > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Labels: oct16-medium > Attachments: YARN-4476.003.patch, YARN-4476.004.patch, > YARN-4476.005.patch, YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: YARN-4476.005.patch > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Labels: oct16-medium > Attachments: YARN-4476.003.patch, YARN-4476.004.patch, > YARN-4476.005.patch, YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6577) Remove unused ContainerLocalization classes
[ https://issues.apache.org/jira/browse/YARN-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-6577: Summary: Remove unused ContainerLocalization classes (was: Useless interface and implementation class) > Remove unused ContainerLocalization classes > --- > > Key: YARN-6577 > URL: https://issues.apache.org/jira/browse/YARN-6577 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3, 3.0.0-alpha2 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Fix For: 3.0.0-alpha2 > > Attachments: YARN-6577.001.patch > > > From 2.7.3 and 3.0.0-alpha2, the ContainerLocalization interface and the > ContainerLocalizationImpl implementation class are of no use, and I recommend > removing the useless interface and implementation classes -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: YARN-4476.004.patch > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Labels: oct16-medium > Attachments: YARN-4476.003.patch, YARN-4476.004.patch, > YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6451) Add RM monitor validating metrics invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-6451: Issue Type: New Feature (was: Bug) > Add RM monitor validating metrics invariants > > > Key: YARN-6451 > URL: https://issues.apache.org/jira/browse/YARN-6451 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, > YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch > > > For SLS runs, as well as for live test clusters (and maybe prod), it would be > useful to have a mechanism to continuously check whether core invariants of > the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly > respected, certain latencies within expected range, etc..) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6451) Add RM monitor validating metrics invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-6451: Summary: Add RM monitor validating metrics invariants (was: Create a monitor to check whether we maintain RM (scheduling) invariants) > Add RM monitor validating metrics invariants > > > Key: YARN-6451 > URL: https://issues.apache.org/jira/browse/YARN-6451 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, > YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch > > > For SLS runs, as well as for live test clusters (and maybe prod), it would be > useful to have a mechanism to continuously check whether core invariants of > the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly > respected, certain latencies within expected range, etc..) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6451) Create a monitor to check whether we maintain RM (scheduling) invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971709#comment-15971709 ] Chris Douglas commented on YARN-6451: - bq. when invariants are violated the log line is harder to read if combined, but perf is much better. In the current example of invariants.txt I will leave this with one invariant per line, so slower but easier to understand---works? This could evaluate the combined expression, and only if it detects some violation, iterate over the set of expressions to print specific error messages. Though shaving fractions of a millisecond off the validation check is probably not significant. +1 overall. For future versions: * The invariant checker might want to use bindings across contexts; this would be hard to express as subtypes of {{InvariantsChecker}}. For example, if one wanted to check some invariant using values from the scheduler and the metrics, there isn't a good way to compose the two with inheritance. That said, in the current RM it's hard to correlate values collected from multiple components without reasoning about their mutual consistency in a brittle, ad hoc way. How invariants are loaded and how errors are handled could also be abstracted, but (IMHO) that'd be premature. This is approachable as-is. * The unit test is kind of light * This could print a warning when it starts up, since it's mostly for testing. If it's accidentally deployed in a production setting, it should show up in the log. The RM refuses to start if {{invariants.txt}} is missing? 
> Create a monitor to check whether we maintain RM (scheduling) invariants > > > Key: YARN-6451 > URL: https://issues.apache.org/jira/browse/YARN-6451 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, > YARN-6451.v2.patch, YARN-6451.v3.patch > > > For SLS runs, as well as for live test clusters (and maybe prod), it would be > useful to have a mechanism to continuously check whether core invariants of > the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly > respected, certain latencies within expected range, etc..) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6451) Create a monitor to check whether we maintain RM (scheduling) invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966500#comment-15966500 ] Chris Douglas commented on YARN-6451: - Cool, I hadn't seen the {{javax.script}} package before. Throwing a bespoke exception can also be configured to halt the JVM and call back to a debugger, which is a nice touch for the SLS case. * The invariants can be precompiled, to avoid the parsing/compilation overhead for each iteration. * If not invoking a debugger, then it'd be nice to know the bindings when the invariant doesn't hold. * The invariant check could be part of the {{metrics2.MetricsCollector}}, particularly if it's possible to filter the metrics it gathers based on the configured invariants. > Create a monitor to check whether we maintain RM (scheduling) invariants > > > Key: YARN-6451 > URL: https://issues.apache.org/jira/browse/YARN-6451 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, > YARN-6451.v2.patch > > > For SLS runs, as well as for live test clusters (and maybe prod), it would be > useful to have a mechanism to continuously check whether core invariants of > the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly > respected, certain latencies within expected range, etc..) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
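The precompilation point above is straightforward with javax.script: compile each invariant once via {{Compilable}}, then re-evaluate the resulting {{CompiledScript}} against fresh bindings on every check. A minimal sketch (names are illustrative, not the patch's; engine availability is guarded since a JavaScript engine is not bundled with every JDK):

```java
import javax.script.*;
import java.util.*;

// Sketch: parse/compile invariant expressions once at startup instead of
// re-parsing them on every evaluation cycle.
public final class PrecompiledInvariants {
    private final List<String> sources = new ArrayList<>();
    private final List<CompiledScript> compiled = new ArrayList<>();
    private final ScriptEngine engine;

    public PrecompiledInvariants(List<String> invariants) throws ScriptException {
        engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine instanceof Compilable) {
            Compilable c = (Compilable) engine;
            for (String inv : invariants) {
                sources.add(inv);
                compiled.add(c.compile(inv)); // compile once, evaluate many times
            }
        }
    }

    public boolean hasEngine() {
        return engine instanceof Compilable;
    }

    /** Returns the source text of every invariant that does not hold. */
    public List<String> check(Map<String, Object> metricBindings) throws ScriptException {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < compiled.size(); i++) {
            Bindings b = engine.createBindings();
            b.putAll(metricBindings);
            Object ok = compiled.get(i).eval(b);
            if (!Boolean.TRUE.equals(ok)) {
                failed.add(sources.get(i)); // a natural place to also log the bindings
            }
        }
        return failed;
    }
}
```

Reporting {{sources.get(i)}} together with the bindings addresses the second bullet above: when an invariant fails, the log can show both the expression and the metric values that violated it.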
[jira] [Commented] (YARN-6336) Jenkins report YARN new UI build failure
[ https://issues.apache.org/jira/browse/YARN-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925452#comment-15925452 ] Chris Douglas commented on YARN-6336: - Also HDFS-6984 > Jenkins report YARN new UI build failure > - > > Key: YARN-6336 > URL: https://issues.apache.org/jira/browse/YARN-6336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Priority: Blocker > > In Jenkins report of YARN-6313 > (https://builds.apache.org/job/PreCommit-YARN-Build/15260/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn.txt), > we found following build failure due to YARN new UI: > {noformat} > /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node_modules/ember-cli-htmlbars/node_modules/broccoli-persistent-filter/node_modules/async-disk-cache/node_modules/username/index.js:2 > const os = require('os'); > ^ > Use of const in strict mode. > SyntaxError: Use of const in strict mode. > at Module._compile (module.js:439:25) > at Object.Module._extensions..js (module.js:474:10) > at Module.load (module.js:356:32) > at Function.Module._load (module.js:312:12) > at Module.require (module.js:364:17) > at require (module.js:380:17) > at Object. > (/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node_modules/ember-cli-htmlbars/node_modules/broccoli-persistent-filter/node_modules/async-disk-cache/index.js:24:16) > at Module._compile (module.js:456:26) > at Object.Module._extensions..js (module.js:474:10) > at Module.load (module.js:356:32) > DEPRECATION: Node v0.10.25 is no longer supported by Ember CLI. Please update > to a more recent version of Node > undefined > version: 1.13.15 > Could not find watchman, falling back to NodeWatcher for file system events. > Visit http://www.ember-cli.com/user-guide/#watchman for more info. 
> Building[INFO] > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6191) CapacityScheduler preemption by container priority can be problematic for MapReduce
[ https://issues.apache.org/jira/browse/YARN-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876567#comment-15876567 ] Chris Douglas commented on YARN-6191: - bq. However there's still an issue because the preemption message is too general. For example, if the message says "going to preempt 60GB of resources" and the AM kills 10 reducers that are 6GB each on 6 different nodes, the RM can still kill the maps because the RM needed 60GB of contiguous resources. I haven't followed the modifications to the preemption policy, so I don't know if the AM will be selected as a victim again even after satisfying the contract (it should not). The preemption message should be expressive enough to encode this, if that's the current behavior. If the RM will only accept 60GB of resources from a single node, then that can be encoded in a ResourceRequest in the preemption message. Even if everything behaves badly, killing the reducers is still correct, right? If the job is still entitled to resources, then it should reschedule the map tasks before the reducers. There are still interleavings of requests that could result in the same behavior described in this JIRA, but they'd be stunningly unlucky. bq. I still wonder about the logic of preferring lower container priorities regardless of how long they've been running. I'm not sure container priority always translates well to how important a container is to the application, and we might be better served by preferring to minimize total lost work regardless of container priority. All of the options [~sunilg] suggests are fine heuristics, but the application has the best view of the tradeoffs. For example, a long-running container might be amortizing the cost of scheduling short-lived tasks, and might actually be cheap to kill. If the preemption message is not accurately reporting the contract the RM is enforcing, then we should absolutely fix that. But I think this is a MapReduce problem, ultimately. 
> CapacityScheduler preemption by container priority can be problematic for > MapReduce > --- > > Key: YARN-6191 > URL: https://issues.apache.org/jira/browse/YARN-6191 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Jason Lowe > > A MapReduce job with thousands of reducers and just a couple of maps left to > go was running in a preemptable queue. Periodically other queues would get > busy and the RM would preempt some resources from the job, but it _always_ > picked the job's map tasks first because they use the lowest priority > containers. Even though the reducers had a shorter running time, most were > spared but the maps were always shot. Since the map tasks ran for a longer > time than the preemption period, the job was in a perpetual preemption loop. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6191) CapacityScheduler preemption by container priority can be problematic for MapReduce
[ https://issues.apache.org/jira/browse/YARN-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866914#comment-15866914 ] Chris Douglas commented on YARN-6191: - This is related to a [discussion|http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201702.mbox/%3CCACO5Y4wVm-9_3uES+qVvi2ypzsGTvu9jbEgVfTb79unPH-E=t...@mail.gmail.com%3E] on mapreduce-dev@ on the incomplete, work-conserving preemption logic. The MR AM should react by killing reducers when it gets a preemption message (checkpointing their state, if possible). > CapacityScheduler preemption by container priority can be problematic for > MapReduce > --- > > Key: YARN-6191 > URL: https://issues.apache.org/jira/browse/YARN-6191 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Reporter: Jason Lowe > > A MapReduce job with thousands of reducers and just a couple of maps left to > go was running in a preemptable queue. Periodically other queues would get > busy and the RM would preempt some resources from the job, but it _always_ > picked the job's map tasks first because they use the lowest priority > containers. Even though the reducers had a shorter running time, most were > spared but the maps were always shot. Since the map tasks ran for a longer > time than the preemption period, the job was in a perpetual preemption loop. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor
[ https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730846#comment-15730846 ] Chris Douglas commented on YARN-5719: - Does someone have cycles to take a look at this? [~vvasudev], [~aw], [~sidharta-s]? > Enforce a C standard for native container-executor > -- > > Key: YARN-5719 > URL: https://issues.apache.org/jira/browse/YARN-5719 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Chris Douglas > Attachments: YARN-5719.000.patch > > > The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3460) TestSecureRMRegistryOperations fails with IBM_JAVA
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3460: Summary: TestSecureRMRegistryOperations fails with IBM_JAVA (was: Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM) > TestSecureRMRegistryOperations fails with IBM_JAVA > -- > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.6.0, 3.0.0-alpha1 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva >Assignee: pascal oliva > Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, > YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, > YARN-3460.005.patch, YARN-3460.006.patch > > > TestSecureRMRegistryOperations failed with JBM IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > ModuleTotal Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3460: Assignee: pascal oliva > Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.6.0, 3.0.0-alpha1 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva >Assignee: pascal oliva > Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, > YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, > YARN-3460.005.patch, YARN-3460.006.patch > > > TestSecureRMRegistryOperations failed with JBM IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > ModuleTotal Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: YARN-4476.003.patch > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Labels: oct16-medium > Attachments: YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch, > YARN-4476.003.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3460: Attachment: YARN-3460.006.patch > Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.6.0, 3.0.0-alpha1 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva > Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, > YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, > YARN-3460.005.patch, YARN-3460.006.patch > > > TestSecureRMRegistryOperations failed with JBM IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > ModuleTotal Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3460: Attachment: YARN-3460.005.patch ASF license warnings are unrelated: {noformat} Lines that start with ? in the ASF License report indicate files that do not have an Apache license header: !? /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-1.9.4/js/jquery.dataTables.min.js !? /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery/jquery-1.8.2.min.js !? /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery/jquery-ui-1.9.1.custom.min.js !? /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jt/jquery.jstree.js {noformat} Fixed some checkstyle warnings. > Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.6.0, 3.0.0-alpha1 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva > Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, > YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch, YARN-3460.005.patch > > > TestSecureRMRegistryOperations failed with JBM IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > ModuleTotal Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: 
Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3460: Attachment: YARN-3460.004.patch This change invokes the "login" method, when "commit" is intended: {noformat} -boolean commitOk = krb5LoginModule.commit(); +Method methodCommit = kerb5LoginObject.getClass().getMethod("commit"); +boolean commitOk = (Boolean) methodLogin.invoke(kerb5LoginObject); {noformat} Updated patch. Can someone test this on an IBM jdk? > Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.6.0, 3.0.0-alpha1 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva > Attachments: HADOOP-11810-1.patch, YARN-3460-1.patch, > YARN-3460-2.patch, YARN-3460-3.patch, YARN-3460.004.patch > > > TestSecureRMRegistryOperations failed with JBM IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > ModuleTotal Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
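The review comment above flags a copy-paste bug in the reflective JAAS code: the patch looks up the {{commit}} method but then invokes the cached {{methodLogin}} object. A minimal sketch of the corrected pattern follows; {{FakeLoginModule}} is a stand-in invented here, since the real code would load a vendor class such as IBM's {{Krb5LoginModule}} by name precisely because it is not on every JDK's classpath.

```java
import java.lang.reflect.Method;

public class ReflectiveCommit {
    // Hypothetical stand-in for a vendor LoginModule. The real patch
    // resolves the IBM or Sun Krb5LoginModule class reflectively.
    public static class FakeLoginModule {
        private boolean loggedIn = false;
        public boolean login()  { loggedIn = true; return loggedIn; }
        public boolean commit() { return loggedIn; }
    }

    public static void main(String[] args) throws Exception {
        Object module = FakeLoginModule.class.getDeclaredConstructor().newInstance();

        Method methodLogin = module.getClass().getMethod("login");
        boolean loginOk = (Boolean) methodLogin.invoke(module);

        // The bug in the earlier patch was reusing methodLogin here.
        // Each JAAS phase must look up and invoke its own Method object.
        Method methodCommit = module.getClass().getMethod("commit");
        boolean commitOk = (Boolean) methodCommit.invoke(module);

        System.out.println(loginOk + " " + commitOk);
    }
}
```

Invoking {{login}} twice would silently re-run the first phase and skip {{commit}} entirely, which is why the review asks for a re-test on an IBM JDK rather than trusting the compile.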
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2571: Labels: oct16-hard (was: ) > RM to support YARN registry > > > Key: YARN-2571 > URL: https://issues.apache.org/jira/browse/YARN-2571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Labels: oct16-hard > Attachments: YARN-2571-001.patch, YARN-2571-002.patch, > YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, > YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, > YARN-2571-012.patch, YARN-2571-013.patch, YARN-2571-015.patch, > YARN-2571-016.patch > > > The RM needs to (optionally) integrate with the YARN registry: > # startup: create the /services and /users paths with system ACLs (yarn, hdfs > principals) > # app-launch: create the user directory /users/$username with the relevant > permissions (CRD) for them to create subnodes. > # attempt, container, app completion: remove service records with the > matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-2828) Enable auto refresh of web pages (using http parameter)
[ https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-2828: Labels: oct16-easy (was: BB2015-05-TBR) > Enable auto refresh of web pages (using http parameter) > --- > > Key: YARN-2828 > URL: https://issues.apache.org/jira/browse/YARN-2828 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tim Robertson >Assignee: Vijay Bhat >Priority: Minor > Labels: oct16-easy > Attachments: YARN-2828.001.patch, YARN-2828.002.patch, > YARN-2828.003.patch, YARN-2828.004.patch, YARN-2828.005.patch, > YARN-2828.006.patch > > > The MR1 Job Tracker had a useful HTTP parameter of e.g. "=3" that > could be appended to URLs which enabled a page reload. This was very useful > when developing mapreduce jobs, especially to watch counters changing. This > is lost in the the Yarn interface. > Could be implemented as a page element (e.g. drop down or so), but I'd > recommend that the page not be more cluttered, and simply bring back the > optional "refresh" HTTP param. It worked really nicely. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS
[ https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3432: Labels: oct16-easy (was: ) > Cluster metrics have wrong Total Memory when there is reserved memory on CS > --- > > Key: YARN-3432 > URL: https://issues.apache.org/jira/browse/YARN-3432 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Brahma Reddy Battula > Labels: oct16-easy > Attachments: YARN-3432-002.patch, YARN-3432-003.patch, YARN-3432.patch > > > I noticed that when reservations happen when using the Capacity Scheduler, > the UI and web services report the wrong total memory. > For example. I have a 300GB of total memory in my cluster. I allocate 50 > and I reserve 10. The cluster metrics for total memory get reported as 290GB. > This was broken by https://issues.apache.org/jira/browse/YARN-656 so perhaps > there is a difference between fair scheduler and capacity scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3477: Labels: oct16-easy (was: ) > TimelineClientImpl swallows exceptions > -- > > Key: YARN-3477 > URL: https://issues.apache.org/jira/browse/YARN-3477 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0, 2.7.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Labels: oct16-easy > Attachments: YARN-3477-001.patch, YARN-3477-002.patch, > YARN-3477-trunk.003.patch, YARN-3477-trunk.004.patch > > > If timeline client fails more than the retry count, the original exception is > not thrown. Instead some runtime exception is raised saying "retries run out" > # the failing exception should be rethrown, ideally via > NetUtils.wrapException to include URL of the failing endpoing > # Otherwise, the raised RTE should (a) state that URL and (b) set the > original fault as the inner cause -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures
[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3514: Labels: oct16-easy (was: BB2015-05-TBR) > Active directory usernames like domain\login cause YARN failures > > > Key: YARN-3514 > URL: https://issues.apache.org/jira/browse/YARN-3514 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.2.0 > Environment: CentOS6 >Reporter: john lilley >Priority: Minor > Labels: oct16-easy > Attachments: YARN-3514.001.patch, YARN-3514.002.patch > > > We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is > Kerberos-enabled and uses an external AD domain controller for the KDC. We > are able to authenticate, browse HDFS, etc. However, YARN fails during > localization because it seems to get confused by the presence of a \ > character in the local user name. > Our AD authentication on the nodes goes through sssd and set configured to > map AD users onto the form domain\username. For example, our test user has a > Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user > "domain\hadoopuser". We have no problem validating that user with PAM, > logging in as that user, su-ing to that user, etc. > However, when we attempt to run a YARN application master, the localization > step fails when setting up the local cache directory for the AM. 
The error > that comes out of the RM logs: > 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: > ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, > diagnostics='Application application_1429295486450_0001 failed 1 times due to > AM Container for appattempt_1429295486450_0001_01 exited with exitCode: > -1000 due to: Application application_1429295486450_0001 initialization > failed (exitCode=255) with output: main : command provided 0 > main : user is DOMAIN\hadoopuser > main : requested yarn user is domain\hadoopuser > org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create > directory: > /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 > at > org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) > .Failing this attempt.. Failing the application.' > However, when we look on the node launching the AM, we see this: > [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache > [root@rpb-cdh-kerb-2 usercache]# ls -l > drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser > There appears to be different treatment of the \ character in different > places. Something creates the directory as "domain\hadoopuser" but something > else later attempts to use it as "domain%5Chadoopuser". I’m not sure where > or why the URL escapement converts the \ to %5C or why this is not consistent. 
> I should also mention, for the sake of completeness, our auth_to_local rule > is set up to map u...@domain.com to domain\user: > RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
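The {{domain\hadoopuser}} vs. {{domain%5Chadoopuser}} mismatch in the report comes from percent-encoding: {{%5C}} is the escape for a backslash. A small sketch (illustrating the encoding only, not the exact YARN code path that applies it):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class BackslashEscaping {
    public static void main(String[] args) throws Exception {
        String user = "domain\\hadoopuser";
        // Percent-encoding turns the backslash into %5C, so a cache path
        // built from the encoded name no longer matches a directory that
        // was created from the raw name.
        String encoded = URLEncoder.encode(user, StandardCharsets.UTF_8.name());
        System.out.println(user + " -> " + encoded);
        // -> domain%5Chadoopuser
    }
}
```

One code path creating the directory from the raw name while another resolves it through the encoded form would produce exactly the "Cannot create directory" failure quoted above.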
[jira] [Updated] (YARN-3538) TimelineServer doesn't catch/translate all exceptions raised
[ https://issues.apache.org/jira/browse/YARN-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-3538: Labels: oct16-easy (was: BB2015-05-TBR) > TimelineServer doesn't catch/translate all exceptions raised > > > Key: YARN-3538 > URL: https://issues.apache.org/jira/browse/YARN-3538 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: oct16-easy > Attachments: YARN-3538-001.patch > > > Not all exceptions in TimelineServer are uprated to web exceptions; only IOEs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5704) Provide config knobs to control enabling/disabling new/work in progress features in container-executor
[ https://issues.apache.org/jira/browse/YARN-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576609#comment-15576609 ] Chris Douglas commented on YARN-5704: - [~vvasudev] would you mind taking a look at YARN-5719 so we can enforce C99 (or whatever) for CE? > Provide config knobs to control enabling/disabling new/work in progress > features in container-executor > -- > > Key: YARN-5704 > URL: https://issues.apache.org/jira/browse/YARN-5704 > Project: Hadoop YARN > Issue Type: Task > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: YARN-5704-branch-2.8.001.patch, YARN-5704.001.patch > > > Provide a mechanism to enable/disable Docker and TC (Traffic Control) > functionality at the container-executor level. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor
[ https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573147#comment-15573147 ] Chris Douglas commented on YARN-5719: - [~aw] would you mind taking a look? > Enforce a C standard for native container-executor > -- > > Key: YARN-5719 > URL: https://issues.apache.org/jira/browse/YARN-5719 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Chris Douglas > Attachments: YARN-5719.000.patch > > > The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-5719) Enforce a C standard for native container-executor
[ https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-5719: Comment: was deleted (was: [~aw] would you mind taking a look?) > Enforce a C standard for native container-executor > -- > > Key: YARN-5719 > URL: https://issues.apache.org/jira/browse/YARN-5719 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Chris Douglas > Attachments: YARN-5719.000.patch > > > The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor
[ https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573148#comment-15573148 ] Chris Douglas commented on YARN-5719: - [~aw] would you mind taking a look? > Enforce a C standard for native container-executor > -- > > Key: YARN-5719 > URL: https://issues.apache.org/jira/browse/YARN-5719 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Chris Douglas > Attachments: YARN-5719.000.patch > > > The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5719) Enforce a C standard for native container-executor
[ https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-5719: Assignee: (was: Chris Douglas) > Enforce a C standard for native container-executor > -- > > Key: YARN-5719 > URL: https://issues.apache.org/jira/browse/YARN-5719 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Chris Douglas > Attachments: YARN-5719.000.patch > > > The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5704) Provide config knobs to control enabling/disabling new/work in progress features in container-executor
[ https://issues.apache.org/jira/browse/YARN-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563755#comment-15563755 ] Chris Douglas commented on YARN-5704: - bq. If we want to declare this code base as being C99 then we need to tell cmake to make sure we're using a C99 compiler. Until we do that, this code is defaulting to non-C99. OK, got it. I don't suppose NoC99 has the same cachet as NoSQL? Let's pick a standard. Filed YARN-5719 bq. telling cmake that we're doing C99 is sort of a mine field, depending upon which version of cmake is in use. Took a look at this and... yikes. > Provide config knobs to control enabling/disabling new/work in progress > features in container-executor > -- > > Key: YARN-5704 > URL: https://issues.apache.org/jira/browse/YARN-5704 > Project: Hadoop YARN > Issue Type: Task > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-5704-branch-2.8.001.patch, YARN-5704.001.patch > > > Provide a mechanism to enable/disable Docker and TC (Traffic Control) > functionality at the container-executor level. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5719) Enforce a C standard for native container-executor
[ https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563749#comment-15563749 ] Chris Douglas commented on YARN-5719: - There's a convenient [option|https://cmake.org/cmake/help/v3.1/prop_tgt/C_STANDARD.html] in recent versions of cmake to set the C standard in a portable way, but this is unavailable in the minimum version of cmake we require (2.6). v000 uses a set of switches based on a subset of [compiler ids|https://cmake.org/cmake/help/v3.0/variable/CMAKE_LANG_COMPILER_ID.html] we're likely(?) to support. The options themselves I pulled from cursory searches; I haven't tested with anything but gcc 4.8.4. The LCE doesn't compile with ANSI C ({{-std=c89}}), but required almost no changes with C99. The only change with {{-pedantic-errors}} required some minor tweaks to {{get_user_info}}. > Enforce a C standard for native container-executor > -- > > Key: YARN-5719 > URL: https://issues.apache.org/jira/browse/YARN-5719 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-5719.000.patch > > > The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5719) Enforce a C standard for native container-executor
[ https://issues.apache.org/jira/browse/YARN-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-5719: Attachment: YARN-5719.000.patch > Enforce a C standard for native container-executor > -- > > Key: YARN-5719 > URL: https://issues.apache.org/jira/browse/YARN-5719 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-5719.000.patch > > > The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5719) Enforce a C standard for native container-executor
Chris Douglas created YARN-5719: --- Summary: Enforce a C standard for native container-executor Key: YARN-5719 URL: https://issues.apache.org/jira/browse/YARN-5719 Project: Hadoop YARN Issue Type: Task Components: nodemanager Reporter: Chris Douglas Assignee: Chris Douglas The {{container-executor}} build should declare the C standard it uses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5704) Provide config knobs to control enabling/disabling new/work in progress features in container-executor
[ https://issues.apache.org/jira/browse/YARN-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15561309#comment-15561309 ] Chris Douglas commented on YARN-5704: - Thanks for working on this, [~sidharta-s]. As part of the followup patch, please also avoid using {{strcat}} when printing the usage can be separated into multiple statements. Avoids allocating a buffer we need to track for overflow. Sorry, I hadn't noticed that earlier. bq. If we take that to it's logical conclusion we just declare all of our utility functions as static and remove all the unit tests. That takes this heuristic well past its logical conclusion, but it'll be addressed in YARN-5717. bq. [Variable declaration in the middle] Just because the old code follows bad practices doesn't mean that new code should. c-e not being ANSI C compliant is a problem, BTW. If this creates portability problems that makes sense, though VS is the only C compiler I know of that (until recently?) doesn't support most of C99. Initializing variables when they're declared can avoid accidents, particularly over long LCE methods. Are any platforms this could target restricted to ANSI C compilers? Requiring that new patches use ANSI C, without making the rest of LCE compliant, adds a touchy manual step for committers and helps no users. If there are restrictions on the subset of C this should use, the compiler needs to enforce them. > Provide config knobs to control enabling/disabling new/work in progress > features in container-executor > -- > > Key: YARN-5704 > URL: https://issues.apache.org/jira/browse/YARN-5704 > Project: Hadoop YARN > Issue Type: Task > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2 >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-5704-branch-2.8.001.patch, YARN-5704.001.patch > > > Provide a mechanism to enable/disable Docker and TC (Traffic Control) > functionality at the container-executor level. 
[jira] [Updated] (YARN-5702) Refactor TestPBImplRecords so that we can reuse for testing protocol records in other YARN modules
[ https://issues.apache.org/jira/browse/YARN-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-5702: Fix Version/s: 2.9.0 > Refactor TestPBImplRecords so that we can reuse for testing protocol records > in other YARN modules > -- > > Key: YARN-5702 > URL: https://issues.apache.org/jira/browse/YARN-5702 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: YARN-5702-v1.patch, YARN-5702-v2.patch > > > The {{TestPBImplRecords}} has generic helper methods to validate YARN api > records. This JIRA proposes to refactor the generic helper methods into a > base class that can then be reused by other YARN modules for testing internal > API protocol records like in yarn-server-common for Federation (YARN-2915). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524603#comment-15524603 ] Chris Douglas commented on YARN-5621: - That summary of work seems about right, thanks for putting it together. You raise excellent points about error handling. Your sketch includes a channel communicating which resources were (un)successfully linked. The script-driven approach handles this in v05 by writing a separate bash script and invoking the CE for each symlink (which, to be fair, isn't exactly "lightweight" when compared to extending {{ContainerLocalizer}}). In v05, a failure affects only one resource, but to take your earlier example linking a batch of resources in the script: how would one handle partial failures? What's the state of the container and resources when the script invocation fails? On the CL proposal: either the CI initiates the symlink request to the {{ResourceLocalizationService}} after download, or the two operations are contained within that service. The complexity is comparable. The 2-phase protocol you sketch (CI initiates download, then link) adds a gap during which the CL could be shut down before it receives the {{LINK}} commands (causing two CL launches), but even a short timeout would likely cover that. A single message annotating the resource (download+symlink) could add states to {{LocalizedResource}} if it were to notify starting containers directly (current code) or hand off to the RLS for symlink. In this case, the protocol to the {{ContainerImpl}} is simpler (resending/retry is idempotent b/c it doesn't care if the download or symlink failed). Both {{FetchSuccessTransition}} and {{LocalizedResourceTransition}} would need to send {{LocalizerResourceRequestEvent}} for running containers to symlink. A failed symlink would look like a failed download to the CI. Start container is unaffected. For the CL itself... sure, {{ResourceLocalizationSpec}} needs another field for symlinks. 
This side is pretty straightforward, right? > Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, > YARN-5621.4.patch, YARN-5621.5.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks.
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510866#comment-15510866 ] Chris Douglas commented on YARN-5621: - bq. I think I understand your approach now, basically, [...] Yes, that's the gist of it. The {{ContainerLocalizer}} manages the private cache as the user, with that user's cluster credentials. Running containers start a CL to download private resources and/or create symlinks. bq. Is it starting both instances now? Not sure if I read the code wrong... It seems not the case. Based on the code, if it's an already existing resource, it will NOT start the ContainerLocalizer. [container start] For different applications? It should. For container start, I don't remember offhand if the {{ContainerLocalizer}} spawn is delayed until at least one dependent resource is not claimed, but IIRC it starts if at least one resource is not downloaded. Either way, CLs could start in race for a resource _R_, and only one would (successfully) download it. Resources aren't claimed when the CL launches, only when it heartbeats in. [CL proposal] For running container localization and for rollback, the CL will download the resource (again) and/or create the symlink to the running container. If multiple containers/applications request the same resource, it doesn't matter if it's a mix of new/running containers requesting a resource _R_. Only running/rollback containers will send symlink commands to their CL. bq. This approach may not be easily worked for the new containers without structural change, when localizer is started, the work-dirs are not setup yet. Again, container start is unaffected; new containers will not send {{LINK}} commands to the CL. Only _running_ containers will start a CL that receives {{LINK}} commands, after the work dirs are created and the container has started. 
> Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, > YARN-5621.4.patch, YARN-5621.5.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks.
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507320#comment-15507320 ] Chris Douglas commented on YARN-5621: - I think I see where the CL proposal was unclear. It is an alternative to CE changes; container start remains as-is. The proposal was scoped only to localizing resources for running containers. The CE is agnostic to new/running containers for an application- it may be used by both, concurrently. By adding a new command {{LINK}} to its protocol, the NM can instruct the {{ContainerLocalizer}} to create a symlink to a resource for a running container. Again, these commands could be grouped. {quote} > a case that already exists for containers on the same node requesting the > same resource Do you mean this is an existing implemented functionality or this is an existing use-case? {quote} Neither. The case where running containers (c ~1x~, c ~2y~) for different applications (a ~1~, a ~2~) request the same resource _R_ exists. Both will start {{ContainerLocalizer}} instances, but only one will download the resource to the private cache. In the CL proposal, this is the same as rollback, where the CL starts, heartbeats, then receives a command to LINK an existing resource without downloading anything. By "a case that already exists", I meant it's a case the CL proposal handles implicitly. bq. yeah, I feel it's inefficient to start a localizer process to only create symlinks.. No question. But if localizing a new resource takes a few seconds, for services that upgrade over minutes/hours, then a few hundred milliseconds is not worth adding {{RUN_SCRIPT}} to the CE. 
> Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, > YARN-5621.4.patch, YARN-5621.5.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks.
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504180#comment-15504180 ] Chris Douglas commented on YARN-5621: - bq. this approach will not work in rollback scenario, as in that case no resources need to be localized - hence, no need to start the localizer processes. We only need to update the symlinks to old resources. Sorry, I'm missing something. If the {{ContainerLocalizer}} supports a command to create symlinks to localized resources- a case that already exists for containers on the same node requesting the same resource- then how is that case distinguished from rollback? The container does need to start a {{ContainerLocalizer}} just to write some symlinks for the running container, which is inefficient. On the other hand, all symlinks for all containers from an application could be updated in the same invocation. When you say it does not work, are you noting the inefficiency of this flow, or is there a correctness problem? > Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, > YARN-5621.4.patch, YARN-5621.5.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491452#comment-15491452 ] Chris Douglas commented on YARN-5621: - bq. This may be a viable approach, we need to change the localizer heartbeat to send the symlink path. The heartbeat already carries a payload with commands to the localizer. Including actions to symlink resources already fetched isn't that dire a change to either the ContainerLocalizer or the resource state machine, is it? The transition needs to send a LINK request to all localizers that were waiting in case the download failed. bq. But if we want to create all symlinks in one go, this approach will not work. This isn't going to be a transaction on the FS regardless, but can you explain this requirement? If symlink-on-download is disqualifying, then the container could still coordinate grouped symlinks by grouping LINK requests to a localizer. It rearranges the event flows awkwardly, but it's supportable... > Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch, > YARN-5621.4.patch, YARN-5621.5.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5547) NMLeveldbStateStore should be more tolerant of unknown keys
[ https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487715#comment-15487715 ] Chris Douglas commented on YARN-5547: - bq. Skipping the container entirely would be very bad. The NM would not recover it, so it would then stop reporting it in heartbeats and the RM would then think it is dead/lost, but the container is actually still running, unmonitored and unkillable by YARN. Agreed. What we were discussing was making container recovery independent, so containers using unknown features are not recovered, but failed and killed. The base case should recover nothing- all containers should be killed and cleaned up- but the NM should always start. I'm not sure every feature is neatly classified in the mandatory/optional taxonomy, particularly since many will depend on the version of the client and RM. It seems simpler (and safer) to always kill/clean up containers using features the NM doesn't understand. > NMLeveldbStateStore should be more tolerant of unknown keys > --- > > Key: YARN-5547 > URL: https://issues.apache.org/jira/browse/YARN-5547 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Ajith S > Attachments: YARN-5547.01.patch > > > Whenever new keys are added to the NM state store it will break rolling > downgrades because the code will throw if it encounters an unrecognized key. > If instead it skipped unrecognized keys it could be simpler to continue > supporting rolling downgrades. We need to define the semantics of > unrecognized keys when containers and apps are cleaned up, e.g.: we may want > to delete all keys underneath an app or container directory when it is being > removed from the state store to prevent leaking unrecognized keys. 
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478543#comment-15478543 ] Chris Douglas commented on YARN-5621: - bq. FWIW, I'd love to see us drop the container launch script. I haven't tried it, but I suspect we can do lots of fun things with the env vars. For all containers, we have (1) NM constants (2) some user args we verify (e.g., container ID matches the token, is correctly formatted, etc.) used as args to the CE (which should validate that each of these args conforms to a schema). These are the args used to build paths. All other args (3) the user can control should be written to the container launch script, which is executed with the same permissions the container would have. The intent was to have all quoting games happen after we've switched to the user's context, and after we've discarded the NM environment. The implementation may have gaps, but is there a problem with the concept? This JIRA follows a similar pattern, but without validation of args in the CE. If it were restricted s.t. the source had a fixed format in {{nmPrivate}} and the destination was derived from a formatted {{ContainerID}}, it could have comparable guarantees as the container start. Unless the resource is public, could this avoid modifying the CE by moving the symlink to the {{ContainerLocalizer}}? It could receive a symlink command on a heartbeat, it's already running as the user, it may already be running to download the resource... > Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. 
This is the change for the LinuxContainerExecutor to > create the symlinks.
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks for continuously localized resources
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477639#comment-15477639 ] Chris Douglas commented on YARN-5621: - bq. Because the passed in symlink path is an absolute path Yes, obviously. :) I'm asking why this is an absolute path, if (per the design doc) the symlink is still relative to the container's working dir. bq. later on we need to create multiple symlinks in a single operation as done in current container_launch script, because if there is a large number of local Resources to be localized, we don't want to invoke the binary for each of them. Invoking the binary for each resource isn't so dire. Linking a group of resources only if they're all successfully localized could be useful for services/upgrades, though. bq. I guess the question is why the original container_launch script is not done in this way? I think Allen's point is that the TC/CE binaries have avoided abstraction and other conventional good taste to reduce the attack surface. If the CE can only run scripts that were written by the NM to a specific, restricted directory, it can only run them as the user in a destination following the NM schema, etc. that makes it harder to involve the CE in an attack. If the CE can invoke one stage without preconditions guaranteed by the previous stage, as {{--run-script}} may allow, that's substantively different from the existing behavior. > Support LinuxContainerExecutor to create symlinks for continuously localized > resources > -- > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks. 
[jira] [Commented] (YARN-5621) Support LinuxContainerExecutor to create symlinks
[ https://issues.apache.org/jira/browse/YARN-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15475198#comment-15475198 ] Chris Douglas commented on YARN-5621: - Is the patch intended for another JIRA, or is the title too narrowly phrased? I haven't gone through the patch in detail, but a RUN_SCRIPT action is a very general mechanism for a specific function (LCE already supports symlink, right?). Why relax this constraint? {noformat} - if (dst.isAbsolute()) { -throw new IOException("Destination must be relative"); - } {noformat} > Support LinuxContainerExecutor to create symlinks > - > > Key: YARN-5621 > URL: https://issues.apache.org/jira/browse/YARN-5621 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5621.1.patch, YARN-5621.2.patch, YARN-5621.3.patch > > > When new resources are localized, new symlink needs to be created for the > localized resource. This is the change for the LinuxContainerExecutor to > create the symlinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5121) fix some container-executor portability issues
[ https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400276#comment-15400276 ] Chris Douglas commented on YARN-5121: - +1 from me. Thanks, Allen, for the patch and ChrisN for review. bq. I did remove some other debugging code, but that one I thought was useful due to aggressive use of ternary operators I haven't looked at the context, but if {{ret}} can never be null in that case ({{real_fname}} is never null?), then the ternary operator is redundant. If it can be null, then the new debug stmt can cause a segfault before it prints? Nit-picking in any case. > fix some container-executor portability issues > -- > > Key: YARN-5121 > URL: https://issues.apache.org/jira/browse/YARN-5121 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Attachments: YARN-5121.00.patch, YARN-5121.01.patch, > YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, > YARN-5121.06.patch, YARN-5121.07.patch > > > container-executor has some issues that are preventing it from even compiling > on the OS X jenkins instance. Let's fix those. While we're there, let's > also try to take care of some of the other portability problems that have > crept in over the years, since it used to work great on Solaris but now > doesn't.
[jira] [Updated] (YARN-5164) Use plan RLE to improve CapacityOverTimePolicy efficiency
[ https://issues.apache.org/jira/browse/YARN-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-5164: Summary: Use plan RLE to improve CapacityOverTimePolicy efficiency (was: CapacityOvertimePolicy does not take advantaged of plan RLE) > Use plan RLE to improve CapacityOverTimePolicy efficiency > - > > Key: YARN-5164 > URL: https://issues.apache.org/jira/browse/YARN-5164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5164-example.pdf, YARN-5164-inclusive.4.patch, > YARN-5164-inclusive.5.patch, YARN-5164.1.patch, YARN-5164.2.patch, > YARN-5164.5.patch, YARN-5164.6.patch, YARN-5164.7.patch, YARN-5164.8.patch > > > As a consequence small time granularities (e.g., 1 sec) and long time horizon > for a reservation (e.g., months) run rather slow (10 sec). > Proposed resolution is to switch to interval math in checking, similar to how > YARN-4359 does for agents. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5121) fix some container-executor portability issues
[ https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383451#comment-15383451 ] Chris Douglas commented on YARN-5121: - +1 overall, though I haven't tested it on multiple platforms. Thanks for also updating the L Minor: * Leftover debug stmt in {{configuration.c}}? {noformat} +fprintf(stderr, "fn=%s\n",file_name); strncpy(strrchr(buffer, '/') + 1, file_name, EXECUTOR_PATH_MAX); real_fname = buffer; +fprintf(stderr, "real_fname=%s\n",real_fname); {noformat} * In {{container-executor.c}}, should "Error signalling process group %d with signal %d - %s\n" go to LOGFILE instead of stderr? * -0 on the whitespace fixes... I'd prefer to keep the history, but the patch touches enough code that it may be worthwhile. > fix some container-executor portability issues > -- > > Key: YARN-5121 > URL: https://issues.apache.org/jira/browse/YARN-5121 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Attachments: YARN-5121.00.patch, YARN-5121.01.patch, > YARN-5121.02.patch, YARN-5121.03.patch > > > container-executor has some issues that are preventing it from even compiling > on the OS X jenkins instance. Let's fix those. While we're there, let's > also try to take care of some of the other portability problems that have > crept in over the years, since it used to work great on Solaris but now > doesn't. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5164) CapacityOvertimePolicy does not take advantaged of plan RLE
[ https://issues.apache.org/jira/browse/YARN-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375971#comment-15375971 ] Chris Douglas commented on YARN-5164: - Only minor nits, otherwise +1: {{CapacityOverTimePolicy}} - Avoid importing java.util.\* - Where the intermediate points are added, the code would be more readable if the key were assigned to a named variable (instead of multiple calls to {{e.getKey()}}). Same with the point-wise integral computation - checkstyle (spacing): {{+ if(e.getValue()!=null) {}} - A comment briefly sketching the algorithm would help future maintainers {{NoOverCommitPolicy}} - The exception message should be reformatted (some redundant string concats) and omit references to the time it no longer reports - Should the {{PlanningException}} be added as a cause, rather than concatenated with the ReservationID? > CapacityOvertimePolicy does not take advantaged of plan RLE > --- > > Key: YARN-5164 > URL: https://issues.apache.org/jira/browse/YARN-5164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-5164-example.pdf, YARN-5164-inclusive.4.patch, > YARN-5164-inclusive.5.patch, YARN-5164.1.patch, YARN-5164.2.patch, > YARN-5164.5.patch, YARN-5164.6.patch > > > As a consequence small time granularities (e.g., 1 sec) and long time horizon > for a reservation (e.g., months) run rather slow (10 sec). > Proposed resolution is to switch to interval math in checking, similar to how > YARN-4359 does for agents. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5132) Exclude generated protobuf sources from YARN Javadoc build
[ https://issues.apache.org/jira/browse/YARN-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300822#comment-15300822 ] Chris Douglas commented on YARN-5132: - From the bug [~subru] cited, it looks like there's no solution for the javadoc warnings in 2.5, a version we're unlikely to change and Google is unlikely to fix. [~aw], I think your point is that Jenkins should stop complaining about (new) javadoc warnings in generated code, rather than giving up generating javadoc entirely. The protobuf classes are public APIs, but they're not user-facing in our Java APIs... I'm pretty ambivalent about keeping javadoc for them; including it may mislead someone into writing against them, rather than the API classes. Since (IIRC) we exclude other \@Private APIs from the generated javadoc, this seems like a good change, overall. Unless there's a better way to effect it? > Exclude generated protobuf sources from YARN Javadoc build > -- > > Key: YARN-5132 > URL: https://issues.apache.org/jira/browse/YARN-5132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Critical > Attachments: YARN-5132-v1.patch > > > Currently YARN build includes Javadoc from generated protobuf sources which > is causing CI to fail. This JIRA proposes to exclude generated protobuf > sources from YARN Javadoc build
[jira] [Commented] (YARN-2883) Queuing of container requests in the NM
[ https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196751#comment-15196751 ] Chris Douglas commented on YARN-2883: - Concerning v004: * The use of {{getRemoteUgi}} in {{QueuingContainerManagerImpl::stopContainerInternalIfNotQueued}} may be unnecessary; in any case, it will not work as expected. The check for user credentials will likely use the UGI from the {{EventDispatcher}}, not the RPC call that initiated it (in {{stopContainerInternalIfNotQueued}}). Setting the cause to {{KILLED_BY_APPMASTER}} may be inappropriate if queued containers could be killed for other reasons. * If an application completes, its queued containers should be cleared. * In {{getContainerStatusInternal}}, if the {{ConcurrentMap}} is necessary, then it should call {{get()}} once on the instance rather than {{containsKey()}}/{{get()}} * Rather than adding null checks for a disabled queuing context, this could support a null context that effectively disables the queuing logic (as in {{NodeStatusUpdaterImpl}}) * It seems the queuing is not fair. New containers are started immediately, without checking if the queue is empty. However, if the queue contains any entries, they should have started from {{onStopMonitoringContainer}}. With a large container at the front of the queue, smaller, queued containers will not get a chance to run while new, small containers will. * The queue should be bounded in some way. Minor * {{NMContext}} can set the queuing context as final, rather than a separate {{setQueuingContext}}, which is not threadsafe as written. * I didn't look through the test code in detail, but the {{DeletionService}} sleeping for 10s seems odd * New loggers should use slf4j, and the {{LOG.level("Text {}", arg)}} syntax rather than {{isLevelEnabled()}} * The default case of {{QueuingContainerManagerImpl::handle}} should throw * {{0.f}} is a valid literal? 
* {{killOpportContainers}} may want to log a warning if killing opportunistic containers is insufficient to satisfy the contract (after the loop). This would be helpful when debugging. * Do {{queuedGuarRequests}} and {{queuedOpportRequests}} need to be synchronized? Or is the handler sufficient? * {{QueuingContainersMonitorImpl::AllocatedContainerInfo}} could define equals/hashcode and use {{Collection::remove}} instead of defining {{removeContainerFromQueue}} > Queuing of container requests in the NM > --- > > Key: YARN-2883 > URL: https://issues.apache.org/jira/browse/YARN-2883 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: YARN-2883-trunk.004.patch, > YARN-2883-yarn-2877.001.patch, YARN-2883-yarn-2877.002.patch, > YARN-2883-yarn-2877.003.patch, YARN-2883-yarn-2877.004.patch > > > We propose to add a queue in each NM, where queueable container requests can > be held. > Based on the available resources in the node and the containers in the queue, > the NM will decide when to allow the execution of a queued container. > In order to ensure the instantaneous start of a guaranteed-start container, > the NM may decide to pre-empt/kill running queueable containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173427#comment-15173427 ] Chris Douglas commented on YARN-4734: - bq. For merge it at the top level, did you mean LICENSE.txt and BUILDING.txt? Are there any other files I need to change? {{NOTICE.txt}} may also need to be updated. No worries on the WIP, we can do a pass on the docs when it's ready to merge. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task.
[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk
[ https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173255#comment-15173255 ] Chris Douglas commented on YARN-4734: - {{LICENSE.txt}} looks like it is based on, or copied from Apache Tez. Could you double-check the set of modules to ensure it's correct for Hadoop? We'll also need to merge it at the top level. > Merge branch:YARN-3368 to trunk > --- > > Key: YARN-4734 > URL: https://issues.apache.org/jira/browse/YARN-4734 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch > > > YARN-2928 branch is planned to merge back to trunk shortly, it depends on > changes of YARN-3368. This JIRA is to track the merging task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103350#comment-15103350 ] Chris Douglas edited comment on YARN-4597 at 1/16/16 6:52 PM: -- Thanks, Arun. Please feel free to take this over. It's only justified in context with these other changes. was (Author: chris.douglas): Thanks, Arun. Please feel free > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Chris Douglas > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103350#comment-15103350 ] Chris Douglas commented on YARN-4597: - Thanks, Arun. Please feel free > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL: https://issues.apache.org/jira/browse/YARN-4597 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Chris Douglas > > Currently, the NM immediately launches containers after resource > localization. Several features could be more cleanly implemented if the NM > included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4597) Add SCHEDULE to NM container lifecycle
Chris Douglas created YARN-4597: --- Summary: Add SCHEDULE to NM container lifecycle Key: YARN-4597 URL: https://issues.apache.org/jira/browse/YARN-4597 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Chris Douglas Currently, the NM immediately launches containers after resource localization. Several features could be more cleanly implemented if the NM included a separate stage for reserving resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4597) Add SCHEDULE to NM container lifecycle
[ https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101149#comment-15101149 ] Chris Douglas commented on YARN-4597: - The {{ContainerLaunchContext}} (CLC) specifies the prerequisites for starting a container on a node. These include setting up user/application directories and downloading dependencies to the NM cache (localization). The NM assumes that an authenticated {{startContainer}} request has not overbooked resources on the node, so resources are only reserved/enforced during the container launch and execution. This JIRA proposes to add a phase between localization and container launch to manage a collection of runnable containers. Similar to the localizer stage, a container will launch only after all the resources from its CLC are assigned by a _local scheduler_. The local scheduler will select containers to run based on priority, declared requirements, and by monitoring utilization on the node (YARN-1011). A few future and in-progress features motivate this change. *Preemption* Instead of sending a kill when the RM selects a victim container, it could convert it from a {{GUARANTEED}} to an {{OPTIMISTIC}} container (YARN-4335). This has two benefits. First, the downgraded container can continue to run until a guaranteed container arrives _and_ finishes localizing its dependencies, so the downgraded container has an opportunity to complete or checkpoint. When the guaranteed container moves from {{LOCALIZED}} to {{SCHEDULING}}, the local scheduler may select the victim (formerly guaranteed) container to be killed. \[1\] Second, the NM may elect to kill the victim container to run _different_ optimistic containers, particularly short-running tasks. *Optimistic scheduling and overprovisioning* To support distributed scheduling (YARN-2877) and resource-aware scheduling (YARN-1011), the NM needs a component to select containers that are ready to run. 
The local scheduler can not only select tasks to run based on monitoring, it can also make offers to running containers using durations attached to leases \[2\]. Based on recent observations, it may start containers that oversubscribe the node, or delay starting containers if a lease is close to expiring (i.e., the container is likely to complete). *Long-running services*. Note that by separating the local scheduler, both that module _and_ the localizer could be opened up as services provided by the NM. The localizer could also be extended to prioritize downloads among {{OPTIMISTIC}} containers (possibly preemptable by {{GUARANTEED}}), and to group containers based on their dependencies (e.g., avoid downloading a large dep for fewer than N optimistic containers). By exposing these services, the NM can assist with the following: # Resource spikes. If a service container needs to spike temporarily, it may not need guaranteed resources (YARN-1197). Containers requiring low-latency elasticity could request optimistic resources instead of peak provisioning, resizing, or using workarounds like [Llama|http://cloudera.github.io/llama/]. If the local scheduler is addressable by local containers, then the lease could be logical (i.e., not start a process). Resources assigned to a {{RUNNING}} container could be published rather than triggering a launch. One could also imagine service workers marking some resources as unused, while retaining the authority to spike into them ("subleasing" them to opportunistic containers) by reclaiming them through the local scheduler. # Upgrades. If the container needs to pull new dependencies, it could use the NM Localizer rather than coordinating the download itself. # Maintenance tasks. Services often need to clean up, compact, scrub, and checkpoint local data. Right now, each service needs to independently monitor resource utilization to back off saturated resources (particularly disks). Coordination between services is difficult. 
In contrast, one could schedule tasks like block scrubbing as optimistic tasks in the NM to avoid interrupting services that are spiking. This is similar in spirit to distributed scheduling insofar as it does not involve the RM and targets a single host (i.e., the host the container is running on). \[1\] Though it was selected as a victim by the RM, the local scheduler may decide to kill a different {{OPTIMISTIC}} container when the guaranteed container requests resources. For example, if a container completes on the node after the RM selected the victim, then the NM may elect to kill a smaller optimistic process if it is sufficient to satisfy the guarantee. \[2\] Discussion on duration in YARN-1039 was part of a broader conversation on support for long-running services (YARN-896). > Add SCHEDULE to NM container lifecycle > -- > > Key: YARN-4597 > URL:
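The victim-selection idea in \[1\] can be sketched roughly as follows: when a guaranteed container needs resources, a local scheduler kills the smallest opportunistic containers sufficient to cover the demand, rather than necessarily the RM-selected victim. All class and method names below are invented for illustration, and resources are simplified to a single memory dimension.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LocalSchedulerSketch {

  // Hypothetical stand-in for a running container's bookkeeping.
  static final class RunningContainer {
    final String id;
    final int memMb;
    final boolean opportunistic;

    RunningContainer(String id, int memMb, boolean opportunistic) {
      this.id = id;
      this.memMb = memMb;
      this.opportunistic = opportunistic;
    }
  }

  /**
   * Pick opportunistic victims, smallest first, until the freed memory
   * covers the guaranteed container's demand. Guaranteed containers are
   * never selected.
   */
  static List<RunningContainer> selectVictims(List<RunningContainer> running,
      int demandMb) {
    List<RunningContainer> opportunistic = new ArrayList<>();
    for (RunningContainer c : running) {
      if (c.opportunistic) {
        opportunistic.add(c);
      }
    }
    opportunistic.sort(Comparator.comparingInt(c -> c.memMb));
    List<RunningContainer> victims = new ArrayList<>();
    int freed = 0;
    for (RunningContainer c : opportunistic) {
      if (freed >= demandMb) {
        break;
      }
      victims.add(c);
      freed += c.memMb;
    }
    return victims;
  }

  public static void main(String[] args) {
    List<RunningContainer> running = List.of(
        new RunningContainer("c1", 4096, false),
        new RunningContainer("c2", 1024, true),
        new RunningContainer("c3", 2048, true));
    // A guaranteed container needs 1 GB: the 1 GB opportunistic container
    // suffices, so the larger one keeps running.
    for (RunningContainer v : selectVictims(running, 1024)) {
      System.out.println(v.id);
    }
  }
}
```

Choosing victims at launch time, instead of at RM selection time, is what lets the NM spare a large opportunistic container when a smaller one has become sufficient.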
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: YARN-4476-2.patch Fixed more checkstyle warnings. Diminishing returns on the remainder... > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch, YARN-4476-2.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065210#comment-15065210 ] Chris Douglas edited comment on YARN-4476 at 12/19/15 3:40 AM: --- bq. do you think is it better to place this module to org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) for better organization? I thought about it, but: # It's a (potential) internal detail of the node label implementation, with other classes in the package # The {{nodelabels}} package is sparse right now # None of these classes are user-facing, so they're easy to move So I put it in the {{nodelabels}} package, but don't have a strong opinion. was (Author: chris.douglas): bq. do you think is it better to place this module to org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) for better organization? I thought about it, but: # It's a (potential) internal detail of the node label implementation, with other classes in the package # The {{nodelabel}} package is sparse right now # None of these classes are user-facing, so they're easy to move So I put it in the {{nodelabels}} package, but don't have a strong opinion. > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065210#comment-15065210 ] Chris Douglas commented on YARN-4476: - bq. do you think is it better to place this module to org.apache.hadoop.yarn.server.resourcemanager.nodelabels.matcher (or evaluator) for better organization? I thought about it, but: # It's a (potential) internal detail of the node label implementation, with other classes in the package # The {{nodelabel}} package is sparse right now # None of these classes are user-facing, so they're easy to move So I put it in the {{nodelabels}} package, but don't have a strong opinion. > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: YARN-4476-1.patch Add ASF license headers, fix findbugs warnings, address some of the checkstyle issues. > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch, YARN-4476-1.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4476) Matcher for complex node label expresions
Chris Douglas created YARN-4476: --- Summary: Matcher for complex node label expresions Key: YARN-4476 URL: https://issues.apache.org/jira/browse/YARN-4476 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Chris Douglas Assignee: Chris Douglas Implementation of a matcher for complex node label expressions based on a [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4476) Matcher for complex node label expresions
[ https://issues.apache.org/jira/browse/YARN-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4476: Attachment: YARN-4476-0.patch > Matcher for complex node label expresions > - > > Key: YARN-4476 > URL: https://issues.apache.org/jira/browse/YARN-4476 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Chris Douglas >Assignee: Chris Douglas > Attachments: YARN-4476-0.patch > > > Implementation of a matcher for complex node label expressions based on a > [paper|http://dl.acm.org/citation.cfm?id=1807171] from SIGMOD 2010. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"
[ https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063361#comment-15063361 ] Chris Douglas commented on YARN-4195: - Posted a draft to YARN-4476 > Support of node-labels in the ReservationSystem "Plan" > -- > > Key: YARN-4195 > URL: https://issues.apache.org/jira/browse/YARN-4195 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4195.patch > > > As part of YARN-4193 we need to enhance the InMemoryPlan (and related > classes) to track the per-label available resources, as well as the per-label > reservation-allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"
[ https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063282#comment-15063282 ] Chris Douglas commented on YARN-4195: - bq. a better version of this, which uses a cool algorithm which skips the conversion to DNF The impl is based on a SIGMOD 2010 [paper|http://dl.acm.org/citation.cfm?id=1807171] that converts boolean expressions to intervals. I'll adapt it for Hadoop and post a patch > Support of node-labels in the ReservationSystem "Plan" > -- > > Key: YARN-4195 > URL: https://issues.apache.org/jira/browse/YARN-4195 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4195.patch > > > As part of YARN-4193 we need to enhance the InMemoryPlan (and related > classes) to track the per-label available resources, as well as the per-label > reservation-allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
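For illustration only, a complex label expression can be evaluated by walking the expression tree directly, with no DNF expansion. The sketch below does not implement the interval technique from the SIGMOD 2010 paper; it only shows the matching problem the patch addresses, with invented names and example labels.

```java
import java.util.Set;

public class LabelExpressionSketch {

  // A node-label expression is a boolean tree evaluated against a node's
  // label set. Evaluating the tree directly avoids expanding it to DNF.
  interface Expr {
    boolean eval(Set<String> labels);
  }

  static Expr label(String name) {
    return labels -> labels.contains(name);
  }

  static Expr and(Expr l, Expr r) {
    return labels -> l.eval(labels) && r.eval(labels);
  }

  static Expr or(Expr l, Expr r) {
    return labels -> l.eval(labels) || r.eval(labels);
  }

  static Expr not(Expr e) {
    return labels -> !e.eval(labels);
  }

  public static void main(String[] args) {
    // Example expression: (gpu && linux) || (ssd && !windows)
    Expr expr = or(and(label("gpu"), label("linux")),
                   and(label("ssd"), not(label("windows"))));
    // A node labeled {ssd, linux} matches via the second disjunct.
    System.out.println(expr.eval(Set.of("ssd", "linux")));
  }
}
```

The advantage of the interval-based matcher over even this direct evaluation is on the indexing side: matching many expressions against a node without testing each one in turn.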
[jira] [Comment Edited] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049364#comment-15049364 ] Chris Douglas edited comment on YARN-4358 at 12/9/15 9:15 PM: -- [~asuresh], you need not update the Javadoc of {{getReservationById}}. The problem is caused because we are specifying *Set* inside {{\{@ link\}}} so the fix should just be to update the Javadoc of the return parameter of {{getReservations}} to: {{@return set of active \{\@link ReservationAllocation\}s for the specified user at the requested time}} was (Author: subru): [~asuresh], you need not update the Javadoc of _getReservationById_. The problem is caused because we are specifying *Set* inside _{@ link}_ so the fix should just be to update the Javadoc of the return parameter of _getReservations_ to: bq @return set of active {@link ReservationAllocation}s for the specified user at the requested time > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.4.patch, > YARN-4358.addendum.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented, some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
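The corrected Javadoc might look like the following, shown on a simplified stand-in interface. The types here are invented for illustration and are not the real YARN {{PlanView}} API; the point is that the generic {{Set}} stays out of the {{\{@ link\}}} target.

```java
import java.util.Collections;
import java.util.Set;

public class ReservationJavadocSketch {

  // Hypothetical stand-in for a reservation allocation record.
  interface ReservationAllocation { }

  // Hypothetical stand-in for the plan interface under discussion.
  interface PlanView {
    /**
     * @return set of active {@link ReservationAllocation}s for the
     *         specified user at the requested time
     */
    Set<ReservationAllocation> getReservations(String user, long time);
  }

  public static void main(String[] args) {
    // A trivial implementation, just to show the interface compiles and
    // the Javadoc above is well-formed for the javadoc tool.
    PlanView plan = (user, time) -> Collections.emptySet();
    System.out.println(plan.getReservations("alice", 0L).size());
  }
}
```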
[jira] [Updated] (YARN-4248) REST API for submit/update/delete Reservations
[ https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated YARN-4248: Attachment: YARN-4248-asflicense.patch > REST API for submit/update/delete Reservations > -- > > Key: YARN-4248 > URL: https://issues.apache.org/jira/browse/YARN-4248 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, > YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch > > > This JIRA tracks work to extend the RMWebService to support REST APIs to > submit/update/delete reservations. This will ease integration with external > tools that are not java-based. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations
[ https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047395#comment-15047395 ] Chris Douglas commented on YARN-4248: - Pushed to trunk, branch-2, branch-2.8. Sorry to have missed these in review. Not sure why it wasn't flagged by test-patch. > REST API for submit/update/delete Reservations > -- > > Key: YARN-4248 > URL: https://issues.apache.org/jira/browse/YARN-4248 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, > YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch > > > This JIRA tracks work to extend the RMWebService to support REST APIs to > submit/update/delete reservations. This will ease integration with external > tools that are not java-based. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations
[ https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047576#comment-15047576 ] Chris Douglas commented on YARN-4248: - Thanks, Chris. > REST API for submit/update/delete Reservations > -- > > Key: YARN-4248 > URL: https://issues.apache.org/jira/browse/YARN-4248 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4248-asflicense.patch, YARN-4248.2.patch, > YARN-4248.3.patch, YARN-4248.5.patch, YARN-4248.6.patch, YARN-4248.patch > > > This JIRA tracks work to extend the RMWebService to support REST APIs to > submit/update/delete reservations. This will ease integration with external > tools that are not java-based. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4248) REST API for submit/update/delete Reservations
[ https://issues.apache.org/jira/browse/YARN-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045438#comment-15045438 ] Chris Douglas commented on YARN-4248: - +1 lgtm If it's appropriate for this to go into 2.8, set the target version and post a notification on the release thread. > REST API for submit/update/delete Reservations > -- > > Key: YARN-4248 > URL: https://issues.apache.org/jira/browse/YARN-4248 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4248.2.patch, YARN-4248.3.patch, YARN-4248.5.patch, > YARN-4248.6.patch, YARN-4248.patch > > > This JIRA tracks work to extend the RMWebService to support REST APIs to > submit/update/delete reservations. This will ease integration with external > tools that are not java-based. -- This message was sent by Atlassian JIRA (v6.3.4#6332)