[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021630#comment-15021630 ] lachisis commented on YARN-4382:

If many container hierarchies remain, they keep the CPU of this node busy even when no jobs are running.

--
PerfTop: 129889 irqs/sec  kernel:76.3%  [10 cycles],  (all, 16 CPUs)
--
   samples    pcnt   kernel function
 117166.00 - 59.1% : tg_shares_up
  35688.00 - 18.0% : _spin_lock_irqsave
  12045.00 -  6.1% : __set_se_shares

> Container hierarchy in cgroup may remain for ever after the container have be terminated
>
> Key: YARN-4382
> URL: https://issues.apache.org/jira/browse/YARN-4382
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.5.2
> Reporter: lachisis
>
> If we use LinuxContainerExecutor to launch containers, this problem may happen.
> Normally, when a container runs, a corresponding hierarchy is created in the cgroup directory, and when the container terminates the hierarchy is deleted within a few seconds (the delay can be configured by yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code I find that CgroupsLCEResourcesHandler sends a signal to kill the container process asynchronously and, at the same time, tries to delete the container hierarchy within the configured "delete-delay-ms" window.
> But if killing the container process takes longer than "delete-delay-ms", the container hierarchy remains forever.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
[ https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021666#comment-15021666 ] Jun Gong commented on YARN-4382:

'release_agent' in cgroups ([https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt]) might help in this case. Maybe we could use it to remove empty dirs? SLURM already uses it ([http://slurm.schedmd.com/cgroups.html]).

{quote}
If the notify_on_release flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup is removed, then the kernel runs the command specified by the contents of the "release_agent" file in that hierarchy's root directory, supplying the pathname (relative to the mount point of the cgroup file system) of the abandoned cgroup. This enables automatic removal of abandoned cgroups.
The default value of notify_on_release in the root cgroup at system boot is disabled (0). The default value of other cgroups at creation is the current value of their parents' notify_on_release settings. The default value of a cgroup hierarchy's release_agent path is empty.
{quote}

> Container hierarchy in cgroup may remain for ever after the container have be terminated
>
> Key: YARN-4382
> URL: https://issues.apache.org/jira/browse/YARN-4382
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.5.2
> Reporter: lachisis
>
> If we use LinuxContainerExecutor to launch containers, this problem may happen.
> Normally, when a container runs, a corresponding hierarchy is created in the cgroup directory, and when the container terminates the hierarchy is deleted within a few seconds (the delay can be configured by yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code I find that CgroupsLCEResourcesHandler sends a signal to kill the container process asynchronously and, at the same time, tries to delete the container hierarchy within the configured "delete-delay-ms" window.
> But if killing the container process takes longer than "delete-delay-ms", the container hierarchy remains forever.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
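A hedged sketch of what this suggestion could look like on the NodeManager side: enable notify_on_release and install a release_agent for the YARN hierarchy root, as the quoted kernel documentation describes. The helper name and the agent path below are illustrative, and the demo writes into a temp directory standing in for a real cgroup mount (which would require root).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReleaseAgentSketch {
    // Enable automatic cleanup for a cgroup hierarchy root by writing "1"
    // to notify_on_release and a cleanup script path to release_agent.
    static void enableReleaseAgent(Path hierarchyRoot, String agentPath) throws IOException {
        Files.write(hierarchyRoot.resolve("notify_on_release"), "1".getBytes());
        Files.write(hierarchyRoot.resolve("release_agent"), agentPath.getBytes());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for something like /sys/fs/cgroup/cpu/hadoop-yarn.
        Path root = Files.createTempDirectory("cgroup-root");
        enableReleaseAgent(root, "/usr/local/bin/yarn-cgroup-cleanup");
        System.out.println(Files.readAllLines(root.resolve("release_agent")).get(0));
    }
}
```

Per the quoted documentation, the kernel then invokes the agent with the abandoned cgroup's relative path as its argument, so a cleanup script would only need to rmdir that path instead of relying on a bounded polling window.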
[jira] [Created] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated
lachisis created YARN-4382:
--

Summary: Container hierarchy in cgroup may remain for ever after the container have be terminated
Key: YARN-4382
URL: https://issues.apache.org/jira/browse/YARN-4382
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.5.2
Reporter: lachisis

If we use LinuxContainerExecutor to launch containers, this problem may happen.
Normally, when a container runs, a corresponding hierarchy is created in the cgroup directory, and when the container terminates the hierarchy is deleted within a few seconds (the delay can be configured by yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
In the code I find that CgroupsLCEResourcesHandler sends a signal to kill the container process asynchronously and, at the same time, tries to delete the container hierarchy within the configured "delete-delay-ms" window.
But if killing the container process takes longer than "delete-delay-ms", the container hierarchy remains forever.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
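The timing described in the report can be reproduced in miniature: a bounded retry loop (standing in for the delete-delay-ms window) fails to remove a directory that only becomes empty after the window closes. The method name and constants here are illustrative, not the actual CgroupsLCEResourcesHandler implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CgroupDeleteSketch {
    // Retry deleting a directory until deadlineMs elapses, mimicking the
    // bounded "delete-delay-ms" retry described above. rmdir on a cgroup
    // fails while tasks are still alive, just as Files.delete fails here
    // while the directory is non-empty.
    static boolean deleteWithRetries(Path dir, long deadlineMs) throws InterruptedException {
        long end = System.currentTimeMillis() + deadlineMs;
        while (System.currentTimeMillis() < end) {
            try {
                Files.delete(dir);
                return true;
            } catch (IOException notYetDeletable) {
                Thread.sleep(5); // brief pause before the next attempt
            }
        }
        return false; // window closed: the hierarchy "remains forever"
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("cgroup-sketch");
        Files.createFile(dir.resolve("task")); // stands in for a still-running process

        // The container outlives the retry window, so deletion gives up ...
        System.out.println("removed while busy: " + deleteWithRetries(dir, 50));
        // ... and since nothing retries after the window, the dir would leak.
        Files.delete(dir.resolve("task"));
        System.out.println("removed after exit: " + deleteWithRetries(dir, 50));
    }
}
```

The leak follows from the one-shot nature of the window: once it closes, no later pass revisits the directory, even though a single retry after the process exits would succeed.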
[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.001.patch

I attach an initial patch that adds two new metrics in {{NodeManagerMetrics}}:
* containerLocalizeFailed
* containersLaunchEventOperation

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 2.7.1
> Reporter: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
> Recently I found an issue with the nodemanager metrics: {{NodeManagerMetrics#containersLaunched}} does not actually count the number of successfully launched containers, because a launch can still fail afterwards, for example when a kill command is received or when container localization fails. That leads to a failed container, yet the counter is currently increased in the code below whether the container ends up starting successfully or failing.
> {code}
> Credentials credentials = parseCredentials(launchContext);
>
> Container container =
>     new ContainerImpl(getConfig(), this.dispatcher,
>         context.getNMStateStore(), launchContext,
>         credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
>     containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
>       "ContainerManagerImpl", "Container already running on this node!",
>       applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>       + " already is running on this node!!");
> }
>
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
>     // Create the application
>     Application application =
>         new ApplicationImpl(dispatcher, user, applicationID, credentials,
>             context);
>     if (null == context.getApplications().putIfAbsent(applicationID,
>         application)) {
>       LOG.info("Creating a new application reference for app " + applicationID);
>       LogAggregationContext logAggregationContext =
>           containerTokenIdentifier.getLogAggregationContext();
>       Map<ApplicationAccessType, String> appAcls =
>           container.getLaunchContext().getApplicationACLs();
>       context.getNMStateStore().storeApplication(applicationID,
>           buildAppProto(applicationID, user, credentials, appAcls,
>               logAggregationContext));
>       dispatcher.getEventHandler().handle(
>           new ApplicationInitEvent(applicationID, appAcls,
>               logAggregationContext));
>     }
>     this.context.getNMStateStore().storeContainer(containerId, request);
>     dispatcher.getEventHandler().handle(
>         new ApplicationContainerInitEvent(container));
>
>     this.context.getContainerTokenSecretManager().startContainerSuccessful(
>         containerTokenIdentifier);
>     NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>         "ContainerManageImpl", applicationID, containerId);
>     // TODO launchedContainer misplaced -> doesn't necessarily mean a container
>     // launch. A finished Application will not launch containers.
>     metrics.launchedContainer();
>     metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
>     throw new YarnException(
>         "Container start failed as the NodeManager is " +
>         "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
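Before wiring the counters into {{NodeManagerMetrics}}, the proposed semantics can be prototyped with a plain holder: count every launch event separately from localization failures, so launchedContainer can be reserved for real successes. The class and method names below are placeholders, not the patch's actual @Metric fields.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal stand-in for the two metrics proposed in YARN-4381.
public class NmMetricsSketch {
    private final AtomicLong containersLaunchEvent = new AtomicLong();   // every launch attempt
    private final AtomicLong containerLocalizeFailed = new AtomicLong(); // localization failures

    public void onLaunchEvent()    { containersLaunchEvent.incrementAndGet(); }
    public void onLocalizeFailed() { containerLocalizeFailed.incrementAndGet(); }

    public long launchEvents()     { return containersLaunchEvent.get(); }
    public long localizeFailures() { return containerLocalizeFailed.get(); }

    public static void main(String[] args) {
        NmMetricsSketch m = new NmMetricsSketch();
        m.onLaunchEvent();      // a container start was requested ...
        m.onLocalizeFailed();   // ... but its resources failed to localize
        System.out.println(m.launchEvents() + " launch events, "
            + m.localizeFailures() + " localization failures");
    }
}
```

With the attempt counter separated out, the difference between launch events and failures approximates the successful launches that containersLaunched was meant to report.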
[jira] [Updated] (YARN-4165) An outstanding container request makes all nodes to be reserved causing all jobs pending
[ https://issues.apache.org/jira/browse/YARN-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-4165:
--

Description: We have a long running service in YARN; it has an outstanding container request that YARN cannot satisfy (it requires more memory than the nodemanager can supply). YARN then reserves all nodes for this application, and when I submit other jobs (which require relatively little memory, well within what the nodemanager can supply), all jobs are pending because YARN skips scheduling containers on the nodes that have been reserved.

(was: We have a long running service in YARN, it has a outstanding container request that YARN cannot satisfy (require more memory that nodemanager can supply). Then YARN reserves all nodes for this application, when I submit other jobs (require relative small memory that nodemanager can supply), all jobs are pending because YARN skips scheduling containers on the nodes that have been reserved.)

> An outstanding container request makes all nodes to be reserved causing all jobs pending
>
> Key: YARN-4165
> URL: https://issues.apache.org/jira/browse/YARN-4165
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler, resourcemanager, scheduler
> Affects Versions: 2.7.1
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
>
> We have a long running service in YARN; it has an outstanding container request that YARN cannot satisfy (it requires more memory than the nodemanager can supply). YARN then reserves all nodes for this application, and when I submit other jobs (which require relatively little memory), all jobs are pending because YARN skips scheduling containers on the nodes that have been reserved.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
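The starvation pattern is easy to model: if the scheduler reserves every node for one request that no node can satisfy, later small requests find no unreserved node. This toy loop only illustrates the reported symptom; it is not CapacityScheduler's actual reservation logic.

```java
// Toy model of the reported behavior: one request larger than any node
// causes every node to be "reserved", so later small requests never place.
public class ReservationSketch {
    static int schedulable(int[] nodeFreeMb, int bigAskMb, int smallAskMb) {
        boolean[] reserved = new boolean[nodeFreeMb.length];
        for (int i = 0; i < nodeFreeMb.length; i++) {
            if (nodeFreeMb[i] < bigAskMb) reserved[i] = true; // held for the big ask
        }
        int placedSmall = 0;
        for (int i = 0; i < nodeFreeMb.length; i++) {
            // The scheduler skips reserved nodes, mirroring the report.
            if (!reserved[i] && nodeFreeMb[i] >= smallAskMb) placedSmall++;
        }
        return placedSmall;
    }

    public static void main(String[] args) {
        int[] nodes = {8192, 8192, 8192};
        System.out.println(schedulable(nodes, 16384, 1024)); // 0: all jobs pending
        System.out.println(schedulable(nodes, 4096, 1024));  // 3: satisfiable ask, no starvation
    }
}
```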
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021397#comment-15021397 ] Inigo Goiri commented on YARN-3980: --- It looks like v9 solves the issues. Let me know if you see anything else. > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, > YARN-3980-v8.patch, YARN-3980-v9.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
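A rough sketch of the plumbing discussed above, with simplified stand-ins for ResourceUtilization and the heartbeat payload (the real YARN-3980 types and method names differ): the NM sums per-container utilization, ships it alongside the node's own utilization, and the scheduler side stores both.

```java
// Simplified stand-ins for ResourceUtilization and the heartbeat path; the
// actual YARN-3980 classes and signatures differ.
public class UtilizationSketch {
    static final class Utilization {
        final long memMb;   // physical memory in use
        final float vcores; // CPU in use, in vcores
        Utilization(long memMb, float vcores) { this.memMb = memMb; this.vcores = vcores; }
    }

    // NM side: aggregate per-container utilization for the heartbeat.
    static Utilization aggregate(Utilization[] perContainer) {
        long mem = 0;
        float cpu = 0;
        for (Utilization u : perContainer) { mem += u.memMb; cpu += u.vcores; }
        return new Utilization(mem, cpu);
    }

    public static void main(String[] args) {
        Utilization containers = aggregate(new Utilization[] {
            new Utilization(512, 0.5f), new Utilization(1024, 1.5f)});
        Utilization node = new Utilization(4096, 3.0f); // node-level figure (YARN-3534)
        // RM side: a scheduler node would store both values from the heartbeat
        // and expose them to placement decisions.
        System.out.println("containers: " + containers.memMb + "MB, node: " + node.memMb + "MB");
    }
}
```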
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021243#comment-15021243 ] Hadoop QA commented on YARN-3980:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
| +1 | mvninstall | 8m 37s | trunk passed |
| +1 | compile | 9m 40s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 10m 7s | trunk passed with JDK v1.7.0_85 |
| +1 | checkstyle | 1m 7s | trunk passed |
| +1 | mvnsite | 2m 26s | trunk passed |
| +1 | mvneclipse | 1m 15s | trunk passed |
| +1 | findbugs | 4m 25s | trunk passed |
| +1 | javadoc | 1m 39s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 53s | trunk passed with JDK v1.7.0_85 |
| +1 | mvninstall | 2m 14s | the patch passed |
| +1 | compile | 9m 35s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 9m 35s | the patch passed |
| +1 | compile | 9m 57s | the patch passed with JDK v1.7.0_85 |
| +1 | javac | 9m 57s | the patch passed |
| -1 | checkstyle | 1m 6s | Patch generated 7 new checkstyle issues in root (total was 432, now 438). |
| +1 | mvnsite | 2m 30s | the patch passed |
| +1 | mvneclipse | 1m 18s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 5m 32s | the patch passed |
| +1 | javadoc | 2m 2s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 2m 12s | the patch passed with JDK v1.7.0_85 |
| +1 | unit | 0m 34s | hadoop-yarn-server-common in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 9m 35s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. |
| -1 | unit | 63m 14s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 5m 46s | hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. |
| +1 | unit | 0m 57s | hadoop-sls in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 0m 29s | hadoop-yarn-server-common in the patch passed with JDK v1.7.0_85. |
| +1 | unit | 9m 21s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85. |
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021271#comment-15021271 ] Hadoop QA commented on YARN-3980:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
| +1 | mvninstall | 9m 2s | trunk passed |
| +1 | compile | 11m 6s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 11m 2s | trunk passed with JDK v1.7.0_85 |
| +1 | checkstyle | 1m 15s | trunk passed |
| +1 | mvnsite | 2m 48s | trunk passed |
| +1 | mvneclipse | 1m 19s | trunk passed |
| +1 | findbugs | 4m 45s | trunk passed |
| +1 | javadoc | 1m 57s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 2m 8s | trunk passed with JDK v1.7.0_85 |
| +1 | mvninstall | 2m 25s | the patch passed |
| +1 | compile | 10m 30s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 10m 30s | the patch passed |
| +1 | compile | 10m 4s | the patch passed with JDK v1.7.0_85 |
| +1 | javac | 10m 4s | the patch passed |
| -1 | checkstyle | 1m 7s | Patch generated 7 new checkstyle issues in root (total was 433, now 439). |
| +1 | mvnsite | 2m 27s | the patch passed |
| +1 | mvneclipse | 1m 14s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 5m 6s | the patch passed |
| +1 | javadoc | 1m 40s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 55s | the patch passed with JDK v1.7.0_85 |
| +1 | unit | 0m 27s | hadoop-yarn-server-common in the patch passed with JDK v1.8.0_66. |
| -1 | unit | 9m 10s | hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 69m 23s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 6m 13s | hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. |
| +1 | unit | 1m 3s | hadoop-sls in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 0m 32s | hadoop-yarn-server-common in the patch passed with JDK v1.7.0_85. |
| +1 | unit | 9m 45s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85. |
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021364#comment-15021364 ] Hadoop QA commented on YARN-3980:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
| +1 | mvninstall | 8m 36s | trunk passed |
| +1 | compile | 10m 28s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 10m 6s | trunk passed with JDK v1.7.0_85 |
| +1 | checkstyle | 1m 8s | trunk passed |
| +1 | mvnsite | 2m 28s | trunk passed |
| +1 | mvneclipse | 1m 16s | trunk passed |
| +1 | findbugs | 4m 28s | trunk passed |
| +1 | javadoc | 1m 48s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 59s | trunk passed with JDK v1.7.0_85 |
| +1 | mvninstall | 2m 18s | the patch passed |
| +1 | compile | 10m 13s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 10m 13s | the patch passed |
| +1 | compile | 10m 19s | the patch passed with JDK v1.7.0_85 |
| +1 | javac | 10m 19s | the patch passed |
| -1 | checkstyle | 1m 8s | Patch generated 7 new checkstyle issues in root (total was 433, now 439). |
| +1 | mvnsite | 2m 29s | the patch passed |
| +1 | mvneclipse | 1m 16s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 5m 10s | the patch passed |
| +1 | javadoc | 1m 42s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 55s | the patch passed with JDK v1.7.0_85 |
| +1 | unit | 0m 27s | hadoop-yarn-server-common in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 9m 12s | hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. |
| -1 | unit | 66m 23s | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 0m 21s | hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 0m 20s | hadoop-sls in the patch failed with JDK v1.8.0_66. |
| +1 | unit | 0m 30s | hadoop-yarn-server-common in the patch passed with JDK v1.7.0_85. |
| -1 | unit | 10m 12s | hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_85. |
[jira] [Created] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
Lin Yiqun created YARN-4381:
---

Summary: Add container launchEvent and container localizeFailed metrics in container
Key: YARN-4381
URL: https://issues.apache.org/jira/browse/YARN-4381
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.7.1
Reporter: Lin Yiqun

Recently I found an issue with the nodemanager metrics: {{NodeManagerMetrics#containersLaunched}} does not actually count the number of successfully launched containers, because a launch can still fail afterwards, for example when a kill command is received or when container localization fails. That leads to a failed container, yet the counter is currently increased in the code below whether the container ends up starting successfully or failing.

{code}
Credentials credentials = parseCredentials(launchContext);

Container container =
    new ContainerImpl(getConfig(), this.dispatcher,
        context.getNMStateStore(), launchContext,
        credentials, metrics, containerTokenIdentifier);
ApplicationId applicationID =
    containerId.getApplicationAttemptId().getApplicationId();
if (context.getContainers().putIfAbsent(containerId, container) != null) {
  NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
      "ContainerManagerImpl", "Container already running on this node!",
      applicationID, containerId);
  throw RPCUtil.getRemoteException("Container " + containerIdStr
      + " already is running on this node!!");
}

this.readLock.lock();
try {
  if (!serviceStopped) {
    // Create the application
    Application application =
        new ApplicationImpl(dispatcher, user, applicationID, credentials,
            context);
    if (null == context.getApplications().putIfAbsent(applicationID,
        application)) {
      LOG.info("Creating a new application reference for app " + applicationID);
      LogAggregationContext logAggregationContext =
          containerTokenIdentifier.getLogAggregationContext();
      Map<ApplicationAccessType, String> appAcls =
          container.getLaunchContext().getApplicationACLs();
      context.getNMStateStore().storeApplication(applicationID,
          buildAppProto(applicationID, user, credentials, appAcls,
              logAggregationContext));
      dispatcher.getEventHandler().handle(
          new ApplicationInitEvent(applicationID, appAcls,
              logAggregationContext));
    }
    this.context.getNMStateStore().storeContainer(containerId, request);
    dispatcher.getEventHandler().handle(
        new ApplicationContainerInitEvent(container));

    this.context.getContainerTokenSecretManager().startContainerSuccessful(
        containerTokenIdentifier);
    NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
        "ContainerManageImpl", applicationID, containerId);
    // TODO launchedContainer misplaced -> doesn't necessarily mean a container
    // launch. A finished Application will not launch containers.
    metrics.launchedContainer();
    metrics.allocateContainer(containerTokenIdentifier.getResource());
  } else {
    throw new YarnException(
        "Container start failed as the NodeManager is " +
        "in the process of shutting down");
  }
{code}

In addition, we lack a localizationFailed metric for containers.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021053#comment-15021053 ] Eric Payne commented on YARN-3769: -- [~leftnoteasy], Thank you very much! > Consider user limit when calculating total pending resource for preemption > policy in Capacity Scheduler > --- > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 2.7.3 > > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, > YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, > YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, > YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, > YARN-3769.005.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4379) TestWebApp failing in trunk
[ https://issues.apache.org/jira/browse/YARN-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020948#comment-15020948 ] Steve Loughran commented on YARN-4379:
--

I know it's a simple patch, but I'm debating reverting HADOOP-12584 directly and requiring that patch to include this. Authors of patches have to get into the habit of being rigorous about testing things, and for changes to Hadoop core that includes YARN and HDFS test runs. Web-wise, YARN and downstream apps are particularly fussy.

> TestWebApp failing in trunk
> ---
>
> Key: YARN-4379
> URL: https://issues.apache.org/jira/browse/YARN-4379
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.1
> Reporter: Varun Saxena
> Assignee: Varun Saxena
> Attachments: YARN-4379.01.patch
>
> {noformat}
> Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.917 sec <<< FAILURE! - in org.apache.hadoop.yarn.webapp.TestWebApp
> testYARNWebAppContext(org.apache.hadoop.yarn.webapp.TestWebApp)  Time elapsed: 1.828 sec  <<< ERROR!
> java.lang.RuntimeException: java.io.IOException: Server returned HTTP response code: 403 for URL: http://localhost:57512/static/
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1628)
> 	at org.apache.hadoop.yarn.webapp.TestWebApp.getContent(TestWebApp.java:293)
> 	at org.apache.hadoop.yarn.webapp.TestWebApp.testYARNWebAppContext(TestWebApp.java:276)
> {noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021151#comment-15021151 ] Hadoop QA commented on YARN-3456: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 56s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 26s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 31s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 34s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 36, now 37). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 3s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 5s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 18s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: YARN-3980-v8.patch Tackling Karthik's comments. > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: (was: YARN-3980-v8.patch) > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: YARN-3980-v8.patch Increasing timeout to 2 seconds again because unit tests fail. > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3456: --- Attachment: YARN-3456.02.patch > Improve handling of incomplete TimelineEntities > --- > > Key: YARN-3456 > URL: https://issues.apache.org/jira/browse/YARN-3456 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Varun Saxena >Priority: Minor > Attachments: YARN-3456.01.patch, YARN-3456.02.patch > > > If an incomplete TimelineEntity is posted, it isn't checked client side ... > it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021118#comment-15021118 ] Varun Saxena commented on YARN-3456: Sorry Kuhu, I had missed your comment. I will have bandwidth to handle this JIRA. It's a straightforward fix. Will update a patch shortly. > Improve handling of incomplete TimelineEntities > --- > > Key: YARN-3456 > URL: https://issues.apache.org/jira/browse/YARN-3456 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Varun Saxena >Priority: Minor > > If an incomplete TimelineEntity is posted, it isn't checked client side ... > it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3456: --- Attachment: YARN-3456.01.patch > Improve handling of incomplete TimelineEntities > --- > > Key: YARN-3456 > URL: https://issues.apache.org/jira/browse/YARN-3456 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Varun Saxena >Priority: Minor > Attachments: YARN-3456.01.patch > > > If an incomplete TimelineEntity is posted, it isn't checked client side ... > it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021128#comment-15021128 ] Varun Saxena commented on YARN-3456: Updated a patch. Added checks at both client and server side. > Improve handling of incomplete TimelineEntities > --- > > Key: YARN-3456 > URL: https://issues.apache.org/jira/browse/YARN-3456 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Varun Saxena >Priority: Minor > Attachments: YARN-3456.01.patch > > > If an incomplete TimelineEntity is posted, it isn't checked client side ... > it gets all the way to the far end before triggering an NPE in the store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
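[Editorial note] The client-side check described in the comment above could look roughly like the sketch below. This is a hypothetical illustration with a stand-in entity class, not the actual YARN-3456 patch or the real org.apache.hadoop.yarn TimelineEntity API: the point is simply to reject an entity with missing mandatory fields before it travels to the server and triggers an NPE in the store.

```java
// Hypothetical sketch of client-side validation for an incomplete entity.
// The Entity class here is a stand-in, not the real Timeline API.
public class EntityCheck {
    public static class Entity {
        public String entityId;
        public String entityType;
    }

    // Fail fast on the client instead of letting the server NPE in the store.
    public static void validate(Entity e) {
        if (e == null || e.entityId == null || e.entityType == null) {
            throw new IllegalArgumentException(
                "Incomplete entity: entity id and entity type are mandatory");
        }
    }
}
```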
[jira] [Commented] (YARN-4298) Fix findbugs warnings in hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021136#comment-15021136 ] Varun Saxena commented on YARN-4298: bq. please let me know if this is handled by you in any other tickets. Thanks for taking this up [~sunilg]. No, I haven't handled it anywhere else. > Fix findbugs warnings in hadoop-yarn-common > --- > > Key: YARN-4298 > URL: https://issues.apache.org/jira/browse/YARN-4298 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Saxena >Assignee: Sunil G >Priority: Minor > Attachments: 0001-YARN-4298.patch, 0002-YARN-4298.patch > > > {noformat} > classname='org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl'> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.builder; > locked 95% of time' lineNumber='390'/> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.proto; > locked 94% of time' lineNumber='390'/> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.viaProto; > locked 94% of time' lineNumber='390'/> > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Summary: TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8 (was: TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails on branch-2.8) > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! 
Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1438) When a container fails, the text of the exception isn't included in the diagnostics
[ https://issues.apache.org/jira/browse/YARN-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-1438: - Fix Version/s: 2.3.0 > When a container fails, the text of the exception isn't included in the > diagnostics > --- > > Key: YARN-1438 > URL: https://issues.apache.org/jira/browse/YARN-1438 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 2.3.0 > > Attachments: YARN-1438-001.patch > > > The diagnostics text generated when a container execution thrown an exception > doesn't include the exception message -only the stack trace. This makes > debugging harder than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4379) TestWebApp failing in trunk
[ https://issues.apache.org/jira/browse/YARN-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021086#comment-15021086 ] Varun Saxena commented on YARN-4379: Ok...If HADOOP-12584 is reverted, it should be handled there. Probably QA reports should run YARN, MR and HDFS tests too while building common. But that can increase build times. > TestWebApp failing in trunk > --- > > Key: YARN-4379 > URL: https://issues.apache.org/jira/browse/YARN-4379 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4379.01.patch > > > {noformat} > Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.917 sec <<< > FAILURE! - in org.apache.hadoop.yarn.webapp.TestWebApp > testYARNWebAppContext(org.apache.hadoop.yarn.webapp.TestWebApp) Time > elapsed: 1.828 sec <<< ERROR! > java.lang.RuntimeException: java.io.IOException: Server returned HTTP > response code: 403 for URL: http://localhost:57512/static/ > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1628) > at > org.apache.hadoop.yarn.webapp.TestWebApp.getContent(TestWebApp.java:293) > at > org.apache.hadoop.yarn.webapp.TestWebApp.testYARNWebAppContext(TestWebApp.java:276) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.
[ https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved YARN-2444. -- Resolution: Won't Fix closing as Won't Fix; it's moot in ATS2 and the SPARK-1537 code does its own REST client > Primary filters added after first submission not indexed, cause exceptions in > logs. > --- > > Key: YARN-2444 > URL: https://issues.apache.org/jira/browse/YARN-2444 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.5.0 >Reporter: Marcelo Vanzin >Assignee: Steve Loughran > Attachments: YARN-2444-001.patch, ats.java, > org.apache.hadoop.yarn.server.timeline.TestTimelineClientPut-output.txt > > > See attached code for an example. The code creates an entity with a primary > filter and submits it to the ATS. After that, a new primary filter value is > added and the entity is resubmitted. At that point two things can be seen: > - Searching for the new primary filter value does not return the entity > - The following exception shows up in the logs: > {noformat} > 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying > access for user dr.who (auth:SIMPLE) on the events of the timeline entity { > id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test } > org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the > timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test > } is corrupted. 
> at > org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67) > at > org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
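[Editorial note] The indexing symptom reported in YARN-2444 can be modelled with a toy store whose primary-filter index is only written on the first put of an entity, so filter values added on a later resubmission are never searchable. This is an illustration of the reported behaviour under that assumption, not the actual leveldb timeline store code; all names are made up:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the bug: the filter index is only updated on the first
// submission of an entity, so later filter values are never indexed.
public class FilterIndex {
    private final Map<String, Map<String, Object>> entities = new HashMap<>();
    // "name=value" -> entity ids that carry that primary filter
    private final Map<String, Set<String>> filterIndex = new HashMap<>();

    public void put(String entityId, String filterName, String filterValue) {
        boolean firstPut = !entities.containsKey(entityId);
        entities.computeIfAbsent(entityId, k -> new HashMap<>())
                .put(filterName, filterValue);
        if (firstPut) { // bug: index write is skipped on resubmission
            filterIndex.computeIfAbsent(filterName + "=" + filterValue,
                    k -> new HashSet<>()).add(entityId);
        }
    }

    public Set<String> findByFilter(String filterName, String filterValue) {
        return filterIndex.getOrDefault(filterName + "=" + filterValue,
                Collections.emptySet());
    }
}
```

With this model, a search for the original filter value finds the entity, while a search for the value added on resubmission comes back empty, matching the first symptom in the report.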
[jira] [Created] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails on branch-2.8
Tsuyoshi Ozawa created YARN-4380: Summary: TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails on branch-2.8 Key: YARN-4380 URL: https://issues.apache.org/jira/browse/YARN-4380 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.8.0 Reporter: Tsuyoshi Ozawa {quote} Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 0.109 sec <<< FAILURE! org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: Argument(s) are different! Wanted: deletionService.delete( "user0", null, ); -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) Actual invocation has different arguments: deletionService.delete( "user0", /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 ); -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities
[ https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021188#comment-15021188 ] Hadoop QA commented on YARN-3456: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 33s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk has 3 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 22s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 20s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 35s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 11s {color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 3s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.logaggregation.TestAggregatedLogDeletionService | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL
[jira] [Comment Edited] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021189#comment-15021189 ] Karthik Kambatla edited comment on YARN-3980 at 11/22/15 9:11 PM: -- Jenkins could be running on a (virtual) machine much slower than dev machines. Could we sleep for longer to ensure it doesn't fail on Jenkins? How about the following: {code} for (int i = 0; i < 100; i++) { Thread.sleep(100); // check if heartbeat propagated } {code} was (Author: kasha): Jenkins could be running on a (virtual) machine much slower than dev machines. Could we sleep for longer to ensure it doesn't fail on Jenkins? How about the following: {code} for (int i = 0; i < 100; i++) { Thread.sleep(100); // check if heartbeat propagated } {code} > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021189#comment-15021189 ] Karthik Kambatla commented on YARN-3980: Jenkins could be running on a (virtual) machine much slower than dev machines. Could we sleep for longer to ensure it doesn't fail on Jenkins? How about the following: {code} for (int i = 0; i < 100; i++) { Thread.sleep(100); // check if heartbeat propagated } {code} > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021201#comment-15021201 ] Inigo Goiri commented on YARN-3980: --- Any easy place to check in the RM for the heartbeats? > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3980: -- Attachment: YARN-3980-v9.patch Waiting for heartbeat to propagate based on the expected value. This approach is similar to what is used in TestDiskFailures. > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, > YARN-3980-v8.patch, YARN-3980-v9.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
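[Editorial note] The sleep-and-check pattern discussed in the YARN-3980 comments above (poll until the heartbeat value propagates, with an overall timeout so a slow Jenkins machine doesn't cause flakiness) can be sketched as a small helper. The class and method names below are illustrative only, not the actual TestDiskFailures or Hadoop test-utility code:

```java
import java.util.function.BooleanSupplier;

// Minimal sketch of a wait-for helper in the spirit of the retry loop
// quoted above: poll a condition every checkEveryMillis until it holds
// or an overall timeout expires, instead of a single fixed sleep.
public class WaitFor {
    public static boolean waitFor(BooleanSupplier check,
                                  long checkEveryMillis,
                                  long waitForMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + waitForMillis;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true; // condition observed, e.g. heartbeat propagated
            }
            Thread.sleep(checkEveryMillis);
        }
        return check.getAsBoolean(); // one final check at the deadline
    }
}
```

A test would call something like `WaitFor.waitFor(() -> node.getUtilization() != null, 100, 10000)` and fail only if the condition never holds within the full timeout, which keeps fast machines fast and slow machines from flaking.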