[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-08 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.002.patch

Thanks [~djp] for the review. I have made the container metrics more
fine-grained. As you pointed out, a container can fail for reasons other than
localization failure, so adding the metric on the launch event is not
appropriate. Instead, I add the metric {{containerLaunchedSuccess}}, which is
incremented when the container transitions to the running state and sets
{{wasLaunched=true}}. In addition, I add two more metrics for container-failure cases:
* {{containerFailedBeforeLaunched}}
* {{containerKilledAfterLaunched}}
I think these metrics will give us a more concrete picture of what happens to a container.
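A minimal sketch of the counting described above, assuming a simplified container lifecycle. Only the three metric names come from this comment; the {{SimpleContainer}} class, its callbacks and the {{SimpleContainerState}} enum are hypothetical, and plain {{AtomicInteger}} fields stand in for the Hadoop metrics2 counters that {{NodeManagerMetrics}} would actually use. Counting a kill that arrives before launch as {{containerFailedBeforeLaunched}} is also an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for the three counters named in the comment.
// AtomicInteger stands in for Hadoop's metrics2 counter types.
class FineGrainedContainerMetrics {
  final AtomicInteger containerLaunchedSuccess = new AtomicInteger();
  final AtomicInteger containerFailedBeforeLaunched = new AtomicInteger();
  final AtomicInteger containerKilledAfterLaunched = new AtomicInteger();
}

// Simplified, hypothetical lifecycle; the real ContainerImpl has many more states.
enum SimpleContainerState { LOCALIZING, RUNNING, FAILED, KILLED }

class SimpleContainer {
  private final FineGrainedContainerMetrics metrics;
  private SimpleContainerState state = SimpleContainerState.LOCALIZING;
  // Set only once the container actually reaches RUNNING.
  private boolean wasLaunched = false;

  SimpleContainer(FineGrainedContainerMetrics metrics) {
    this.metrics = metrics;
  }

  // The only place where a launch is counted as successful.
  void onRunning() {
    if (state != SimpleContainerState.LOCALIZING) {
      throw new IllegalStateException("cannot start running from " + state);
    }
    state = SimpleContainerState.RUNNING;
    wasLaunched = true;
    metrics.containerLaunchedSuccess.incrementAndGet();
  }

  // Container failed, e.g. a localization failure or a non-zero exit.
  void onFailed() {
    finish(SimpleContainerState.FAILED);
    // In this sketch, failures after launch are not counted by these three counters.
    if (!wasLaunched) {
      metrics.containerFailedBeforeLaunched.incrementAndGet();
    }
  }

  // Container received a kill command.
  void onKilled() {
    finish(SimpleContainerState.KILLED);
    if (wasLaunched) {
      metrics.containerKilledAfterLaunched.incrementAndGet();
    } else {
      // Assumption: a kill before launch counts as a pre-launch failure.
      metrics.containerFailedBeforeLaunched.incrementAndGet();
    }
  }

  // Moves to a terminal state exactly once.
  private void finish(SimpleContainerState terminal) {
    if (state == SimpleContainerState.FAILED || state == SimpleContainerState.KILLED) {
      throw new IllegalStateException("container already finished: " + state);
    }
    state = terminal;
  }
}
```

Tying the success counter to the transition into running (rather than to the start request or the launch event) is what keeps it from counting containers that never actually ran.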

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch
>
>
> Recently, I found an issue with the NodeManager metrics:
> {{NodeManagerMetrics#containersLaunched}} does not actually count successfully
> launched containers. A container can still fail after its start request is
> accepted, for example when it receives a kill command or when container
> localization fails. Currently, however, this counter is incremented in the
> code below regardless of whether the container eventually starts or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
>     new ContainerImpl(getConfig(), this.dispatcher,
>         context.getNMStateStore(), launchContext,
>         credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
>     containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
>       "ContainerManagerImpl", "Container already running on this node!",
>       applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>       + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
>     // Create the application
>     Application application =
>         new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
>     if (null == context.getApplications().putIfAbsent(applicationID,
>         application)) {
>       LOG.info("Creating a new application reference for app " + applicationID);
>       LogAggregationContext logAggregationContext =
>           containerTokenIdentifier.getLogAggregationContext();
>       Map<ApplicationAccessType, String> appAcls =
>           container.getLaunchContext().getApplicationACLs();
>       context.getNMStateStore().storeApplication(applicationID,
>           buildAppProto(applicationID, user, credentials, appAcls,
>               logAggregationContext));
>       dispatcher.getEventHandler().handle(
>           new ApplicationInitEvent(applicationID, appAcls,
>               logAggregationContext));
>     }
>     this.context.getNMStateStore().storeContainer(containerId, request);
>     dispatcher.getEventHandler().handle(
>         new ApplicationContainerInitEvent(container));
>
>     this.context.getContainerTokenSecretManager().startContainerSuccessful(
>         containerTokenIdentifier);
>     NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>         "ContainerManageImpl", applicationID, containerId);
>     // TODO launchedContainer misplaced -> doesn't necessarily mean a container
>     // launch. A finished Application will not launch containers.
>     metrics.launchedContainer();
>     metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
>     throw new YarnException(
>         "Container start failed as the NodeManager is " +
>         "in the process of shutting down");
>   }
> {code}
> In addition, there is no localizationFailed metric for containers.
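To make the over-counting concrete, here is a toy model, not NodeManager code: the {{LaunchCounterDemo}} class, the {{StartOutcome}} enum and {{startContainer}} are hypothetical. The request-time counter is bumped at the same point as {{metrics.launchedContainer()}} in the excerpt (when the start request is accepted), while the other two counters record what actually happened afterwards.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical outcomes of a container start request.
enum StartOutcome { RUNNING, LOCALIZATION_FAILED, KILLED_BEFORE_LAUNCH }

class LaunchCounterDemo {
  // Mirrors where metrics.launchedContainer() is called: on request acceptance.
  final AtomicInteger countedAtRequest = new AtomicInteger();
  // What the counter is expected to mean: containers that actually started.
  final AtomicInteger actuallyRunning = new AtomicInteger();
  // The metric this issue says is missing.
  final AtomicInteger localizationFailed = new AtomicInteger();

  void startContainer(StartOutcome outcome) {
    // Incremented unconditionally once the request is accepted.
    countedAtRequest.incrementAndGet();
    switch (outcome) {
      case RUNNING:
        actuallyRunning.incrementAndGet();
        break;
      case LOCALIZATION_FAILED:
        localizationFailed.incrementAndGet();
        break;
      case KILLED_BEFORE_LAUNCH:
        // Neither started nor a localization failure.
        break;
    }
  }
}
```

With one request of each kind, the request-time counter reads 3 while only one container actually ran, which is the gap this issue describes.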



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-23 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4381:
-
Assignee: Lin Yiqun

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>





[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-22 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.001.patch

I have attached an initial patch that adds two new metrics to {{NodeManagerMetrics}}:
* containerLocalizeFailed
* containersLaunchEventOperation
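A rough sketch of what these two counters track, assuming they are bumped from an event handler. The {{NmEvent}} enum, the {{PatchOneMetrics}} class and its {{onEvent}} hook are hypothetical simplifications, not the NodeManager's real event types or the patch itself; {{AtomicInteger}} again stands in for the metrics2 counters.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified event types; the real NodeManager uses its own
// container and launcher event enums.
enum NmEvent { LOCALIZATION_FAILED, LAUNCH_CONTAINER, CONTAINER_EXITED }

class PatchOneMetrics {
  final AtomicInteger containerLocalizeFailed = new AtomicInteger();
  final AtomicInteger containersLaunchEventOperation = new AtomicInteger();

  // Hypothetical hook called for each dispatched event.
  void onEvent(NmEvent event) {
    switch (event) {
      case LOCALIZATION_FAILED:
        containerLocalizeFailed.incrementAndGet();
        break;
      case LAUNCH_CONTAINER:
        containersLaunchEventOperation.incrementAndGet();
        break;
      default:
        // Other events do not affect these two counters.
        break;
    }
  }
}
```

A dispatched launch event does not by itself guarantee that the container process reached the running state, so a counter placed here may still differ from the number of successful launches.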


> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>


