[jira] [Commented] (YARN-9002) YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS and local filesystem only

2018-11-10 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682434#comment-16682434
 ] 

Gour Saha commented on YARN-9002:
-

Thanks a lot, [~eyang].

> YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS 
> and local filesystem only
> -
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9002-branch-3.1.001.patch, 
> YARN-9002-branch-3.1.002.patch, YARN-9002.001.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Updated] (YARN-9002) YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS and local filesystem only

2018-11-10 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-9002:

Summary: YARN Service keytab does not support s3, wasb, gs and is 
restricted to HDFS and local filesystem only  (was: YARN Service keytab 
location is restricted to HDFS and local filesystem only)

> YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS 
> and local filesystem only
> -
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-9002-branch-3.1.001.patch, 
> YARN-9002-branch-3.1.002.patch, YARN-9002.001.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Commented] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only

2018-11-09 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682205#comment-16682205
 ] 

Gour Saha commented on YARN-9002:
-

Thanks [~eyang] for reviewing. I uploaded the 002 patch with the imports removed.

I also uploaded the trunk patch.

> YARN Service keytab location is restricted to HDFS and local filesystem only
> 
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-9002-branch-3.1.001.patch, 
> YARN-9002-branch-3.1.002.patch, YARN-9002.001.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Updated] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only

2018-11-09 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-9002:

Attachment: YARN-9002.001.patch

> YARN Service keytab location is restricted to HDFS and local filesystem only
> 
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-9002-branch-3.1.001.patch, 
> YARN-9002-branch-3.1.002.patch, YARN-9002.001.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Updated] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only

2018-11-09 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-9002:

Attachment: YARN-9002-branch-3.1.002.patch

> YARN Service keytab location is restricted to HDFS and local filesystem only
> 
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-9002-branch-3.1.001.patch, 
> YARN-9002-branch-3.1.002.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Commented] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only

2018-11-09 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682166#comment-16682166
 ] 

Gour Saha commented on YARN-9002:
-

I tested this patch in tandem with the patch in HIVE-20899 on a cluster based on branch-3.1, and it passed for wasb.

/cc [~eyang], when you get a chance, please review the patch for branch-3.1. The trunk code has diverged a bit, so I am preparing a separate patch for trunk.

> YARN Service keytab location is restricted to HDFS and local filesystem only
> 
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-9002-branch-3.1.001.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Updated] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only

2018-11-09 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-9002:

Attachment: YARN-9002-branch-3.1.001.patch

> YARN Service keytab location is restricted to HDFS and local filesystem only
> 
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-9002-branch-3.1.001.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Created] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only

2018-11-09 Thread Gour Saha (JIRA)
Gour Saha created YARN-9002:
---

 Summary: YARN Service keytab location is restricted to HDFS and 
local filesystem only
 Key: YARN-9002
 URL: https://issues.apache.org/jira/browse/YARN-9002
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.1
Reporter: Gour Saha


ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
file. This restricts it from supporting other FileSystem API conforming FSs 
like s3a, wasb, gs, etc.
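
For illustration only (this is not the actual patch, and the class/method names below are made up for the sketch), resolving the keytab through the generic FileSystem API would lift the scheme restriction -
{code:java}
// Hypothetical sketch, not the YARN-9002 patch: resolve the keytab via the
// FileSystem registered for the URI scheme instead of hard-coding hdfs/file.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class KeytabLocator {
  // Returns the keytab Path if it exists on whatever FileSystem the URI
  // scheme maps to (hdfs, file, s3a, wasb, gs, ...).
  public static Path locateKeytab(URI keytabUri, Configuration conf)
      throws IOException {
    Path keytabPath = new Path(keytabUri);
    FileSystem fs = FileSystem.get(keytabUri, conf);
    if (!fs.exists(keytabPath)) {
      throw new IOException("Keytab not found at " + keytabUri);
    }
    return keytabPath;
  }
}
{code}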






[jira] [Assigned] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only

2018-11-09 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha reassigned YARN-9002:
---

Assignee: Gour Saha

> YARN Service keytab location is restricted to HDFS and local filesystem only
> 
>
> Key: YARN-9002
> URL: https://issues.apache.org/jira/browse/YARN-9002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Major
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or 
> file. This restricts it from supporting other FileSystem API conforming FSs 
> like s3a, wasb, gs, etc.






[jira] [Commented] (YARN-8682) YARN Service throws NPE when explicit null instead of empty object {} is used

2018-10-24 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662524#comment-16662524
 ] 

Gour Saha commented on YARN-8682:
-

[~csingh], it was encountered by the Hive team a long time back, so I don't have the stack trace. It seems like recent code changes are handling this case. In that case, feel free to close this bug as Cannot Reproduce.

> YARN Service throws NPE when explicit null instead of empty object {} is used
> -
>
> Key: YARN-8682
> URL: https://issues.apache.org/jira/browse/YARN-8682
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.0.1
>Reporter: Gour Saha
>Assignee: Chandni Singh
>Priority: Major
>
> YARN Service should not throw NPE for a config like this -
> {code}
> .
> .
> "configuration": {
> "env": {
> "HADOOP_CONF_DIR": "/hadoop-conf",
> "USER": "testuser",
> "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": 
> "/sys/fs/cgroup:/sys/fs/cgroup:ro",
> "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
> },
> "files": null
> }
> .
> .
> {code}






[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user

2018-09-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629168#comment-16629168
 ] 

Gour Saha commented on YARN-8734:
-

bq. Yes, component dependencies is also outside of component properties in the 
component section. I think this is aligned correctly.

[~eyang], am I missing something here? Please see where "dependencies" is 
defined in Component_dependencies.png vs where it is defined in 
Service_dependencies.png (attached).

> Readiness check for remote service belongs to the same user
> ---
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Component_dependencies.png, Dependency check vs.pdf, 
> Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, 
> YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch
>
>
> When a service is deploying, there can be remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependent service, and the service 
> has reached a stable state, then deploy HBase.






[jira] [Updated] (YARN-8734) Readiness check for remote service belongs to the same user

2018-09-26 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8734:

Attachment: Service_dependencies.png
Component_dependencies.png

> Readiness check for remote service belongs to the same user
> ---
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Component_dependencies.png, Dependency check vs.pdf, 
> Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch, 
> YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch
>
>
> When a service is deploying, there can be remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependent service, and the service 
> has reached a stable state, then deploy HBase.






[jira] [Commented] (YARN-8734) Readiness check for remote service

2018-09-21 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624346#comment-16624346
 ] 

Gour Saha commented on YARN-8734:
-

Yup, it would be great to get [~billie.rinaldi]'s thoughts on the naming as well.

Actually, all properties, including dependencies, should be under the properties section. That's how it is for Component as well. Please re-check; I hope I am not missing something.

> Readiness check for remote service
> --
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch, 
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, 
> YARN-8734.005.patch
>
>
> When a service is deploying, there can be remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependent service, and the service 
> has reached a stable state, then deploy HBase.






[jira] [Commented] (YARN-8734) Readiness check for remote service

2018-09-21 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624319#comment-16624319
 ] 

Gour Saha commented on YARN-8734:
-

In that case, maybe a simpler approach would be to call this property "dependencies". It is already at the service level, so it implies service-level dependencies, just like dependencies at the component level imply component dependencies and are simply called "dependencies". Additionally, avoiding the remote or external keywords helps avoid confusion or limitations in the service owner's mind. Just as component "dependencies" are validated to be valid component names, the expectation would be that service-level "dependencies" must be valid YARN services only. At least, that's exactly what the code does.

One code review comment:

Is {{remote_service_dependencies}} defined outside the properties section in the YAML swagger spec?

 

> Readiness check for remote service
> --
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch, 
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, 
> YARN-8734.005.patch
>
>
> When a service is deploying, there can be remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependent service, and the service 
> has reached a stable state, then deploy HBase.






[jira] [Commented] (YARN-8734) Readiness check for remote service

2018-09-21 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624250#comment-16624250
 ] 

Gour Saha commented on YARN-8734:
-

[~eyang], this is a pretty useful feature, so thanks for taking it up. Although I did not get a chance to test the patch, it looks okay overall.

One question, though: from a naming perspective, the opposite of remote is local. What does a local service mean? Are we excluding local services? To me, it seems like we meant external services rather than remote services. Thoughts?

> Readiness check for remote service
> --
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf, YARN-8734.001.patch, 
> YARN-8734.002.patch, YARN-8734.003.patch, YARN-8734.004.patch, 
> YARN-8734.005.patch
>
>
> When a service is deploying, there can be remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependent service, and the service 
> has reached a stable state, then deploy HBase.






[jira] [Created] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code

2018-09-14 Thread Gour Saha (JIRA)
Gour Saha created YARN-8779:
---

 Summary: Fix few discrepancies between YARN Service swagger spec 
and code
 Key: YARN-8779
 URL: https://issues.apache.org/jira/browse/YARN-8779
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.1.1, 3.1.0
Reporter: Gour Saha


The following issues were identified in the YARN Service swagger definition during an effort to integrate with a running service by generating Java and Go client-side stubs from the spec -
 
1.
*restartPolicy* is wrong and should be *restart_policy*
 
2.
A DELETE request to a non-existing service (or a previously existing but deleted service) throws an ApiException instead of something like NotFoundException (the equivalent of 404). Note that DELETE of an existing service behaves fine.
 
3.
The response code of a DELETE request is 200, but the spec says 204. Since the response has a payload, the spec should be updated to 200 instead of 204.
 
4.
The _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method does not return a Service object. The swagger definition has the below bug in the GET response of */app/v1/services/\{service_name}* -
{code:java}
type: object
items:
  $ref: '#/definitions/Service'
{code}
It should be -
{code:java}
$ref: '#/definitions/Service'
{code}
 
5.
Serialization issues were seen in all the enum classes - ServiceState.java, ContainerState.java, ComponentState.java, PlacementType.java and PlacementScope.java.

The Java client threw the below exception for ServiceState -
{code:java}
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot 
construct instance of `org.apache.cb.yarn.service.api.records.ServiceState` 
(although at least one Creator exists): no String-argument constructor/factory 
method to deserialize from String value ('ACCEPTED')
 at [Source: 
(org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
 line: 1, column: 121] (through reference chain: 
org.apache.cb.yarn.service.api.records.Service["state"])
{code}
For Golang, we saw this for ContainerState -
{code:java}
ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot unmarshal 
string into Go struct field Container.state of type yarnmodel.ContainerState 
{code}
 
6.
*launch_time* actually returns an integer, but the swagger definition says date. Hence, the following exception is seen on the client side -
{code:java}
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: 
Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or 
string.
 at [Source: 
(org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream);
 line: 1, column: 477] (through reference chain: 
org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time"])
{code}
 
8.
The *user.name* query param with a valid value is required for all API calls to an insecure cluster. This is not defined in the spec.
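
For reference, a minimal sketch of what a caller has to do today on an insecure cluster (host, port and service name below are placeholders) -
{code:java}
// Sketch only: pass the user.name query parameter explicitly on an insecure
// cluster. Host, port and service name are placeholders.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class GetServiceExample {
  public static void main(String[] args) throws IOException {
    URL url = new URL(
        "http://rm-host:8088/app/v1/services/my-service?user.name=testuser");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}
{code}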

 






[jira] [Created] (YARN-8682) YARN Service throws NPE when explicit null instead of empty object {} is used

2018-08-18 Thread Gour Saha (JIRA)
Gour Saha created YARN-8682:
---

 Summary: YARN Service throws NPE when explicit null instead of 
empty object {} is used
 Key: YARN-8682
 URL: https://issues.apache.org/jira/browse/YARN-8682
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn-native-services
Affects Versions: 3.0.1
Reporter: Gour Saha


YARN Service should not throw NPE for a config like this -
{code}
.
.
"configuration": {
  "env": {
    "HADOOP_CONF_DIR": "/hadoop-conf",
    "USER": "testuser",
    "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": "/sys/fs/cgroup:/sys/fs/cgroup:ro",
    "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
  },
  "files": null
}
.
.
{code}
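
A minimal sketch of the kind of defensive handling that would avoid the NPE (the helper and the accessor names in the usage note are assumed, not taken from the actual code) -
{code:java}
// Hypothetical helper: treat an explicitly null list ("files": null) the same
// as an empty one so callers can iterate without a null check at every site.
import java.util.Collections;
import java.util.List;

public final class NullSafe {
  private NullSafe() {
  }

  public static <T> List<T> orEmpty(List<T> list) {
    if (list == null) {
      return Collections.emptyList();
    }
    return list;
  }
}

// Usage sketch (ConfigFile and getFiles() are assumed names):
//   for (ConfigFile f : NullSafe.orEmpty(configuration.getFiles())) { ... }
{code}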






[jira] [Updated] (YARN-5738) Allow services to release/kill specific containers

2018-08-17 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-5738:

Target Version/s: 3.0.3
 Component/s: yarn-native-services

> Allow services to release/kill specific containers
> --
>
> Key: YARN-5738
> URL: https://issues.apache.org/jira/browse/YARN-5738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Siddharth Seth
>Priority: Major
>
> There are occasions on which specific containers may not be required by a 
> service. Would be useful to have support to return these to YARN.
> Slider flex doesn't give this control.
> cc [~gsaha], [~vinodkv]






[jira] [Commented] (YARN-8136) Add version attribute to site doc examples and quickstart

2018-08-03 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568588#comment-16568588
 ] 

Gour Saha commented on YARN-8136:
-

+1. Patch looks good to me.

> Add version attribute to site doc examples and quickstart
> -
>
> Key: YARN-8136
> URL: https://issues.apache.org/jira/browse/YARN-8136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: site
>Reporter: Gour Saha
>Priority: Major
> Attachments: YARN-8136.001.patch
>
>
> version attribute is missing in the following 2 site doc files -
> src/site/markdown/yarn-service/Examples.md
> src/site/markdown/yarn-service/QuickStart.md






[jira] [Commented] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification

2018-07-31 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564275#comment-16564275
 ] 

Gour Saha commented on YARN-8392:
-

Thanks [~billie.rinaldi]. Patch 4 looks good. The documentation in the swagger definition (YARN-Simplified-V1-API-Layer-For-Services.yaml), the examples (YARN-Services-Examples.md) and the site documentation are quite generic, since they talk about the broader placement policy support. However, do you want to review them once and see if we should add some specific examples for this symmetric use case?

> Allow multiple tags for anti-affinity placement policy in service 
> specification
> ---
>
> Key: YARN-8392
> URL: https://issues.apache.org/jira/browse/YARN-8392
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8392.1.patch, YARN-8392.2.patch, YARN-8392.3.patch, 
> YARN-8392.4.patch
>
>
> Currently the service client code is restricting a component's target tags to 
> include only a single tag, the component name. I have a use case for two 
> components having anti-affinity with themselves and with each other. The YARN 
> placement policies support this, but the service framework isn't allowing it.






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-31 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564250#comment-16564250
 ] 

Gour Saha commented on YARN-8579:
-

Thanks [~csingh]. [~eyang] please review and commit when you get a chance.

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch, YARN-8579.004.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-30 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8579:

Attachment: YARN-8579.004.patch

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch, YARN-8579.004.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification

2018-07-30 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562621#comment-16562621
 ] 

Gour Saha commented on YARN-8392:
-

[~billie.rinaldi] thank you for the patch. This symmetric scenario will be a 
good one to open up for services. The patch looks good to me. +1 for it.

Just one comment on the error message -
{code}
  String ERROR_PLACEMENT_POLICY_TAG_INVALID = "Invalid target tag %s "
  + "specified in placement policy of component %s. Component %s must "
  + "also appear in placement policy of component %s with the same "
  + "constraint type.";
{code}
Since we are checking for scope in addition to constraint type, should we 
explicitly mention that in the error message too?
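
Purely as an illustration of what I mean (not proposed final wording), something like -
{code:java}
// Illustrative wording only: call out scope in addition to constraint type.
String ERROR_PLACEMENT_POLICY_TAG_INVALID = "Invalid target tag %s "
    + "specified in placement policy of component %s. Component %s must "
    + "also appear in placement policy of component %s with the same "
    + "constraint type and scope.";
{code}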

> Allow multiple tags for anti-affinity placement policy in service 
> specification
> ---
>
> Key: YARN-8392
> URL: https://issues.apache.org/jira/browse/YARN-8392
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8392.1.patch, YARN-8392.2.patch, YARN-8392.3.patch
>
>
> Currently the service client code is restricting a component's target tags to 
> include only a single tag, the component name. I have a use case for two 
> components having anti-affinity with themselves and with each other. The YARN 
> placement policies support this, but the service framework isn't allowing it.






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-30 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562586#comment-16562586
 ] 

Gour Saha commented on YARN-8579:
-

Ah, nice catch [~csingh]. That's exactly what the issue was. With the fix in 
FairScheduler.java, the test now passes for both FAIR and CAPACITY schedulers. 
I am running all the tests now and will upload the updated patch after they all 
pass.

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-28 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560862#comment-16560862
 ] 

Gour Saha commented on YARN-8579:
-

None of the test failures are related to the code change, and the patches have completely different, non-overlapping test failures.

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-27 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560421#comment-16560421
 ] 

Gour Saha commented on YARN-8522:
-

[~Zian Chen], the 002 patch looks OK. I don't have a good setup to test this. Were you able to reproduce this issue on a cluster without your patch and then verify that your patch fixes it? Do you think we can write a test for it?

> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8522.001.patch, YARN-8522.002.patch
>
>
> Launch multiple streaming app simultaneously. Here, sometimes one of the 
> application fails with below stack trace.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-27 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560415#comment-16560415
 ] 

Gour Saha commented on YARN-8579:
-

Thanks [~csingh] for the review. I uploaded 003 with your suggestion.

I do have one fundamental question, though. I don't understand why the below assert fails for the FAIR scheduler (which means no NMTokens are sent over even with this patch). The method where I made the code change is a common method called by both the Fair and Capacity schedulers. Any idea? That's why I had to enable this assert for the CAPACITY scheduler only. I don't have a cluster setup where I can test the FairScheduler.
{code}
  if (getSchedulerType().equals(SchedulerType.CAPACITY)) {
Assert.assertEquals(1, nmTokens.size());
// container 3 is running on node 2
Assert.assertEquals(nm2Address,
nmTokens.get(0).getNodeId().toString());
  }
{code}

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-27 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8579:

Attachment: YARN-8579.003.patch

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-27 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560197#comment-16560197
 ] 

Gour Saha commented on YARN-8579:
-

[~csingh], please review the patch when you get a chance.

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-27 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560193#comment-16560193
 ] 

Gour Saha commented on YARN-8579:
-

Uploaded 002 with a few more asserts in the test.

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-27 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8579:

Attachment: YARN-8579.002.patch

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-27 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560105#comment-16560105
 ] 

Gour Saha commented on YARN-8429:
-

Awesome. Thanks again [~eyang].

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8579:

Fix Version/s: 3.1.2
   3.2.0

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559134#comment-16559134
 ] 

Gour Saha commented on YARN-8429:
-

Thanks [~eyang] for the commit. Can you please commit it to branch-3.1 as 
well, since it is targeted for the 3.1.2 release?

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8579:

Attachment: YARN-8579.001.patch

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8579.001.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559122#comment-16559122
 ] 

Gour Saha commented on YARN-8579:
-

Uploading patch 001 with a fix that I successfully tested in my cluster

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8579.001.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559121#comment-16559121
 ] 

Gour Saha commented on YARN-8579:
-

I investigated this issue and found that the root cause is the missing NM 
tokens corresponding to the containers that were passed to the AM after 
registration via the onContainersReceivedFromPreviousAttempts callback. These 
tokens are required since the change made in YARN-6168. The exception seen in 
the AM log is as below -

{code}
2018-07-26 23:22:31,373 [pool-5-thread-4] ERROR instance.ComponentInstance - 
[COMPINSTANCE httpd-proxy-0 : container_e15_1532637883791_0001_01_04] 
Failed to get container status on 
ctr-e138-1518143905142-412155-01-05.hwx.site:25454, will try again
org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
for ctr-e138-1518143905142-412155-01-05.hwx.site:25454
at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:262)
at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.(ContainerManagementProtocolProxy.java:252)
at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:137)
at 
org.apache.hadoop.yarn.client.api.impl.NMClientImpl.getContainerStatus(NMClientImpl.java:323)
at 
org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStatusRetriever.run(ComponentInstance.java:596)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
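
For reference, a minimal sketch (not the committed patch) of seeding the NM 
token cache from the RegisterApplicationMasterResponse after registration, so 
that getContainerStatus() on recovered containers can build an NM proxy. The 
helper class and method names below are hypothetical:

{code:java}
import java.util.List;

import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.NMToken;
import org.apache.hadoop.yarn.client.api.NMTokenCache;

// Hypothetical helper, for illustration only.
public final class PreviousAttemptNMTokens {
  private PreviousAttemptNMTokens() {
  }

  /**
   * Copy the NM tokens returned for containers of the previous attempt into
   * the NMTokenCache used by the NMClient, keyed by the NM "host:port"
   * address, so that container status calls on those containers do not fail
   * with "No NMToken sent for host:port".
   */
  public static void seed(RegisterApplicationMasterResponse response,
      NMTokenCache tokenCache) {
    List<NMToken> tokens = response.getNMTokensFromPreviousAttempts();
    if (tokens == null) {
      return;
    }
    for (NMToken token : tokens) {
      tokenCache.setToken(token.getNodeId().toString(), token.getToken());
    }
  }
}
{code}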

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558931#comment-16558931
 ] 

Gour Saha commented on YARN-8429:
-

Mistakenly had a test commented out in patch 003. Undoing that in patch 004. 
Thanks [~billie.rinaldi] for catching that.

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-26 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8429:

Attachment: YARN-8429.004.patch

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch, YARN-8429.004.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8580) yarn.resourcemanager.am.max-attempts is not respected for yarn services

2018-07-25 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556935#comment-16556935
 ] 

Gour Saha commented on YARN-8580:
-

Actually, this is controlled by a Yarn Service-specific property. The value 20 
takes effect because that is the default for Yarn Services. The reason 100 was 
not taking effect is that, for Yarn Services, the property name is 
yarn.service.am-restart.max-attempts and not 
yarn.resourcemanager.am.max-attempts.

Once the right property is set (see the sketch below), the desired behavior 
will be seen.

It is still an Invalid JIRA though.
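
A minimal yarn-site.xml sketch, assuming the property is set in the 
configuration used to submit the service (yarn.resourcemanager.am.max-attempts 
remains only the global cap, as the diagnostics above show):

{code}
<!-- Sketch only: this property controls the AM restart limit for Yarn Services. -->
<property>
  <name>yarn.service.am-restart.max-attempts</name>
  <value>100</value>
</property>
{code}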

> yarn.resourcemanager.am.max-attempts is not respected for yarn services
> ---
>
> Key: YARN-8580
> URL: https://issues.apache.org/jira/browse/YARN-8580
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Priority: Major
>
> 1) Max am attempt is set to 100 on all nodes. ( including gateway)
> {code}
>  
>   yarn.resourcemanager.am.max-attempts
>   100
> {code}
> 2) Start a Yarn service ( Hbase tarball ) application
> 3) Kill AM 20 times
> Here, App fails with below diagnostics.
> {code}
> bash-4.2$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -status 
> application_1532481557746_0001
> 18/07/25 18:43:34 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/07/25 18:43:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> 18/07/25 18:43:34 INFO conf.Configuration: found resource resource-types.xml 
> at file:/etc/hadoop/3.0.0.0-1634/0/resource-types.xml
> Application Report : 
>   Application-Id : application_1532481557746_0001
>   Application-Name : hbase-tarball-lr
>   Application-Type : yarn-service
>   User : hbase
>   Queue : default
>   Application Priority : 0
>   Start-Time : 1532481864863
>   Finish-Time : 1532522943103
>   Progress : 100%
>   State : FAILED
>   Final-State : FAILED
>   Tracking-URL : 
> https://xxx:8090/cluster/app/application_1532481557746_0001
>   RPC Port : -1
>   AM Host : N/A
>   Aggregate Resource Allocation : 252150112 MB-seconds, 164141 
> vcore-seconds
>   Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
>   Log Aggregation Status : SUCCEEDED
>   Diagnostics : Application application_1532481557746_0001 failed 20 
> times (global limit =100; local limit is =20) due to AM Container for 
> appattempt_1532481557746_0001_20 exited with  exitCode: 137
> Failing this attempt.Diagnostics: [2018-07-25 12:49:00.784]Container killed 
> on request. Exit code is 137
> [2018-07-25 12:49:03.045]Container exited with a non-zero exit code 137. 
> [2018-07-25 12:49:03.045]Killed by external signal
> For more detailed output, check the application tracking page: 
> https://xxx:8090/cluster/app/application_1532481557746_0001 Then click on 
> links to logs of each attempt.
> . Failing the application.
>   Unmanaged Application : false
>   Application Node Label Expression : 
>   AM container Node Label Expression : 
>   TimeoutType : LIFETIME  ExpiryTime : 2018-07-25T22:26:15.419+   
> RemainingTime : 0seconds
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-25 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha reassigned YARN-8579:
---

Assignee: Gour Saha

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-25 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556267#comment-16556267
 ] 

Gour Saha commented on YARN-8545:
-

[~csingh] patch 001 looks good to me. +1.

> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but container will not be properly 
> returned to RM. 
> This could happen when AM trying to prepare container launch context but 
> failed w/o sending container launch context to NM (Once container launch 
> context is sent to NM, NM will report failed container to RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context prepare failed, AM still trying to 
> monitor container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-21 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551618#comment-16551618
 ] 

Gour Saha commented on YARN-8429:
-

Patch 003 has the suggested absolute->relative change.

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-21 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8429:

Attachment: YARN-8429.003.patch

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch, 
> YARN-8429.003.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-20 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8429:

Attachment: YARN-8429.002.patch

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-20 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551276#comment-16551276
 ] 

Gour Saha commented on YARN-8429:
-

Uploaded 002 with a test fix.

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch, YARN-8429.002.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-20 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8429:

Fix Version/s: (was: 3.1.1)
   3.1.2

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8429.001.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-20 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551208#comment-16551208
 ] 

Gour Saha commented on YARN-8429:
-

[~eyang] please review when you get a chance. It's a usability improvement 
patch.

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8429.001.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-20 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551204#comment-16551204
 ] 

Gour Saha commented on YARN-8429:
-

[~yeshavora], with this patch the error msg will change to -
{code}
For component httpd with no artifact, dest_file must not be an absolute path: 
/xxx/xxx
{code}

This should make it easy to see that component httpd is being treated as a 
component with no artifact, and help detect this typo faster.
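
For illustration, a minimal sketch of the kind of check that produces this 
message (not the exact patch; the class name, method name and the Object-typed 
artifact parameter are stand-ins):

{code:java}
import java.nio.file.Paths;

final class DestFileValidation {
  // Sketch only: for a component with no artifact, config files land relative
  // to the container launch directory, so an absolute dest_file is rejected.
  // Including the component name makes a spec typo like "artifacts" (which
  // leaves artifact unset) easy to spot.
  static void checkDestFile(String componentName, Object artifact,
      String destFile) {
    if (artifact == null && destFile != null
        && Paths.get(destFile).isAbsolute()) {
      throw new IllegalArgumentException(String.format(
          "For component %s with no artifact, dest_file must not be an "
              + "absolute path: %s", componentName, destFile));
    }
  }
}
{code}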

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8429.001.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-20 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8429:

Fix Version/s: 3.1.1
   3.2.0

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8429.001.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-20 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8429:

Attachment: YARN-8429.001.patch

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-8429.001.patch
>
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8429) Improve diagnostic message when artifact is not set properly

2018-07-18 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha reassigned YARN-8429:
---

Assignee: Gour Saha

> Improve diagnostic message when artifact is not set properly
> 
>
> Key: YARN-8429
> URL: https://issues.apache.org/jira/browse/YARN-8429
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
>
> Steps:
> 1) Create launch json file. Replace "artifact" with "artifacts"
> 2) launch yarn service app with cli
> The application launch fails with below error
> {code}
> [xxx xxx]$ yarn app -launch test2-2 test.json 
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition 
> from local FS: /xxx/test.json
> 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms
> 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be 
> absolute path: /xxx/xxx
> {code}
> artifact field is not mandatory. However, If that field is specified 
> incorrectly, launch cmd should fail with proper error. 
> Here, The error message regarding Dest file is misleading.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548417#comment-16548417
 ] 

Gour Saha commented on YARN-8301:
-

Great. Patch 004 looks good. Not sure why I see the trailing whitespaces when I 
apply the patch; the Jenkins build should tell us. +1 for 004 pending Jenkins.

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548375#comment-16548375
 ] 

Gour Saha commented on YARN-8542:
-

[~csingh], agreed that the API is for requesting containers. However, the 
structure I proposed adheres to the current status API structure and the 
swagger definition. Note that service owners already parse through the 
component instances across multiple components in the status response payload 
when they need a single collection of all component instances. If you add a new 
attribute "component_name" now, you would need to modify the swagger 
definition, and it would actually mean a change for end-users, since they would 
have to handle the containers API output differently from the status API 
output.
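
To make the contrast concrete, a hedged sketch of the two shapes under 
discussion (field values are illustrative, reusing the sample container from 
the description below; the component is assumed to be named "sleeper"):

{code}
Option A - flat list with a new per-container "component_name" attribute (needs a swagger change):
[
  {
    "id": "container_1531508836237_0001_01_03",
    "component_instance_name": "sleeper-1",
    "component_name": "sleeper",
    "state": "READY"
  }
]

Option B - containers grouped under their component, matching the current status API structure:
[
  {
    "name": "sleeper",
    "containers": [
      {
        "id": "container_1531508836237_0001_01_03",
        "component_instance_name": "sleeper-1",
        "state": "READY"
      }
    ]
  }
]
{code}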

Let me know what you think.

> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{\{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548360#comment-16548360
 ] 

Gour Saha commented on YARN-8301:
-

[~csingh], patch 2 looks good. Let's add "Experimental Feature - Tech Preview" 
to the top of this doc and create a reference to it from Overview.md (and also 
mention there that it is an Experimental Feature - Tech Preview). Thanks 
[~eyang] for pointing this out.

A few minor comments -
1. In line 148, do we need the line "name": "sleeper-service" in the JSON spec 
for version 1.0.1 of the service?
2. Remove the trailing whitespace from all lines (one possible one-liner is 
sketched below).
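
For reference, a quick way to do this (a sketch only; the markdown file name is 
an assumption):
{code}
# Hedged sketch: strip trailing spaces/tabs in place (GNU sed), then let git
# flag anything that remains.
sed -i 's/[[:space:]]*$//' ServiceUpgrade.md
git diff --check
{code}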

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-16 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545719#comment-16545719
 ] 

Gour Saha commented on YARN-8299:
-

Just thinking aloud. Would returning the data in this format make more sense? 
This would maintain consistency with the status command output as well -

{code}
[
  {
"name": "ping",
"containers": [
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-0",
"hostname": "ping-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_02",
"ip": "172.26.111.21",
"launch_time": 1531767377301,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-1",
"hostname": "ping-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_07",
"ip": "172.26.111.21",
"launch_time": 1531767410395,
"state": "RUNNING_BUT_UNREADY"
  }
]
  },
  {
"name": "sleep",
"containers": [
  {
"bare_host": "eyang-5.openstacklocal",
"component_instance_name": "sleep-0",
"hostname": "sleep-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_04",
"ip": "172.26.111.20",
"launch_time": 1531767377710,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "sleep-1",
"hostname": "sleep-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_05",
"ip": "172.26.111.21",
"launch_time": 1531767378303,
"state": "READY"
  }
]
  }
]
{code}

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch, YARN-8299.004.patch, YARN-8299.005.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-16 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545711#comment-16545711
 ] 

Gour Saha commented on YARN-8301:
-

Thanks [~csingh] for the doc patch. Overall it looks good. A few comments -
1.
Change {{path_to_service_def_file}} to {{path_to_new_service_def_file}}

2.

Can you add a sample {{status}} output for version 1.0.0 of the sleeper service 
and paste it just above the "Initiate Upgrade" section? This will put a lot of 
subsequent references in context, such as the service being named "my-sleeper" 
and what sleeper-0 and sleeper-1 are when you refer to them in the "Upgrade 
Instance" section.

3.

Can you add an "Upgrade Component" example right after "Upgrade Instance"? (A 
rough sketch of the full command sequence is included below for reference.)

4.

In the "Finalize Upgrade" section can you change it to -
{code:java}
User must finalize the upgrade using the below command (since autoFinalize was 
not specified during initiate):{code}
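
For reference, a rough sketch of the end-to-end command sequence the doc should 
walk through (service, file, component, and instance names are placeholders, 
and the flag names are as I recall them from the upgrade CLI):
{code}
# Hedged sketch of the upgrade flow this doc describes
yarn app -upgrade my-sleeper -initiate sleeper_v101.json     # initiate with the new spec (no -autoFinalize)
yarn app -upgrade my-sleeper -instances sleeper-0,sleeper-1  # upgrade individual instances
yarn app -upgrade my-sleeper -components sleeper             # or upgrade a whole component
yarn app -upgrade my-sleeper -finalize                       # finalize, since -autoFinalize was not used
{code}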

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543828#comment-16543828
 ] 

Gour Saha commented on YARN-8299:
-

One more minor fix in the below comment in ApplicationCLI.java -
{code}
  // not appAttemptIf format, it could be appName.
{code}
Change appAttemptIf to appAttemptId.

+1 for the 004 patch, pending Jenkins.

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch, YARN-8299.004.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543756#comment-16543756
 ] 

Gour Saha commented on YARN-8299:
-

testFilterWithState fails locally in my env as well.

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543694#comment-16543694
 ] 

Gour Saha edited comment on YARN-8299 at 7/13/18 9:00 PM:
--

In ApplicationCLI.java can we change -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " +
  "Application Name");
{code}
to -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID");
{code}

This will keep it in-line with "yarn app" descriptions like "yarn app -status 
".


was (Author: gsaha):
In ApplicationCLI.java can we change -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " +
  "Application Name");
{code}
to -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID";
{code}

This will keep it in-line with "yarn app" descriptions like "yarn app -status 
".

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543694#comment-16543694
 ] 

Gour Saha edited comment on YARN-8299 at 7/13/18 8:56 PM:
--

In ApplicationCLI.java can we change -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " +
  "Application Name");
{code}
to -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID";
{code}

This will keep it in-line with "yarn app" descriptions like "yarn app -status 
".


was (Author: gsaha):
In ApplicationCLI.java can we changed -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " +
  "Application Name");
{code}
to -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID>";
{code}

This will keep it in-line with "yarn app" descriptions like "yarn app -status 
".

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543694#comment-16543694
 ] 

Gour Saha commented on YARN-8299:
-

In ApplicationCLI.java can we changed -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " +
  "Application Name");
{code}
to -
{code}
  opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID>";
{code}

This will keep it in-line with "yarn app" descriptions like "yarn app -status 
".

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543680#comment-16543680
 ] 

Gour Saha commented on YARN-8299:
-

Ah, my bad. The patch already supports -components.

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543676#comment-16543676
 ] 

Gour Saha commented on YARN-8299:
-

We need -components support, whether or not we support -states in tandem with 
it. That said, the two options together make sense too, since I might be 
interested in listing all containers in READY state across all components in a 
single API call.
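
For example, combining the two filters could look something like this (a sketch 
only; the flag names are assumed from the current patch, and the service and 
component names are placeholders):
{code}
# Hedged sketch: list only the READY containers of two components in one call
yarn container -list my-service -components compA,compB -states READY
{code}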

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543663#comment-16543663
 ] 

Gour Saha edited comment on YARN-8299 at 7/13/18 8:28 PM:
--

[~csingh] shouldn't we add a filter "-components compNameA[,compNameB,...]" to 
filter the container list further for specific components?


was (Author: gsaha):
[~csingh] shouldn't we add a filter "-components compNameA[,compNameB,...]" to 
filter the container list further for specific components.

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543663#comment-16543663
 ] 

Gour Saha commented on YARN-8299:
-

[~csingh] shouldn't we add a filter "-components compNameA[,compNameB,...]" to 
filter the container list further for specific components.

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8360) Yarn service conflict between restart policy and NM configuration

2018-07-10 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539405#comment-16539405
 ] 

Gour Saha commented on YARN-8360:
-

Thanks [~suma.shivaprasad], patch 1 looks good to me. +1.

> Yarn service conflict between restart policy and NM configuration 
> --
>
> Key: YARN-8360
> URL: https://issues.apache.org/jira/browse/YARN-8360
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Chandni Singh
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8360.1.patch
>
>
> For the below spec, the service will not stop even after container failures 
> because of the NM auto retry properties :
>  * "yarn.service.container-failure.retry.max": 1,
>  * "yarn.service.container-failure.validity-interval-ms": 5000
>  The NM will continue auto-restarting containers.
>  {{fail_after 20}} fails after 20 seconds. Since the validity failure 
> interval is 5 seconds, NM will auto restart the container.
> {code:java}
> {
>   "name": "fail-demo2",
>   "version": "1.0.0",
>   "components" :
>   [
> {
>   "name": "comp1",
>   "number_of_containers": 1,
>   "launch_command": "fail_after 20",
>   "restart_policy": "NEVER",
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "configuration": {
> "properties": {
>   "yarn.service.container-failure.retry.max": 1,
>   "yarn.service.container-failure.validity-interval-ms": 5000
> }
>   }
> }
>   ]
> }
> {code}
> If {{restart_policy}} is NEVER, then the service should stop after the 
> container fails.
> Since we have introduced, the service level Restart Policies, I think we 
> should make the NM auto retry configurations part of the {{RetryPolicy}} and 
> get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it 
> gets confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently

2018-07-02 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530428#comment-16530428
 ] 

Gour Saha commented on YARN-8485:
-

bq. This would ensure we don't accidentally call a rogue sudo command
I actually agree with this, since a rogue user could add a rogue sudo script to 
the PATH and pass this check. +1 to the get_docker_binary style, OR to 
explicitly checking both /bin/sudo and /usr/bin/sudo to keep the patch simple 
for now. We should fail if neither path exists.
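
To make the intended logic concrete, here is a shell-level sketch of the lookup 
(the real change would live in the container-executor C code; this only 
illustrates the lookup order and the hard failure):
{code}
# Hedged sketch: prefer the absolute paths and fail hard if neither exists
SUDO_BIN=""
for candidate in /bin/sudo /usr/bin/sudo; do
  if [ -x "$candidate" ]; then
    SUDO_BIN="$candidate"
    break
  fi
done
if [ -z "$SUDO_BIN" ]; then
  echo "sudo not found at /bin/sudo or /usr/bin/sudo" >&2
  exit 1
fi
{code}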

> Priviledged container app launch is failing intermittently
> --
>
> Key: YARN-8485
> URL: https://issues.apache.org/jira/browse/YARN-8485
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
> Environment: Debian
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8485.001.patch, YARN-8485.002.patch
>
>
> Privileged application fails intermittently 
> {code:java}
> yarn  jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar
>   -shell_command "sleep 30" -num_containers 1 -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar 
> /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code}
> Here,  container launch fails with 'Privileged containers are disabled' even 
> though Docker privilege container is enabled in the cluster
> {code:java|title=nm log}
> 2018-06-28 21:21:15,647 INFO  runtime.DockerLinuxContainerRuntime 
> (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - 
> All checks pass. Launching privileged container for : 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,665 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
> container_e01_1530220647587_0001_01_02 is : 29
> 2018-06-28 21:21:15,666 WARN  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
> container-launch with container ID: 
> container_e01_1530220647587_0001_01_02 and exit code: 29
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
>  Launch container failed
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception from container-launch.
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Container id: 
> container_e01_1530220647587_0001_01_02
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exit code: 29
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container 
> failed
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Shell error output: check 
> privileges failed for user: hrt_qa, error code: 0
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled 
> for user: hrt_qa
> 2018-06-28 21:21:15,668 INFO  nodemanager.ContainerExecutor 
> (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, 
> d

[jira] [Commented] (YARN-8445) YARN native service doesn't allow service name equals to component name

2018-06-29 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528382#comment-16528382
 ] 

Gour Saha commented on YARN-8445:
-

We might have to revisit this. It seems to be an issue in how we publish 
entities to ATSv2. We shouldn't have blocked component names that match the 
service name in validation. Even the example sleeper service will not run if a 
user tries "yarn app -launch sleeper sleeper".

> YARN native service doesn't allow service name equals to component name
> ---
>
> Key: YARN-8445
> URL: https://issues.apache.org/jira/browse/YARN-8445
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-8445.001.patch
>
>
> Now YARN service doesn't allow the service name to be the same as a component 
> name.
> This causes the AM launch to fail with a message like:
> {code} 
> org.apache.hadoop.metrics2.MetricsException: Metrics source tf-zeppelin 
> already exists!
>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>  at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>  at 
> org.apache.hadoop.yarn.service.ServiceMetrics.register(ServiceMetrics.java:75)
>  at 
> org.apache.hadoop.yarn.service.component.Component.(Component.java:193)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.createAllComponents(ServiceScheduler.java:552)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.buildInstance(ServiceScheduler.java:251)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceInit(ServiceScheduler.java:283)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>  at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:142)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:338)
> 2018-06-18 06:50:39,473 [main] INFO service.ServiceScheduler - Stopping 
> service scheduler
> {code}
> It's better to add this check in validation phase instead of failing AM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits

2018-06-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512014#comment-16512014
 ] 

Gour Saha commented on YARN-8425:
-

If you are not reporting a bug or improvement, sending an email to 
u...@hadoop.apache.org is the right way to get your questions answered. Knowing 
your application's needs and asking for containers of the right size is the way 
to go. Disabling the pmem check is not recommended in production clusters.

> Yarn container getting killed due to running beyond physical memory limits
> --
>
> Key: YARN-8425
> URL: https://issues.apache.org/jira/browse/YARN-8425
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: applications, container-queuing, yarn
>Affects Versions: 2.7.6
>Reporter: Tapas Sen
>Priority: Major
> Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, 
> yarn_configuration_3.PNG
>
>
> Hi,
> Getting this error.
>  
> 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1527758146858_45040_m_08_3: Container 
> [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is 
> running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical 
> memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container.
>  
> The YARN resource configuration is in the attachments. 
>  
>  Any lead would be appreciated.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits

2018-06-13 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha resolved YARN-8425.
-
Resolution: Not A Bug

> Yarn container getting killed due to running beyond physical memory limits
> --
>
> Key: YARN-8425
> URL: https://issues.apache.org/jira/browse/YARN-8425
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: applications, container-queuing, yarn
>Affects Versions: 2.7.6
>Reporter: Tapas Sen
>Priority: Major
> Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, 
> yarn_configuration_3.PNG
>
>
> Hi,
> Getting this error.
>  
> 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1527758146858_45040_m_08_3: Container 
> [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is 
> running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical 
> memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container.
>  
> The YARN resource configuration is in the attachments. 
>  
>  Any lead would be appreciated.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits

2018-06-13 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511241#comment-16511241
 ] 

Gour Saha commented on YARN-8425:
-

If _yarn.nodemanager.pmem-check-enabled_ in your cluster is not explicitly set 
to false (the default value is true), then this is behaving as designed. Based 
on your requirement, you can either request containers larger than 8 GB or set 
_yarn.nodemanager.pmem-check-enabled_ to false.
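
For example, if this is a MapReduce job, the map container size and heap can be 
raised per job instead of disabling the check cluster-wide (the jar, class, and 
paths are placeholders, and this assumes the driver uses ToolRunner so the -D 
options are picked up):
{code}
# Hedged sketch: give each map task a 10 GB container with an ~8 GB heap
hadoop jar my-job.jar MyDriver \
  -Dmapreduce.map.memory.mb=10240 \
  -Dmapreduce.map.java.opts=-Xmx8192m \
  /input/path /output/path
{code}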

> Yarn container getting killed due to running beyond physical memory limits
> --
>
> Key: YARN-8425
> URL: https://issues.apache.org/jira/browse/YARN-8425
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: applications, container-queuing, yarn
>Affects Versions: 2.7.6
>Reporter: Tapas Sen
>Priority: Major
> Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, 
> yarn_configuration_3.PNG
>
>
> Hi,
> Getting this error.
>  
> 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1527758146858_45040_m_08_3: Container 
> [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is 
> running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical 
> memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container.
>  
> The YARN resource configuration is in the attachments. 
>  
>  Any lead would be appreciated.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service doesn't work

2018-06-08 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506187#comment-16506187
 ] 

Gour Saha commented on YARN-8276:
-

Thank you [~GergelyNovak] for the patch and [~sunilg] for reviewing & 
committing.

> [UI2] After version field became mandatory, form-based submission of new YARN 
> service doesn't work
> --
>
> Key: YARN-8276
> URL: https://issues.apache.org/jira/browse/YARN-8276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8276.001.patch
>
>
> After version became mandatory in YARN service, one cannot create a new 
> service through UI, there is no way to specify the version field and the 
> service fails with the following message:
> {code}
> "Error: Adapter operation failed". 
> {code}
> Checking through browser dev tools, the REST response is the following:
> {code}
> {"diagnostics":"Version of service sleeper-service is either empty or not 
> provided"}
> {code}
> Discovered by [~vinodkv].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-31 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497334#comment-16497334
 ] 

Gour Saha commented on YARN-8308:
-

I uploaded patch 003 with the fixes.

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-8308.001.patch, YARN-8308.002.patch, 
> YARN-8308.003.patch
>
>
> Run Yarn service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, yarn service application fails with below error. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, rene

[jira] [Commented] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-31 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497333#comment-16497333
 ] 

Gour Saha commented on YARN-8308:
-

Thanks for reviewing the patch, [~eyang]. I have updated the patch to ensure 
removeHdfsDelegationToken gets called for secure clusters only. The keytab and 
principal options are not mandatory in the CLI; only the service name is 
mandatory.

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-8308.001.patch, YARN-8308.002.patch, 
> YARN-8308.003.patch
>
>
> Run Yarn service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, yarn service application fails with below error. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.s

[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-31 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8308:

Attachment: YARN-8308.003.patch

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-8308.001.patch, YARN-8308.002.patch, 
> YARN-8308.003.patch
>
>
> Run Yarn service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, yarn service application fails with below error. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=152642

[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-31 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8308:

Attachment: YARN-8308.002.patch

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-8308.001.patch, YARN-8308.002.patch
>
>
> Run Yarn service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, yarn service application fails with below error. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425

[jira] [Comment Edited] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work

2018-05-30 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496169#comment-16496169
 ] 

Gour Saha edited comment on YARN-8276 at 5/31/18 6:38 AM:
--

[~sunilg], can you please review this. This is critical for 3.1.1.


was (Author: gsaha):
[~sunilg], can you please review this. This is critical for 3.1.0.

> [UI2] After version field became mandatory, form-based submission of new YARN 
> service through UI2 doesn't work
> --
>
> Key: YARN-8276
> URL: https://issues.apache.org/jira/browse/YARN-8276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Critical
> Attachments: YARN-8276.001.patch
>
>
> After version became mandatory in YARN service, one cannot create a new 
> service through UI, there is no way to specify the version field and the 
> service fails with the following message:
> {code}
> "Error: Adapter operation failed". 
> {code}
> Checking through browser dev tools, the REST response is the following:
> {code}
> {"diagnostics":"Version of service sleeper-service is either empty or not 
> provided"}
> {code}
> Discovered by [~vinodkv].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work

2018-05-30 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8276:

Target Version/s: 3.1.1

> [UI2] After version field became mandatory, form-based submission of new YARN 
> service through UI2 doesn't work
> --
>
> Key: YARN-8276
> URL: https://issues.apache.org/jira/browse/YARN-8276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Critical
> Attachments: YARN-8276.001.patch
>
>
> After version became mandatory in YARN service, one cannot create a new 
> service through UI, there is no way to specify the version field and the 
> service fails with the following message:
> {code}
> "Error: Adapter operation failed". 
> {code}
> Checking through browser dev tools, the REST response is the following:
> {code}
> {"diagnostics":"Version of service sleeper-service is either empty or not 
> provided"}
> {code}
> Discovered by [~vinodkv].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work

2018-05-30 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8276:

Priority: Critical  (was: Major)

> [UI2] After version field became mandatory, form-based submission of new YARN 
> service through UI2 doesn't work
> --
>
> Key: YARN-8276
> URL: https://issues.apache.org/jira/browse/YARN-8276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Critical
> Attachments: YARN-8276.001.patch
>
>
> After version became mandatory in YARN service, one cannot create a new 
> service through UI, there is no way to specify the version field and the 
> service fails with the following message:
> {code}
> "Error: Adapter operation failed". 
> {code}
> Checking through browser dev tools, the REST response is the following:
> {code}
> {"diagnostics":"Version of service sleeper-service is either empty or not 
> provided"}
> {code}
> Discovered by [~vinodkv].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work

2018-05-30 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496169#comment-16496169
 ] 

Gour Saha commented on YARN-8276:
-

[~sunilg], can you please review this? This is critical for 3.1.0.

> [UI2] After version field became mandatory, form-based submission of new YARN 
> service through UI2 doesn't work
> --
>
> Key: YARN-8276
> URL: https://issues.apache.org/jira/browse/YARN-8276
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Major
> Attachments: YARN-8276.001.patch
>
>
> After version became mandatory in YARN service, one cannot create a new 
> service through the UI: there is no way to specify the version field, and the 
> service fails with the following message:
> {code}
> "Error: Adapter operation failed". 
> {code}
> Checking through browser dev tools, the REST response is the following:
> {code}
> {"diagnostics":"Version of service sleeper-service is either empty or not 
> provided"}
> {code}
> Discovered by [~vinodkv].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-30 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8308:

Attachment: YARN-8308.001.patch

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
> Attachments: YARN-8308.001.patch
>
>
> Run a YARN service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, the YARN service application fails with the error below. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNu

[jira] [Commented] (YARN-8367) 2 components, one with placement constraint and one without causes NPE in SingleConstraintAppPlacementAllocator

2018-05-30 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496069#comment-16496069
 ] 

Gour Saha commented on YARN-8367:
-

I am not sure if the UT failure is related, but it succeeds in my local environment.

> 2 components, one with placement constraint and one without causes NPE in 
> SingleConstraintAppPlacementAllocator
> ---
>
> Key: YARN-8367
> URL: https://issues.apache.org/jira/browse/YARN-8367
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8367.001.patch
>
>
> While testing the fix for YARN-8350, [~billie.rinaldi] encountered this NPE 
> in the AM log. Filing this on her behalf -
> {noformat}
> 2018-05-25 21:11:54,006 [AMRM Heartbeater thread] ERROR 
> impl.AMRMClientAsyncImpl - Exception on heartbeat
> java.lang.NullPointerException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest(SingleConstraintAppPlacementAllocator.java:245)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.internalUpdatePendingAsk(SingleConstraintAppPlacementAllocator.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.updatePendingAsk(SingleConstraintAppPlacementAllocator.java:207)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.addSchedulingRequests(AppSchedulingInfo.java:269)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateSchedulingRequests(AppSchedulingInfo.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateSchedulingRequests(SchedulerApplicationAttempt.java:469)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1154)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:278)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.

[jira] [Commented] (YARN-8367) 2 components, one with placement constraint and one without causes NPE in SingleConstraintAppPlacementAllocator

2018-05-30 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495948#comment-16495948
 ] 

Gour Saha commented on YARN-8367:
-

[~cheersyang], thank you for the patch. 001 looks good. I even tested it in my 
cluster where I was hitting the NPE, and your patch fixes the problem. So +1 for 
the 001 patch. I think [~billie.rinaldi] also successfully tested your patch while 
testing YARN-8350.

> 2 components, one with placement constraint and one without causes NPE in 
> SingleConstraintAppPlacementAllocator
> ---
>
> Key: YARN-8367
> URL: https://issues.apache.org/jira/browse/YARN-8367
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8367.001.patch
>
>
> While testing the fix for YARN-8350, [~billie.rinaldi] encountered this NPE 
> in the AM log. Filing this on her behalf -
> {noformat}
> 2018-05-25 21:11:54,006 [AMRM Heartbeater thread] ERROR 
> impl.AMRMClientAsyncImpl - Exception on heartbeat
> java.lang.NullPointerException: java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest(SingleConstraintAppPlacementAllocator.java:245)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.internalUpdatePendingAsk(SingleConstraintAppPlacementAllocator.java:193)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.updatePendingAsk(SingleConstraintAppPlacementAllocator.java:207)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.addSchedulingRequests(AppSchedulingInfo.java:269)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateSchedulingRequests(AppSchedulingInfo.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateSchedulingRequests(SchedulerApplicationAttempt.java:469)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1154)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:278)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.refl

[jira] [Commented] (YARN-8350) NPE in service AM related to placement policy

2018-05-30 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495610#comment-16495610
 ] 

Gour Saha commented on YARN-8350:
-

Thanks [~billie.rinaldi] for reviewing the patch.

The missing space between "%s" and "in" is deliberate. I wrote a comment above 
the code to explain -

{code}
 // Note: %sin is not a typo. Constraint name is optional so the error messages
 // below handle that scenario by adding a space if name is specified.
{code}
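
For illustration, here is a minimal, hypothetical sketch of the pattern that comment 
describes (made-up variable names, not the actual patch code):

{code:java}
// Hypothetical sketch of the "%sin" pattern explained above -- not the
// YARN-8350 patch itself. When a constraint name is present it carries its
// own trailing space, so the format string deliberately has no space
// before "in".
String constraintName = null;  // optional; may be null or empty
String nameFragment = (constraintName == null || constraintName.isEmpty())
    ? "" : constraintName + " ";
String msg = String.format(
    "Invalid placement constraint %sin component sleeper", nameFragment);
// name absent            -> "Invalid placement constraint in component sleeper"
// name present, e.g. foo -> "Invalid placement constraint foo in component sleeper"
{code}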

> NPE in service AM related to placement policy
> -
>
> Key: YARN-8350
> URL: https://issues.apache.org/jira/browse/YARN-8350
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8350.01.patch, YARN-8350.02.patch
>
>
> It seems like this NPE is happening in a service with more than one component 
> when one component has a placement policy and the other does not. It causes 
> the AM to crash.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919)
> at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-29 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8308:

Target Version/s: 3.1.1

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
>
> Run a YARN service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, the YARN service application fails with the error below. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018

[jira] [Resolved] (YARN-8309) Diagnostic message for yarn service app failure due token renewal should be improved

2018-05-29 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha resolved YARN-8309.
-
Resolution: Won't Do

> Diagnostic message for yarn service app failure due token renewal should be 
> improved
> 
>
> Key: YARN-8309
> URL: https://issues.apache.org/jira/browse/YARN-8309
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Priority: Major
>
> When a YARN service application failed due to a token renewal issue, the 
> diagnostic message was unclear. 
> {code:java}
> Application application_1526413043392_0002 failed 20 times due to AM 
> Container for appattempt_1526413043392_0002_20 exited with exitCode: 1 
> Failing this attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from 
> container-launch. Container id: container_e04_1526413043392_0002_20_01 
> Exit code: 1 Exception message: Launch container failed Shell output: main : 
> command provided 1 main : run as user is hbase main : requested yarn user is 
> hbase Getting exit code file... Creating script paths... Writing pid file... 
> Writing to tmp file 
> /grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp
>  Writing to cgroup task files... Creating local dirs... Launching 
> container... Getting exit code file... Creating script paths... [2018-05-15 
> 23:15:28.806]Container exited with a non-zero exit code 1. Error file: 
> prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 
> 23:15:28.807]Container exited with a non-zero exit code 1. Error file: 
> prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, 
> check the application tracking page: 
> https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on 
> links to logs of each attempt. . Failing the application.{code}
> Here, the diagnostic message should be improved to specify that the AM is 
> failing due to token renewal issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8309) Diagnostic message for yarn service app failure due token renewal should be improved

2018-05-29 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494326#comment-16494326
 ] 

Gour Saha commented on YARN-8309:
-

Once a fix for YARN-8308 is provided, this diagnostics message fix won't be 
required. In fact, from the code perspective, at the phase where the token 
issue occurs, ATSv2 publisher initialization and RM registration cannot be 
done, so technically the diagnostics message cannot be enhanced by the AM.

> Diagnostic message for yarn service app failure due token renewal should be 
> improved
> 
>
> Key: YARN-8309
> URL: https://issues.apache.org/jira/browse/YARN-8309
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Priority: Major
>
> When a YARN service application failed due to a token renewal issue, the 
> diagnostic message was unclear. 
> {code:java}
> Application application_1526413043392_0002 failed 20 times due to AM 
> Container for appattempt_1526413043392_0002_20 exited with exitCode: 1 
> Failing this attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from 
> container-launch. Container id: container_e04_1526413043392_0002_20_01 
> Exit code: 1 Exception message: Launch container failed Shell output: main : 
> command provided 1 main : run as user is hbase main : requested yarn user is 
> hbase Getting exit code file... Creating script paths... Writing pid file... 
> Writing to tmp file 
> /grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp
>  Writing to cgroup task files... Creating local dirs... Launching 
> container... Getting exit code file... Creating script paths... [2018-05-15 
> 23:15:28.806]Container exited with a non-zero exit code 1. Error file: 
> prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 
> 23:15:28.807]Container exited with a non-zero exit code 1. Error file: 
> prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, 
> check the application tracking page: 
> https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on 
> links to logs of each attempt. . Failing the application.{code}
> Here, the diagnostic message should be improved to specify that the AM is 
> failing due to token renewal issues.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-29 Thread Gour Saha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha reassigned YARN-8308:
---

Assignee: Gour Saha

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
>
> Run a YARN service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, the YARN service application fails with the error below. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2

[jira] [Commented] (YARN-8308) Yarn service app fails due to issues with Renew Token

2018-05-29 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494324#comment-16494324
 ] 

Gour Saha commented on YARN-8308:
-

Will provide a patch for this issue.

> Yarn service app fails due to issues with Renew Token
> -
>
> Key: YARN-8308
> URL: https://issues.apache.org/jira/browse/YARN-8308
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Major
>
> Run a YARN service application beyond 
> dfs.namenode.delegation.token.max-lifetime. 
> Here, the YARN service application fails with the error below. 
> {code}
> 2018-05-15 23:14:35,652 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
> 2018-05-15 23:14:35,654 [main] INFO  service.AbstractService - Service 
> Service Master failed in state INITED
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 
> sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 
> 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182)
>   at 
> org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316)
> 2018-05-15 23:14:35,659 [main] INFO  service.ServiceMaster - Stopping app 
> master
> 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting 
> service master
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, 
> realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, 

[jira] [Created] (YARN-8367) 2 components, one with placement constraint and one without causes NPE in SingleConstraintAppPlacementAllocator

2018-05-25 Thread Gour Saha (JIRA)
Gour Saha created YARN-8367:
---

 Summary: 2 components, one with placement constraint and one 
without causes NPE in SingleConstraintAppPlacementAllocator
 Key: YARN-8367
 URL: https://issues.apache.org/jira/browse/YARN-8367
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 3.1.0
Reporter: Gour Saha


While testing the fix for YARN-8350, [~billie.rinaldi] encountered this NPE in 
the AM log. Filing this on her behalf -
{noformat}
2018-05-25 21:11:54,006 [AMRM Heartbeater thread] ERROR 
impl.AMRMClientAsyncImpl - Exception on heartbeat
java.lang.NullPointerException: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest(SingleConstraintAppPlacementAllocator.java:245)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.internalUpdatePendingAsk(SingleConstraintAppPlacementAllocator.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.updatePendingAsk(SingleConstraintAppPlacementAllocator.java:207)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.addSchedulingRequests(AppSchedulingInfo.java:269)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateSchedulingRequests(AppSchedulingInfo.java:240)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateSchedulingRequests(SchedulerApplicationAttempt.java:469)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1154)
at 
org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:278)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53)
at 
org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Pr

[jira] [Commented] (YARN-8350) NPE in service AM related to placement policy

2018-05-25 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490809#comment-16490809
 ] 

Gour Saha commented on YARN-8350:
-

Thanks [~billie.rinaldi]. Patch 02 has all the files.

> NPE in service AM related to placement policy
> -
>
> Key: YARN-8350
> URL: https://issues.apache.org/jira/browse/YARN-8350
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8350.01.patch, YARN-8350.02.patch
>
>
> It seems like this NPE is happening in a service with more than one component 
> when one component has a placement policy and the other does not. It causes 
> the AM to crash.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919)
> at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy

2018-05-25 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8350:

Attachment: YARN-8350.02.patch

> NPE in service AM related to placement policy
> -
>
> Key: YARN-8350
> URL: https://issues.apache.org/jira/browse/YARN-8350
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8350.01.patch, YARN-8350.02.patch
>
>
> It seems like this NPE is happening in a service with more than one component 
> when one component has a placement policy and the other does not. It causes 
> the AM to crash.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919)
> at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8350) NPE in service AM related to placement policy

2018-05-25 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490802#comment-16490802
 ] 

Gour Saha commented on YARN-8350:
-

Oops, good catch. I missed attaching the file Component.java in the patch. 
Attaching it right now.

> NPE in service AM related to placement policy
> -
>
> Key: YARN-8350
> URL: https://issues.apache.org/jira/browse/YARN-8350
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8350.01.patch
>
>
> It seems like this NPE is happening in a service with more than one component 
> when one component has a placement policy and the other does not. It causes 
> the AM to crash.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919)
> at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy

2018-05-24 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8350:

Component/s: yarn-native-services

> NPE in service AM related to placement policy
> -
>
> Key: YARN-8350
> URL: https://issues.apache.org/jira/browse/YARN-8350
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8350.01.patch
>
>
> It seems like this NPE is happening in a service with more than one component 
> when one component has a placement policy and the other does not. It causes 
> the AM to crash.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919)
> at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy

2018-05-24 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8350:

Target Version/s: 3.1.1

> NPE in service AM related to placement policy
> -
>
> Key: YARN-8350
> URL: https://issues.apache.org/jira/browse/YARN-8350
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8350.01.patch
>
>
> It seems like this NPE is happening in a service with more than one component 
> when one component has a placement policy and the other does not. It causes 
> the AM to crash.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919)
> at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy

2018-05-24 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8350:

Attachment: YARN-8350.01.patch

> NPE in service AM related to placement policy
> -
>
> Key: YARN-8350
> URL: https://issues.apache.org/jira/browse/YARN-8350
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8350.01.patch
>
>
> It seems like this NPE is happening in a service with more than one component 
> when one component has a placement policy and the other does not. It causes 
> the AM to crash.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310)
> at 
> org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919)
> at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-18 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480659#comment-16480659
 ] 

Gour Saha commented on YARN-7530:
-

+1 for this change. [~csingh], I am assuming that after the git moves, all UTs 
are still running fine.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch, YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project. It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8243) Flex down should remove instance with largest component instance ID first

2018-05-11 Thread Gour Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472742#comment-16472742
 ] 

Gour Saha commented on YARN-8243:
-

Thanks [~billie.rinaldi] and [~suma.shivaprasad] for reviewing. Also thanks to 
[~billie.rinaldi] for committing the patch.

> Flex down should remove instance with largest component instance ID first
> -
>
> Key: YARN-8243
> URL: https://issues.apache.org/jira/browse/YARN-8243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8243.01.patch, YARN-8243.02.patch
>
>
> This is easy to test on a service with an anti-affinity component, to simulate 
> pending container requests. It can also be simulated by other means (no 
> resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components" :
>   [
> {
>   "name": "ping",
>   "number_of_containers": 2,
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "launch_command": "sleep 9000",
>   "placement_policy": {
> "constraints": [
>   {
> "type": "ANTI_AFFINITY",
> "scope": "NODE",
> "target_tags": [
>   "ping"
> ]
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above 
> service to 1 more container than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this 
> point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the serviceam log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  
> service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component  dispatcher] INFO  component.Component - 
> [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  
> registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Deleting registry path 
> /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06
> 2018-05-03 20:17:38,476 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>   at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,480 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state
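
The summary above ("Flex down should remove instance with largest component 
instance ID first") states the intended ordering. As a rough sketch only, not the 
actual YARN-8243 patch, and with hypothetical class and method names, the 
selection of instances to release could look like:
{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the ordering described in the summary above; this is
// not the actual YARN-8243 patch, and the class/method names are made up.
public class FlexDownOrder {

  // Given the current instance names and the target count, return the
  // instances to release, largest component instance ID first.
  static List<String> instancesToRemove(List<String> instanceNames, int target) {
    int toRemove = Math.max(0, instanceNames.size() - target);
    return instanceNames.stream()
        // Sort by the numeric suffix (e.g. "ping-5" -> 5), largest first.
        .sorted(Comparator.comparingInt(FlexDownOrder::instanceId).reversed())
        .limit(toRemove)
        .collect(Collectors.toList());
  }

  static int instanceId(String name) {
    return Integer.parseInt(name.substring(name.lastIndexOf('-') + 1));
  }

  public static void main(String[] args) {
    // Flexing "ping" from 6 instances down to 5 should pick ping-5 first.
    System.out.println(instancesToRemove(
        Arrays.asList("ping-0", "ping-1", "ping-2", "ping-3", "ping-4", "ping-5"), 5));
  }
}
{code}
In the scenario described above, that ordering would target ping-5 (the instance 
whose container request was still pending) rather than the running ping-4 that 
the log shows being destroyed.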

[jira] [Updated] (YARN-8243) Flex down should first remove pending container requests (if any) and then kill running containers

2018-05-10 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-8243:

Priority: Critical  (was: Major)

> Flex down should first remove pending container requests (if any) and then 
> kill running containers
> --
>
> Key: YARN-8243
> URL: https://issues.apache.org/jira/browse/YARN-8243
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8243.01.patch, YARN-8243.02.patch
>
>
> This is easy to test on a service with an anti-affinity component, to simulate 
> pending container requests. It can also be simulated by other means (no 
> resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components" :
>   [
> {
>   "name": "ping",
>   "number_of_containers": 2,
>   "resource": {
> "cpus": 1,
> "memory": "256"
>   },
>   "launch_command": "sleep 9000",
>   "placement_policy": {
> "constraints": [
>   {
> "type": "ANTI_AFFINITY",
> "scope": "NODE",
> "target_tags": [
>   "ping"
> ]
>   }
> ]
>   }
> }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above 
> service to 1 more container than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this 
> point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the serviceam log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  
> service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component  dispatcher] INFO  component.Component - 
> [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component  dispatcher] INFO  component.Component - 
> [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  
> registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : 
> container_1525297086734_0013_01_06]: Deleting registry path 
> /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06
> 2018-05-03 20:17:38,476 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
>   at 
> org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>   at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,480 [Component  dispatcher] ERROR component.Component - 
> [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CHECK_STABLE at STABLE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$Intern
