[jira] [Commented] (YARN-9002) YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682434#comment-16682434 ]

Gour Saha commented on YARN-9002:
---------------------------------

Thanks a lot [~eyang]

> YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS
> and local filesystem only
> ---------------------------------------------------------------------------
>
>                 Key: YARN-9002
>                 URL: https://issues.apache.org/jira/browse/YARN-9002
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>    Affects Versions: 3.1.1
>            Reporter: Gour Saha
>            Assignee: Gour Saha
>            Priority: Major
>             Fix For: 3.1.2, 3.3.0, 3.2.1
>
>         Attachments: YARN-9002-branch-3.1.001.patch,
>                      YARN-9002-branch-3.1.002.patch, YARN-9002.001.patch
>
>
> ServiceClient.java specifically checks if the keytab URI scheme is hdfs or
> file. This restricts it from supporting other FileSystem API conforming FSs
> like s3a, wasb, gs, etc.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9002) YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-9002:
----------------------------
    Summary: YARN Service keytab does not support s3, wasb, gs and is restricted to HDFS and local filesystem only  (was: YARN Service keytab location is restricted to HDFS and local filesystem only)
[jira] [Commented] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682205#comment-16682205 ]

Gour Saha commented on YARN-9002:
---------------------------------

Thanks [~eyang] for reviewing. I uploaded the 002 patch with the imports removed. Uploaded the trunk patch also.
[jira] [Updated] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-9002:
----------------------------
    Attachment: YARN-9002.001.patch
[jira] [Updated] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-9002:
----------------------------
    Attachment: YARN-9002-branch-3.1.002.patch
[jira] [Commented] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682166#comment-16682166 ]

Gour Saha commented on YARN-9002:
---------------------------------

Tested this patch in tandem with the patch in HIVE-20899 on a cluster based on branch-3.1 and it passed for wasb.

/cc [~eyang], when you get a chance please review the patch for branch-3.1. The trunk code has diverged a bit so I am preparing a separate patch for trunk.
[jira] [Updated] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-9002:
----------------------------
    Attachment: YARN-9002-branch-3.1.001.patch
[jira] [Created] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only
Gour Saha created YARN-9002:
-------------------------------

             Summary: YARN Service keytab location is restricted to HDFS and local filesystem only
                 Key: YARN-9002
                 URL: https://issues.apache.org/jira/browse/YARN-9002
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-native-services
    Affects Versions: 3.1.1
            Reporter: Gour Saha


ServiceClient.java specifically checks if the keytab URI scheme is hdfs or file. This restricts it from supporting other FileSystem API conforming FSs like s3a, wasb, gs, etc.
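The restriction described above can be sketched in plain Java. The class and method names below are hypothetical stand-ins for the actual ServiceClient.java logic, contrasting the hard-coded scheme check with a scheme-agnostic one that would let any FileSystem-backed URI (s3a, wasb, gs, etc.) through.

```java
import java.net.URI;

// Hypothetical sketch (not the actual ServiceClient.java code) of the
// keytab scheme check before and after the fix described in this issue.
public class KeytabSchemeCheck {

    // Pre-patch behavior: only the hdfs and file URI schemes are accepted.
    static boolean oldCheck(URI keytab) {
        String scheme = keytab.getScheme();
        return "hdfs".equals(scheme) || "file".equals(scheme);
    }

    // Post-patch behavior (sketch): accept any non-null scheme and defer to
    // the Hadoop FileSystem API to resolve it. In real code this would be
    // something like FileSystem.get(keytab, conf) plus an existence check.
    static boolean newCheck(URI keytab) {
        return keytab.getScheme() != null;
    }
}
```

With the old check, an `s3a://` keytab URI is rejected outright; with the new one, rejection only happens later if no FileSystem implementation can resolve the scheme.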
[jira] [Assigned] (YARN-9002) YARN Service keytab location is restricted to HDFS and local filesystem only
[ https://issues.apache.org/jira/browse/YARN-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha reassigned YARN-9002:
-------------------------------
    Assignee: Gour Saha
[jira] [Commented] (YARN-8682) YARN Service throws NPE when explicit null instead of empty object {} is used
[ https://issues.apache.org/jira/browse/YARN-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662524#comment-16662524 ]

Gour Saha commented on YARN-8682:
---------------------------------

[~csingh] it was encountered by the hive team a long time back so I don't have the stack. Seems like recent code changes are handling this case. In that case feel free to close this bug as cannot reproduce.

> YARN Service throws NPE when explicit null instead of empty object {} is used
>
>                 Key: YARN-8682
>                 URL: https://issues.apache.org/jira/browse/YARN-8682
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>    Affects Versions: 3.0.1
>            Reporter: Gour Saha
>            Assignee: Chandni Singh
>            Priority: Major
>
> YARN Service should not throw NPE for a config like this -
> {code}
> .
> .
> "configuration": {
>   "env": {
>     "HADOOP_CONF_DIR": "/hadoop-conf",
>     "USER": "testuser",
>     "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": "/sys/fs/cgroup:/sys/fs/cgroup:ro",
>     "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
>   },
>   "files": null
> }
> .
> .
> {code}
[jira] [Commented] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629168#comment-16629168 ]

Gour Saha commented on YARN-8734:
---------------------------------

bq. Yes, component dependencies is also outside of component properties in the component section. I think this is aligned correctly.

[~eyang], am I missing something here? Please see where "dependencies" is defined in Component_dependencies.png vs where it is defined in Service_dependencies.png (attached).

> Readiness check for remote service belongs to the same user
>
>                 Key: YARN-8734
>                 URL: https://issues.apache.org/jira/browse/YARN-8734
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: yarn-native-services
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: Component_dependencies.png, Dependency check vs.pdf,
>                      Service_dependencies.png, YARN-8734.001.patch, YARN-8734.002.patch,
>                      YARN-8734.003.patch, YARN-8734.004.patch, YARN-8734.005.patch
>
> When a service is deploying, there can be remote service dependency. It
> would be nice to describe ZooKeeper as a dependent service, and the service
> has reached a stable state, then deploy HBase.
[jira] [Updated] (YARN-8734) Readiness check for remote service belongs to the same user
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-8734:
----------------------------
    Attachment: Service_dependencies.png
                Component_dependencies.png
[jira] [Commented] (YARN-8734) Readiness check for remote service
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624346#comment-16624346 ]

Gour Saha commented on YARN-8734:
---------------------------------

Yup, it would be great to get [~billie.rinaldi]'s thoughts on the naming as well.

Actually, all properties including dependencies should be under the properties section. That's how it is for Component also. Please re-check. I hope I am not missing something.
[jira] [Commented] (YARN-8734) Readiness check for remote service
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624319#comment-16624319 ]

Gour Saha commented on YARN-8734:
---------------------------------

In that case maybe a simpler approach will be to call this property "dependencies". It is already at the service level so it implies service-level dependencies, just like dependencies at the component level implies component dependencies and is simply called "dependencies". Additionally, avoiding the remote or external keywords helps avoid confusion or limitations in the service owner's mind.

Just like component "dependencies" validates that the values are valid component names, the expectation would be that service-level "dependencies" will be valid YARN services only. At least that's exactly what the code does.

One code review comment: Is {{remote_service_dependencies}} defined outside the properties section in the YAML swagger spec?
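The layout being debated can be illustrated with a hypothetical swagger fragment. The field names other than `dependencies` are illustrative, not copied from the actual YARN Service spec; the point is that both the component-level list and a service-level list would sit inside their respective `properties` sections.

```yaml
# Hypothetical sketch, not the actual YARN Service swagger definition:
# "dependencies" appears inside the properties section at both levels.
definitions:
  Service:
    properties:
      name:
        type: string
      dependencies:          # service-level: names of other YARN services
        type: array
        items:
          type: string
  Component:
    properties:
      name:
        type: string
      dependencies:          # component-level: names of other components
        type: array
        items:
          type: string
```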
[jira] [Commented] (YARN-8734) Readiness check for remote service
[ https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624250#comment-16624250 ]

Gour Saha commented on YARN-8734:
---------------------------------

[~eyang] this is a pretty useful feature so thanks for taking this up. Although I did not get a chance to test the patch, it overall looks okay. But one question: from a naming perspective, the opposite of remote is local. What does local service mean? Are we excluding local services? To me, it seems like we wanted to mean external services instead of remote services. Thoughts?
[jira] [Created] (YARN-8779) Fix few discrepancies between YARN Service swagger spec and code
Gour Saha created YARN-8779:
-------------------------------

             Summary: Fix few discrepancies between YARN Service swagger spec and code
                 Key: YARN-8779
                 URL: https://issues.apache.org/jira/browse/YARN-8779
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-native-services
    Affects Versions: 3.1.1, 3.1.0
            Reporter: Gour Saha


The following issues were identified in the YARN Service swagger definition during an effort to integrate with a running service by generating Java and Go client-side stubs from the spec -

1. *restartPolicy* is wrong and should be *restart_policy*

2. A DELETE request to a non-existing service (or a previously existing but deleted service) throws an ApiException instead of something like NotFoundException (the equivalent of 404). Note, DELETE of an existing service behaves fine.

3. The response code of DELETE request is 200. The spec says 204. Since the response has a payload, the spec should be updated to 200 instead of 204.

4. _DefaultApi.java_ client's _appV1ServicesServiceNameGetWithHttpInfo_ method does not return a Service object. Swagger definition has the below bug in GET response of */app/v1/services/\{service_name}* -
{code:java}
type: object
items:
  $ref: '#/definitions/Service'
{code}
It should be -
{code:java}
$ref: '#/definitions/Service'
{code}

5. Serialization issues were seen in all enum classes - ServiceState.java, ContainerState.java, ComponentState.java, PlacementType.java and PlacementScope.java. Java client threw the below exception for ServiceState -
{code:java}
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot construct instance of `org.apache.cb.yarn.service.api.records.ServiceState` (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('ACCEPTED')
 at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 121] (through reference chain: org.apache.cb.yarn.service.api.records.Service["state"])
{code}
For Golang we saw this for ContainerState -
{code:java}
ERRO[2018-08-12T23:32:31.851-07:00] During GET request: json: cannot unmarshal string into Go struct field Container.state of type yarnmodel.ContainerState
{code}

6. *launch_time* actually returns an integer but the swagger definition says date. Hence, the following exception is seen on the client side -
{code:java}
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Unexpected token (VALUE_NUMBER_INT), expected START_ARRAY: Expected array or string.
 at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream); line: 1, column: 477] (through reference chain: org.apache.cb.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Component["containers"]->java.util.ArrayList[0]->org.apache.cb.yarn.service.api.records.Container["launch_time"])
{code}

8. *user.name* query param with a valid value is required for all API calls to an unsecure cluster. This is not defined in the spec.
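Item 4's fix can be shown in slightly fuller context. The surrounding path and response keys below are an illustrative sketch (abbreviated and assumed, not copied from the actual spec); only the schema correction itself is taken from the report above.

```yaml
# Illustrative sketch only; surrounding keys are abbreviated/assumed.
paths:
  /app/v1/services/{service_name}:
    get:
      responses:
        '200':
          description: The service object
          schema:
            # was wrongly wrapped as "type: object" + "items: $ref: ..."
            $ref: '#/definitions/Service'
```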
[jira] [Created] (YARN-8682) YARN Service throws NPE when explicit null instead of empty object {} is used
Gour Saha created YARN-8682:
-------------------------------

             Summary: YARN Service throws NPE when explicit null instead of empty object {} is used
                 Key: YARN-8682
                 URL: https://issues.apache.org/jira/browse/YARN-8682
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn-native-services
    Affects Versions: 3.0.1
            Reporter: Gour Saha


YARN Service should not throw NPE for a config like this -
{code}
.
.
"configuration": {
  "env": {
    "HADOOP_CONF_DIR": "/hadoop-conf",
    "USER": "testuser",
    "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": "/sys/fs/cgroup:/sys/fs/cgroup:ro",
    "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
  },
  "files": null
}
.
.
{code}
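One way to avoid the NPE described above is to normalize explicit nulls at the accessor. This is a hypothetical sketch, not the actual YARN Service Configuration class, showing a null-safe getter for a files-style field.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of defensive null handling for a spec field that a
// client may send as an explicit JSON null ("files": null) instead of
// omitting it or sending an empty object.
public class ServiceConfiguration {
    private List<String> files; // may be set to null by the JSON deserializer

    // Null-safe accessor: an explicit null behaves like an empty list, so
    // downstream code can iterate without NPE-guarding every call site.
    public List<String> getFiles() {
        return files == null ? Collections.emptyList() : files;
    }

    public void setFiles(List<String> files) {
        this.files = files;
    }
}
```

The same pattern applies to any spec section (env, files, configuration) that deserialization may leave null.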
[jira] [Updated] (YARN-5738) Allow services to release/kill specific containers
[ https://issues.apache.org/jira/browse/YARN-5738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-5738:
----------------------------
    Target Version/s: 3.0.3
         Component/s: yarn-native-services

> Allow services to release/kill specific containers
>
>                 Key: YARN-5738
>                 URL: https://issues.apache.org/jira/browse/YARN-5738
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>            Reporter: Siddharth Seth
>            Priority: Major
>
> There are occasions on which specific containers may not be required by a
> service. Would be useful to have support to return these to YARN.
> Slider flex doesn't give this control.
> cc [~gsaha], [~vinodkv]
[jira] [Commented] (YARN-8136) Add version attribute to site doc examples and quickstart
[ https://issues.apache.org/jira/browse/YARN-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568588#comment-16568588 ]

Gour Saha commented on YARN-8136:
---------------------------------

+1. Patch looks good to me.

> Add version attribute to site doc examples and quickstart
>
>                 Key: YARN-8136
>                 URL: https://issues.apache.org/jira/browse/YARN-8136
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: site
>            Reporter: Gour Saha
>            Priority: Major
>         Attachments: YARN-8136.001.patch
>
> The version attribute is missing in the following 2 site doc files -
> src/site/markdown/yarn-service/Examples.md
> src/site/markdown/yarn-service/QuickStart.md
[jira] [Commented] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification
[ https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564275#comment-16564275 ]

Gour Saha commented on YARN-8392:
---------------------------------

Thanks [~billie.rinaldi]. Patch 4 looks good. The documentation in the swagger definition (YARN-Simplified-V1-API-Layer-For-Services.yaml), the examples (YARN-Services-Examples.md) and the site documentation are quite generic since they talk about the broader placement policy support. However, do you want to review them once and see if we should add some specific examples for this symmetric use case?

> Allow multiple tags for anti-affinity placement policy in service
> specification
>
>                 Key: YARN-8392
>                 URL: https://issues.apache.org/jira/browse/YARN-8392
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>            Priority: Critical
>         Attachments: YARN-8392.1.patch, YARN-8392.2.patch, YARN-8392.3.patch,
>                      YARN-8392.4.patch
>
> Currently the service client code is restricting a component's target tags to
> include only a single tag, the component name. I have a use case for two
> components having anti-affinity with themselves and with each other. The YARN
> placement policies support this, but the service framework isn't allowing it.
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564250#comment-16564250 ]

Gour Saha commented on YARN-8579:
---------------------------------

Thanks [~csingh]. [~eyang] please review and commit when you get a chance.

> New AM attempt could not retrieve previous attempt component data
>
>                 Key: YARN-8579
>                 URL: https://issues.apache.org/jira/browse/YARN-8579
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Assignee: Gour Saha
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8579.001.patch, YARN-8579.002.patch,
>                      YARN-8579.003.patch, YARN-8579.004.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
>
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt
> should be recovered by new attempt.
>
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It
> can not read component details from ZK. Thus, it starts new attempt for all
> containers.
>
> {code}
> 2018-07-19 22:42:47,595 [main] INFO service.ServiceScheduler - Registering appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into registry
> 2018-07-19 22:42:47,611 [main] INFO service.ServiceScheduler - Received 1 containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO service.ServiceScheduler - Could not read component paths: `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components': No such file or directory: KeeperErrorCode = NoNode for /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Handling container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO service.ServiceScheduler - Record not found in registry for container container_e08_1531977563978_0015_01_03 from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO service.ServiceScheduler - Triggering initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO component.Component - [INIT COMPONENT httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO component.Component - [COMPONENT httpd] Requesting for 2 container(s)
> {code}
[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-8579:
----------------------------
    Attachment: YARN-8579.004.patch
[jira] [Commented] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification
[ https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562621#comment-16562621 ] Gour Saha commented on YARN-8392: - [~billie.rinaldi] thank you for the patch. This symmetric scenario will be a good one to open up for services. The patch looks good to me. +1 for it. Just one comment on the error message - {code} String ERROR_PLACEMENT_POLICY_TAG_INVALID = "Invalid target tag %s " + "specified in placement policy of component %s. Component %s must " + "also appear in placement policy of component %s with the same " + "constraint type."; {code} Since we are checking for scope in addition to constraint type, should we explicitly mention that in the error message too? > Allow multiple tags for anti-affinity placement policy in service > specification > --- > > Key: YARN-8392 > URL: https://issues.apache.org/jira/browse/YARN-8392 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi >Priority: Critical > Attachments: YARN-8392.1.patch, YARN-8392.2.patch, YARN-8392.3.patch > > > Currently the service client code is restricting a component's target tags to > include only a single tag, the component name. I have a use case for two > components having anti-affinity with themselves and with each other. The YARN > placement policies support this, but the service framework isn't allowing it.
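To make the review suggestion concrete, a hypothetical revision of the format string that names scope as well as constraint type might look like the sketch below. The class and helper method are illustrative only, not the committed Hadoop patch; only the message text and `%s` argument order follow the snippet quoted in the comment.

```java
// Illustrative sketch of the suggested error message, extended to mention
// scope. Class and method names are hypothetical, not Hadoop code.
public class PlacementPolicyMessages {
    public static final String ERROR_PLACEMENT_POLICY_TAG_INVALID =
        "Invalid target tag %s specified in placement policy of component %s. "
        + "Component %s must also appear in placement policy of component %s "
        + "with the same constraint type and scope.";

    // Argument order mirrors the original format string:
    // tag, source component, target component, source component again.
    public static String tagInvalid(String tag, String source, String target) {
        return String.format(ERROR_PLACEMENT_POLICY_TAG_INVALID,
            tag, source, target, source);
    }
}
```

For example, `tagInvalid("web", "db", "web")` would tell the user that component `web` must carry a matching placement policy in component `db`'s spec, now calling out scope explicitly.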
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562586#comment-16562586 ] Gour Saha commented on YARN-8579: - Ah, nice catch [~csingh]. That's exactly what the issue was. With the fix in FairScheduler.java, the test now passes for both FAIR and CAPACITY schedulers. I am running all the tests now and will upload the updated patch after they all pass.
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560862#comment-16560862 ] Gour Saha commented on YARN-8579: - None of the test failures are related to the code change and all patches have completely different non-overlapping test failures.
[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException
[ https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560421#comment-16560421 ] Gour Saha commented on YARN-8522: - [~Zian Chen] 002 patch looks ok. I don't have a good setup to test this. Were you able to reproduce this issue in a cluster without your patch and then test that your patch fixes it? Do you think we can write a test for it? > Application fails with InvalidResourceRequestException > -- > > Key: YARN-8522 > URL: https://issues.apache.org/jira/browse/YARN-8522 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Zian Chen >Priority: Major > Attachments: YARN-8522.001.patch, YARN-8522.002.patch > > > Launch multiple streaming apps simultaneously. Sometimes, one of the > applications fails with the below stack trace. > {code} > 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: > java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to > xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: > Connection refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused, while invoking > ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying > after sleeping for 3ms. 
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: > Invocation returned exception: > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > on [rm2], so propagating back to caller. 
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/hrt_qa/.staging/job_1530515284077_0007 > 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, only one resource request with * is allowed > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
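Per the stack trace, the rejection comes from RMAppManager.validateAndCreateResourceRequest: a submission's AM resource ask may contain at most one ResourceRequest at the ANY ("*") resource name. The following is a simplified, self-contained model of that rule — the types and method here are illustrative and are not Hadoop's actual classes.

```java
import java.util.List;

// Simplified model of the check that produced the exception above: reject
// a submission carrying more than one resource request at the "*" (ANY)
// resource name. Types are illustrative, not Hadoop's ResourceRequest.
public class ResourceRequestValidator {
    public record ResourceRequest(String resourceName, int memoryMb) {}

    public static void validate(List<ResourceRequest> requests) {
        long anyCount = requests.stream()
            .filter(r -> "*".equals(r.resourceName()))
            .count();
        if (anyCount > 1) {
            // Same wording as the exception message in the log above.
            throw new IllegalArgumentException(
                "Invalid resource request, only one resource request with * is allowed");
        }
    }
}
```

Under this model, simultaneous submissions are only a trigger: whatever client-side race produced a duplicate `*` request in the ask, the RM rejects it at submission time rather than scheduling it.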
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560415#comment-16560415 ] Gour Saha commented on YARN-8579: - Thanks [~csingh] for the review. I uploaded 003 with your suggestion. I do have one fundamental question though. I don't understand why for FAIR scheduler the below assert fails (which means no NMTokens are sent over even with this patch). The method where I made the code change is a common method which is called by both Fair and Capacity Schedulers. Any idea? That's why I had to enable this assert for CAPACITY scheduler only. I don't have a cluster setup where I can test FairScheduler. {code} if (getSchedulerType().equals(SchedulerType.CAPACITY)) { Assert.assertEquals(1, nmTokens.size()); // container 3 is running on node 2 Assert.assertEquals(nm2Address, nmTokens.get(0).getNodeId().toString()); } {code}
[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8579: Attachment: YARN-8579.003.patch
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560197#comment-16560197 ] Gour Saha commented on YARN-8579: - [~csingh], please review the patch when you get a chance.
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560193#comment-16560193 ] Gour Saha commented on YARN-8579: - Uploaded 002 with a few more asserts in the test.
[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8579: Attachment: YARN-8579.002.patch
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560105#comment-16560105 ] Gour Saha commented on YARN-8429: - Awesome. Thanks again [~eyang]. > Improve diagnostic message when artifact is not set properly > > > Key: YARN-8429 > URL: https://issues.apache.org/jira/browse/YARN-8429 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8429.001.patch, YARN-8429.002.patch, > YARN-8429.003.patch, YARN-8429.004.patch > > > Steps: > 1) Create launch json file. Replace "artifact" with "artifacts" > 2) launch yarn service app with cli > The application launch fails with below error > {code} > [xxx xxx]$ yarn app -launch test2-2 test.json > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition > from local FS: /xxx/test.json > 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms > 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be > absolute path: /xxx/xxx > {code} > The artifact field is not mandatory. However, if that field is specified > incorrectly, the launch command should fail with a proper error. > Here, the error message regarding Dest_file is misleading.
[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8579: Fix Version/s: 3.1.2 3.2.0
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559134#comment-16559134 ] Gour Saha commented on YARN-8429: - Thanks [~eyang] for the commit. Can you please commit it to branch-3.1 also, since it is targeted for the 3.1.2 release? > Improve diagnostic message when artifact is not set properly > > > Key: YARN-8429 > URL: https://issues.apache.org/jira/browse/YARN-8429 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-8429.001.patch, YARN-8429.002.patch, > YARN-8429.003.patch, YARN-8429.004.patch > > > Steps: > 1) Create launch json file. Replace "artifact" with "artifacts" > 2) Launch yarn service app with cli > The application launch fails with the below error > {code} > [xxx xxx]$ yarn app -launch test2-2 test.json > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/06/14 17:08:00 INFO client.ApiServiceClient: Loading service definition > from local FS: /xxx/test.json > 18/06/14 17:08:01 INFO util.log: Logging initialized @2782ms > 18/06/14 17:08:01 ERROR client.ApiServiceClient: Dest_file must not be > absolute path: /xxx/xxx > {code} > The artifact field is not mandatory. However, if that field is specified > incorrectly, the launch cmd should fail with a proper error. > Here, the error message regarding Dest_file is misleading.
[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8579: Attachment: YARN-8579.001.patch
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559122#comment-16559122 ] Gour Saha commented on YARN-8579: - Uploading patch 001 with a fix that I successfully tested in my cluster
[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559121#comment-16559121 ] Gour Saha commented on YARN-8579: - I investigated this issue and found that the root cause is the missing NM tokens corresponding to the containers which were passed to the AM after registration via the onContainersReceivedFromPreviousAttempts callback. This is required due to the change made in YARN-6168. The exception seen in the AM log is as below - {code} 2018-07-26 23:22:31,373 [pool-5-thread-4] ERROR instance.ComponentInstance - [COMPINSTANCE httpd-proxy-0 : container_e15_1532637883791_0001_01_04] Failed to get container status on ctr-e138-1518143905142-412155-01-05.hwx.site:25454, will try again org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for ctr-e138-1518143905142-412155-01-05.hwx.site:25454 at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:262) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:252) at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:137) at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.getContainerStatus(NMClientImpl.java:323) at org.apache.hadoop.yarn.service.component.instance.ComponentInstance$ContainerStatusRetriever.run(ComponentInstance.java:596) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code}
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558931#comment-16558931 ] Gour Saha commented on YARN-8429: - Mistakenly had a test commented out in patch 003. Undoing that in patch 004. Thanks [~billie.rinaldi] for catching that.
[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8429: Attachment: YARN-8429.004.patch
[jira] [Commented] (YARN-8580) yarn.resourcemanager.am.max-attempts is not respected for yarn services
[ https://issues.apache.org/jira/browse/YARN-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556935#comment-16556935 ] Gour Saha commented on YARN-8580: - Actually, this is a YARN Service-specific property. So the value 20 is getting set because that's the default for YARN Services. The reason 100 was not taking effect is that, for YARN Service, the property name is yarn.service.am-restart.max-attempts and not yarn.resourcemanager.am.max-attempts. Once the right property is set, the desired behavior will be seen. It is still an invalid JIRA though. > yarn.resourcemanager.am.max-attempts is not respected for yarn services > --- > > Key: YARN-8580 > URL: https://issues.apache.org/jira/browse/YARN-8580 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.1 >Reporter: Yesha Vora >Priority: Major > > 1) Max am attempt is set to 100 on all nodes. ( including gateway) > {code} > <property> > <name>yarn.resourcemanager.am.max-attempts</name> > <value>100</value> > </property> > {code} > 2) Start a Yarn service ( Hbase tarball ) application > 3) Kill AM 20 times > Here, the app fails with the below diagnostics. 
> {code} > bash-4.2$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -status > application_1532481557746_0001 > 18/07/25 18:43:34 INFO client.AHSProxy: Connecting to Application History > server at xxx/xxx:10200 > 18/07/25 18:43:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over > to rm2 > 18/07/25 18:43:34 INFO conf.Configuration: found resource resource-types.xml > at file:/etc/hadoop/3.0.0.0-1634/0/resource-types.xml > Application Report : > Application-Id : application_1532481557746_0001 > Application-Name : hbase-tarball-lr > Application-Type : yarn-service > User : hbase > Queue : default > Application Priority : 0 > Start-Time : 1532481864863 > Finish-Time : 1532522943103 > Progress : 100% > State : FAILED > Final-State : FAILED > Tracking-URL : > https://xxx:8090/cluster/app/application_1532481557746_0001 > RPC Port : -1 > AM Host : N/A > Aggregate Resource Allocation : 252150112 MB-seconds, 164141 > vcore-seconds > Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds > Log Aggregation Status : SUCCEEDED > Diagnostics : Application application_1532481557746_0001 failed 20 > times (global limit =100; local limit is =20) due to AM Container for > appattempt_1532481557746_0001_20 exited with exitCode: 137 > Failing this attempt.Diagnostics: [2018-07-25 12:49:00.784]Container killed > on request. Exit code is 137 > [2018-07-25 12:49:03.045]Container exited with a non-zero exit code 137. > [2018-07-25 12:49:03.045]Killed by external signal > For more detailed output, check the application tracking page: > https://xxx:8090/cluster/app/application_1532481557746_0001 Then click on > links to logs of each attempt. > . Failing the application. 
> Unmanaged Application : false > Application Node Label Expression : > AM container Node Label Expression : > TimeoutType : LIFETIME ExpiryTime : 2018-07-25T22:26:15.419+ > RemainingTime : 0seconds > {code}
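The comment above can be sketched as a configuration change. A minimal sketch, assuming the service-level property is set cluster-wide in yarn-site.xml (the value 100 mirrors the reporter's intent; per-service overrides are also possible):

```xml
<!-- Sketch: YARN Service AM restarts are governed by this
     service-level property, not yarn.resourcemanager.am.max-attempts. -->
<property>
  <name>yarn.service.am-restart.max-attempts</name>
  <value>100</value>
</property>
```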
[jira] [Assigned] (YARN-8579) New AM attempt could not retrieve previous attempt component data
[ https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha reassigned YARN-8579: --- Assignee: Gour Saha
[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed
[ https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556267#comment-16556267 ] Gour Saha commented on YARN-8545: - [~csingh] patch 001 looks good to me. +1. > YARN native service should return container if launch failed > > > Key: YARN-8545 > URL: https://issues.apache.org/jira/browse/YARN-8545 > Project: Hadoop YARN > Issue Type: Task >Reporter: Wangda Tan >Assignee: Chandni Singh >Priority: Critical > Attachments: YARN-8545.001.patch > > > In some cases, container launch may fail but container will not be properly > returned to RM. > This could happen when AM trying to prepare container launch context but > failed w/o sending container launch context to NM (Once container launch > context is sent to NM, NM will report failed container to RM). > Exception like: > {code:java} > java.io.FileNotFoundException: File does not exist: > hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591) > at > org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388) > at > org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253) > at > org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152) > at > org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745){code} > And even after container launch context prepare failed, AM still trying to > monitor container's readiness: > {code:java} > 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO monitor.ServiceMonitor - > Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 > 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP > presence", exception="java.io.IOException: primary-worker-0: IP is not > available yet" > ...{code}
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551618#comment-16551618 ] Gour Saha commented on YARN-8429: - Patch 003 has the absolute->relative suggested change.
[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8429: Attachment: YARN-8429.003.patch
[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8429: Attachment: YARN-8429.002.patch
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551276#comment-16551276 ] Gour Saha commented on YARN-8429: - Uploaded 002 with a test fix.
[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8429: Fix Version/s: (was: 3.1.1) 3.1.2
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551208#comment-16551208 ] Gour Saha commented on YARN-8429: - [~eyang] please review when you get a chance. It's a usability improvement patch.
[jira] [Commented] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551204#comment-16551204 ] Gour Saha commented on YARN-8429: - [~yeshavora], with this patch the error msg will change to - {code} For component httpd with no artifact, dest_file must not be an absolute path: /xxx/xxx {code} This should make it easy to see that component httpd is being treated as a component with no artifact, and help detect this typo faster.
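The actual validation lives in the Java service client; the check behind the improved message above can be sketched in Python as follows (function name and parameters are hypothetical, for illustration only):

```python
def validate_dest_file(component_name, artifact, dest_file):
    """Reject an absolute dest_file for a component that has no artifact,
    naming the offending component as the improved message does."""
    if artifact is None and dest_file.startswith("/"):
        raise ValueError(
            "For component %s with no artifact, dest_file must not be "
            "an absolute path: %s" % (component_name, dest_file))

# A spec where "artifact" was misspelled as "artifacts" deserializes with
# no artifact set, so the check now points at the real problem:
try:
    validate_dest_file("httpd", None, "/xxx/xxx")
except ValueError as e:
    print(e)
```

Including the component name in the message is what turns a misleading "Dest_file must not be absolute path" error into one that hints at the missing/misspelled artifact field.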
[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8429: Fix Version/s: 3.1.1 3.2.0
[jira] [Updated] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8429: Attachment: YARN-8429.001.patch
[jira] [Assigned] (YARN-8429) Improve diagnostic message when artifact is not set properly
[ https://issues.apache.org/jira/browse/YARN-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha reassigned YARN-8429: --- Assignee: Gour Saha
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548417#comment-16548417 ] Gour Saha commented on YARN-8301: - Great. Patch 4 looks good. Not sure why I see the trailing whitespaces when I apply the patch. The jenkins build should tell us. +1 for 004 pending jenkins. > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch, YARN-8301.002.patch, > YARN-8301.003.patch, YARN-8301.004.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json
[ https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548375#comment-16548375 ] Gour Saha commented on YARN-8542: - [~csingh] agreed that the API is to request for containers. However, the structure I proposed adheres to the current status API structure and the swagger definition. Note, service owners are already parsing through the component instances across multiple components in the status response payload if they need a single collection of all component instances. If you add a new attribute "component_name" now, you would need to modify the swagger definition and it would actually mean a change for the end-users since they would have to handle the containers API output differently from the status API output. Let me know what you think. > Yarn Service: Add component name to container json > -- > > Key: YARN-8542 > URL: https://issues.apache.org/jira/browse/YARN-8542 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > > GET app/v1/services/{\{service-name}}/component-instances returns a list of > containers with YARN-8299. > {code:java} > [ > { > "id": "container_1531508836237_0001_01_03", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509014497, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-1" > }, > { > "id": "container_1531508836237_0001_01_02", > "ip": "192.168.2.51", > "hostname": "HW12119.local", > "state": "READY", > "launch_time": 1531509013492, > "bare_host": "192.168.2.51", > "component_instance_name": "sleeper-0" > } > ]{code} > {{component_name}} is not part of container json, so it is hard to tell which > component an instance belongs to. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
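One reason the extra "component_name" attribute may be unnecessary: clients can already recover the component from the instance name, assuming the `<component>-<index>` naming convention seen in the example payloads ("sleeper-0", "sleeper-1"). A minimal sketch of that derivation (helper name is hypothetical):

```python
def component_of(instance_name):
    """Derive the component name from a component_instance_name,
    assuming the '<component>-<index>' naming convention."""
    base, sep, index = instance_name.rpartition("-")
    if sep and index.isdigit():
        return base
    # No '-<index>' suffix; fall back to the name itself.
    return instance_name

print(component_of("sleeper-1"))  # sleeper
```

This is only a client-side workaround, of course; it does not settle whether the API contract itself should carry the component name.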
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548360#comment-16548360 ] Gour Saha commented on YARN-8301: - [~csingh], patch 2 looks good. Let's add to the top of this doc - "Experimental Feature - Tech Preview" and create a reference to it from Overview.md (and also mention there that it is an Experimental Feature - Tech Preview). Thanks [~eyang] for pointing this out. A few minor comments - 1. In line 148, do we need the line "name": "sleeper-service" in the JSON spec for version 1.0.1 of the service? 2. Remove the trailing whitespaces from all the lines.
[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545719#comment-16545719 ] Gour Saha commented on YARN-8299: - Just thinking aloud. Would returning the data in this format make more sense? This would maintain consistency with the status command output as well - {code} [ { "name": "ping", "containers": [ { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "ping-0", "hostname": "ping-0.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_02", "ip": "172.26.111.21", "launch_time": 1531767377301, "state": "READY" }, { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "ping-1", "hostname": "ping-1.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_07", "ip": "172.26.111.21", "launch_time": 1531767410395, "state": "RUNNING_BUT_UNREADY" } ] }, { "name": "sleep", "containers": [ { "bare_host": "eyang-5.openstacklocal", "component_instance_name": "sleep-0", "hostname": "sleep-0.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_04", "ip": "172.26.111.20", "launch_time": 1531767377710, "state": "READY" }, { "bare_host": "eyang-4.openstacklocal", "component_instance_name": "sleep-1", "hostname": "sleep-1.qqq.hbase.ycluster", "id": "container_1531765479645_0002_01_05", "ip": "172.26.111.21", "launch_time": 1531767378303, "state": "READY" } ] } ] {code} > Yarn Service Upgrade: Add GET APIs that returns instances matching query > params > --- > > Key: YARN-8299 > URL: https://issues.apache.org/jira/browse/YARN-8299 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8299.001.patch, YARN-8299.002.patch, > YARN-8299.003.patch, YARN-8299.004.patch, YARN-8299.005.patch > > > We need APIs that returns containers that match the query params. These are > needed so that we can find out what containers have been upgraded. 
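The grouped format proposed above can be produced from the flat YARN-8299 container list with a small transformation; a hedged Python sketch (function name hypothetical, and component names derived from the `<component>-<index>` instance-name convention):

```python
from collections import OrderedDict

def group_by_component(containers):
    """Regroup a flat container list into the status-style structure
    proposed above: one entry per component, each with a 'containers'
    list. Assumes 'component_instance_name' follows '<component>-<index>'."""
    grouped = OrderedDict()
    for c in containers:
        comp = c["component_instance_name"].rpartition("-")[0]
        grouped.setdefault(comp, {"name": comp, "containers": []})
        grouped[comp]["containers"].append(c)
    return list(grouped.values())

flat = [
    {"id": "container_1531765479645_0002_01_02",
     "component_instance_name": "ping-0", "state": "READY"},
    {"id": "container_1531765479645_0002_01_04",
     "component_instance_name": "sleep-0", "state": "READY"},
]
print([g["name"] for g in group_by_component(flat)])  # ['ping', 'sleep']
```

The point of the proposed shape is that it matches the existing status API and the swagger definition, so clients parse one structure, not two.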
[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation
[ https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545711#comment-16545711 ] Gour Saha commented on YARN-8301: - Thanks [~csingh] for the doc patch. Overall looks good. Few comments - 1. Change {{path_to_service_def_file}} to {{path_to_new_service_def_file}} 2. Can you add a sample response of a {{status}} output for the version 1.0.0 of the sleeper service and paste it just above the "Initiate Upgrade" section? This will put a lot of subsequent references in context, like your service is named "my-sleeper" and what sleeper-0 and sleeper-1 are when you refer them in "Upgrade Instance" section. 3. Can you add an "Upgrade Component" example right after "Upgrade Instance"? 4. In the "Finalize Upgrade" section can you change it to - {code:java} User must finalize the upgrade using the below command (since autoFinalize was not specified during initiate):{code} > Yarn Service Upgrade: Add documentation > --- > > Key: YARN-8301 > URL: https://issues.apache.org/jira/browse/YARN-8301 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8301.001.patch > > > Add documentation for yarn service upgrade. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543828#comment-16543828 ] Gour Saha commented on YARN-8299: - One more minor fix in the below comment in ApplicationCLI.java - {code} // not appAttemptIf format, it could be appName. {code} Change appAttemptIf to appAttemptId. +1 for the 004 patch post Jenkins.
[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543756#comment-16543756 ] Gour Saha commented on YARN-8299: - testFilterWithState fails locally in my env as well.
[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543694#comment-16543694 ] Gour Saha edited comment on YARN-8299 at 7/13/18 9:00 PM: -- In ApplicationCLI.java can we change - {code} opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " + "Application Name"); {code} to - {code} opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID"); {code} This will keep it in-line with "yarn app" descriptions like "yarn app -status ". was (Author: gsaha): In ApplicationCLI.java can we change - {code} opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " + "Application Name"); {code} to - {code} opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID"; {code} This will keep it in-line with "yarn app" descriptions like "yarn app -status ".
[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543694#comment-16543694 ] Gour Saha edited comment on YARN-8299 at 7/13/18 8:56 PM: -- In ApplicationCLI.java can we change - {code} opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " + "Application Name"); {code} to - {code} opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID"; {code} This will keep it in-line with "yarn app" descriptions like "yarn app -status ". was (Author: gsaha): In ApplicationCLI.java can we changed - {code} opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " + "Application Name"); {code} to - {code} opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID>"; {code} This will keep it in-line with "yarn app" descriptions like "yarn app -status ".
[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543694#comment-16543694 ] Gour Saha commented on YARN-8299: - In ApplicationCLI.java can we changed - {code} opts.getOption(LIST_CMD).setArgName("Application Attempt ID or " + "Application Name"); {code} to - {code} opts.getOption(LIST_CMD).setArgName("Application Name or Attempt ID>"; {code} This will keep it in-line with "yarn app" descriptions like "yarn app -status ".
[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543680#comment-16543680 ] Gour Saha commented on YARN-8299: - Ah, my bad. The patch already supports -components.
[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543676#comment-16543676 ] Gour Saha commented on YARN-8299: - We need -components support, whether we support -states in tandem with it or not. However, the two options together make sense too, since I might be interested in listing all containers in READY state across all components in a single API call.
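The combined -components/-states filtering discussed above amounts to an AND of two optional predicates. A rough Python sketch of that server-side logic (function name hypothetical; the real filtering is in the Java API server, and component names are derived here from the `<component>-<index>` instance-name convention):

```python
def filter_containers(containers, components=None, states=None):
    """Keep containers whose component is in `components` (if given)
    and whose state is in `states` (if given); omitted filters match all."""
    result = []
    for c in containers:
        comp = c["component_instance_name"].rpartition("-")[0]
        if components and comp not in components:
            continue
        if states and c["state"] not in states:
            continue
        result.append(c)
    return result

containers = [
    {"component_instance_name": "ping-0", "state": "READY"},
    {"component_instance_name": "ping-1", "state": "RUNNING_BUT_UNREADY"},
    {"component_instance_name": "sleep-0", "state": "READY"},
]
# All READY containers across all components, in a single call:
print(len(filter_containers(containers, states={"READY"})))  # 2
```

Passing only `states` gives the "all READY containers across all components" case; passing both narrows to one component's containers in a given state.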
[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543663#comment-16543663 ] Gour Saha edited comment on YARN-8299 at 7/13/18 8:28 PM: -- [~csingh] shouldn't we add a filter "-components compNameA[,compNameB,...]" to filter the container list further for specific components? was (Author: gsaha): [~csingh] shouldn't we add a filter "-components compNameA[,compNameB,...]" to filter the container list further for specific components.
[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params
[ https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543663#comment-16543663 ] Gour Saha commented on YARN-8299: - [~csingh] shouldn't we add a filter "-components compNameA[,compNameB,...]" to filter the container list further for specific components.
[jira] [Commented] (YARN-8360) Yarn service conflict between restart policy and NM configuration
[ https://issues.apache.org/jira/browse/YARN-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539405#comment-16539405 ] Gour Saha commented on YARN-8360: - Thanks [~suma.shivaprasad], patch 1 looks good to me. +1. > Yarn service conflict between restart policy and NM configuration > -- > > Key: YARN-8360 > URL: https://issues.apache.org/jira/browse/YARN-8360 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Chandni Singh >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8360.1.patch > > > For the below spec, the service will not stop even after container failures > because of the NM auto retry properties : > * "yarn.service.container-failure.retry.max": 1, > * "yarn.service.container-failure.validity-interval-ms": 5000 > The NM will continue auto-restarting containers. > {{fail_after 20}} fails after 20 seconds. Since the validity failure > interval is 5 seconds, NM will auto restart the container. > {code:java} > { > "name": "fail-demo2", > "version": "1.0.0", > "components" : > [ > { > "name": "comp1", > "number_of_containers": 1, > "launch_command": "fail_after 20", > "restart_policy": "NEVER", > "resource": { > "cpus": 1, > "memory": "256" > }, > "configuration": { > "properties": { > "yarn.service.container-failure.retry.max": 1, > "yarn.service.container-failure.validity-interval-ms": 5000 > } > } > } > ] > } > {code} > If {{restart_policy}} is NEVER, then the service should stop after the > container fails. > Since we have introduced, the service level Restart Policies, I think we > should make the NM auto retry configurations part of the {{RetryPolicy}} and > get rid of all {{yarn.service.container-failure.**}} properties. Otherwise it > gets confusing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
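The conflict described in YARN-8360 is a precedence question: the component-level restart_policy should win over the NM-level auto-retry settings (yarn.service.container-failure.retry.max, the failure validity interval). A hedged Python sketch of one possible resolution, not the actual Java implementation (function name and signature hypothetical):

```python
def should_relaunch(restart_policy, exit_code, failures, max_retries):
    """Decide whether to relaunch a failed/exited container, letting the
    component-level restart_policy take precedence over the NM-level
    retry budget (yarn.service.container-failure.retry.max)."""
    if restart_policy == "NEVER":
        return False  # policy wins: never relaunch, regardless of NM config
    if restart_policy == "ON_FAILURE" and exit_code == 0:
        return False  # clean exit under ON_FAILURE: done, do not relaunch
    # Otherwise the NM-level retry budget still applies.
    return failures < max_retries
```

Under this ordering, the spec in the description (restart_policy NEVER plus retry.max 1 and a 5-second validity interval) would stop the service after the first failure instead of auto-restarting forever, which is the behavior the issue asks for.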
[jira] [Commented] (YARN-8485) Priviledged container app launch is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530428#comment-16530428 ] Gour Saha commented on YARN-8485: - bq. This would ensure we don't accidentally call a rogue sudo command I actually agree to this, since a rogue user could add any rogue sudo script to the PATH and pass this check. +1 to the get_docker_binary style OR explicitly checking both /bin/sudo and /usr/bin/sudo to keep the patch simple for now. We should fail if both the paths fail. > Priviledged container app launch is failing intermittently > -- > > Key: YARN-8485 > URL: https://issues.apache.org/jira/browse/YARN-8485 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services > Environment: Debian >Reporter: Yesha Vora >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8485.001.patch, YARN-8485.002.patch > > > Privileged application fails intermittently > {code:java} > yarn jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar > -shell_command "sleep 30" -num_containers 1 -shell_env > YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=xxx -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER=true -jar > /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar{code} > Here, container launch fails with 'Privileged containers are disabled' even > though Docker privilege container is enabled in the cluster > {code:java|title=nm log} > 2018-06-28 21:21:15,647 INFO runtime.DockerLinuxContainerRuntime > (DockerLinuxContainerRuntime.java:allowPrivilegedContainerExecution(664)) - > All checks pass. 
Launching privileged container for : > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,665 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container > container_e01_1530220647587_0001_01_02 is : 29 > 2018-06-28 21:21:15,666 WARN nodemanager.LinuxContainerExecutor > (LinuxContainerExecutor.java:handleExitCode(599)) - Exception from > container-launch with container ID: > container_e01_1530220647587_0001_01_02 and exit code: 29 > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Launch container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(DockerLinuxContainerRuntime.java:958) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:141) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:564) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:479) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:494) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:306) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception from 
container-launch. > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Container id: > container_e01_1530220647587_0001_01_02 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exit code: 29 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Exception message: Launch container > failed > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Shell error output: check > privileges failed for user: hrt_qa, error code: 0 > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Privileged containers are disabled > for user: hrt_qa > 2018-06-28 21:21:15,668 INFO nodemanager.ContainerExecutor > (ContainerExecutor.java:logOutput(541)) - Error constructing docker command, > d
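The hardening suggested in the comment above — resolving sudo only from fixed trusted locations and failing hard when neither exists — can be sketched as follows. This is an illustrative Python sketch, not the actual container-executor code (which is written in C); the function and constant names are made up for the example.

```python
# Hypothetical sketch: look up sudo only in trusted absolute paths,
# never via a PATH search that a rogue user could poison.
import os

TRUSTED_SUDO_PATHS = ("/bin/sudo", "/usr/bin/sudo")

def find_trusted_sudo() -> str:
    """Return the first trusted, executable sudo path, or fail."""
    for path in TRUSTED_SUDO_PATHS:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # Fail if both paths are missing, as proposed in the comment.
    raise RuntimeError("sudo not found in any trusted location; "
                       "refusing to fall back to a PATH lookup")
```

The design point is the same as the `get_docker_binary` style mentioned in the comment: the binary's location is pinned by configuration or a fixed allowlist, so an attacker-controlled PATH cannot substitute a rogue executable.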
[jira] [Commented] (YARN-8445) YARN native service doesn't allow service name equal to component name
[ https://issues.apache.org/jira/browse/YARN-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528382#comment-16528382 ] Gour Saha commented on YARN-8445: - We might have to revisit this. It seems to be an issue in how we publish entities to ATSv2. We shouldn't have blocked a component name that is the same as the service name during validation. The example service sleeper itself will not run if a user tries to do "yarn app -launch sleeper sleeper". > YARN native service doesn't allow service name equal to component name > --- > > Key: YARN-8445 > URL: https://issues.apache.org/jira/browse/YARN-8445 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Fix For: 3.1.1 > > Attachments: YARN-8445.001.patch > > > Currently, YARN service doesn't allow specifying a service name equal to a > component name. > This causes the AM launch to fail with a message like: > {code} > org.apache.hadoop.metrics2.MetricsException: Metrics source tf-zeppelin > already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > at > org.apache.hadoop.yarn.service.ServiceMetrics.register(ServiceMetrics.java:75) > at > org.apache.hadoop.yarn.service.component.Component.(Component.java:193) > at > org.apache.hadoop.yarn.service.ServiceScheduler.createAllComponents(ServiceScheduler.java:552) > at > org.apache.hadoop.yarn.service.ServiceScheduler.buildInstance(ServiceScheduler.java:251) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceInit(ServiceScheduler.java:283) > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:142) > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:338) > 2018-06-18 06:50:39,473 [main] INFO service.ServiceScheduler - Stopping > service scheduler > {code} > It's better to add this check in validation phase instead of failing AM. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
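The up-front check that the description asks for (and that the later comment says may need revisiting) can be sketched as a simple validation step. This is a hypothetical illustration, not the actual `ServiceApiUtil` validation code; the function name is invented for the example.

```python
# Hypothetical sketch: reject a service spec during validation when a
# component shares the service's name, instead of letting the AM crash
# later on a duplicate metrics-source registration.

def validate_component_names(service_name, component_names):
    """Raise ValueError if any component name equals the service name."""
    clashes = [c for c in component_names if c == service_name]
    if clashes:
        raise ValueError(
            f"Component name(s) {clashes} must differ from service name "
            f"'{service_name}'; identical names collide in the metrics "
            "registry (MetricsException: Metrics source ... already exists)")
```

Failing in validation gives the user an actionable error at submission time, which is cheaper than diagnosing a `MetricsException` buried in the AM logs.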
[jira] [Commented] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits
[ https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16512014#comment-16512014 ] Gour Saha commented on YARN-8425: - If you are not reporting a bug or improvement, sending email to u...@hadoop.apache.org is the right way to get your doubts and questions answered. Knowing your application's needs and asking for containers of the right size is the way to go. Disabling pmem check is not recommended in prod clusters. > Yarn container getting killed due to running beyond physical memory limits > -- > > Key: YARN-8425 > URL: https://issues.apache.org/jira/browse/YARN-8425 > Project: Hadoop YARN > Issue Type: Task > Components: applications, container-queuing, yarn >Affects Versions: 2.7.6 >Reporter: Tapas Sen >Priority: Major > Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, > yarn_configuration_3.PNG > > > Hi, > Getting these error. > > 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics > report from attempt_1527758146858_45040_m_08_3: Container > [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is > running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical > memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container. > > Yarn resource configuration will in attachment. > > Any lead would be appreciated. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits
[ https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha resolved YARN-8425. - Resolution: Not A Bug > Yarn container getting killed due to running beyond physical memory limits > -- > > Key: YARN-8425 > URL: https://issues.apache.org/jira/browse/YARN-8425 > Project: Hadoop YARN > Issue Type: Task > Components: applications, container-queuing, yarn >Affects Versions: 2.7.6 >Reporter: Tapas Sen >Priority: Major > Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, > yarn_configuration_3.PNG > > > Hi, > Getting these error. > > 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics > report from attempt_1527758146858_45040_m_08_3: Container > [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is > running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical > memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container. > > Yarn resource configuration will in attachment. > > Any lead would be appreciated. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits
[ https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511241#comment-16511241 ] Gour Saha commented on YARN-8425: - If _yarn.nodemanager.pmem-check-enabled_ in your cluster is not explicitly set to false (since the default value is true) it is behaving as designed. Based on your requirement you can either request for containers higher than 8GB or set _yarn.nodemanager.pmem-check-enabled_ to false. > Yarn container getting killed due to running beyond physical memory limits > -- > > Key: YARN-8425 > URL: https://issues.apache.org/jira/browse/YARN-8425 > Project: Hadoop YARN > Issue Type: Task > Components: applications, container-queuing, yarn >Affects Versions: 2.7.6 >Reporter: Tapas Sen >Priority: Major > Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, > yarn_configuration_3.PNG > > > Hi, > Getting these error. > > 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics > report from attempt_1527758146858_45040_m_08_3: Container > [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is > running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical > memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container. > > Yarn resource configuration will in attachment. > > Any lead would be appreciated. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
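The behavior described in the comment above — the kill happens only when `yarn.nodemanager.pmem-check-enabled` is true, which is its default — amounts to a simple threshold check. The sketch below is illustrative, not NodeManager source code.

```python
# Illustrative sketch of the NM physical-memory enforcement described
# above; units are bytes. Not the actual ContainersMonitor code.

def pmem_check(pmem_used_bytes, pmem_limit_bytes, pmem_check_enabled=True):
    """Return True if the container should be killed for exceeding its
    physical-memory allocation (the default: pmem-check-enabled=true)."""
    if not pmem_check_enabled:
        return False
    return pmem_used_bytes > pmem_limit_bytes
```

For the reported case (8.1 GB used against an 8 GB allocation), the check fires, so the two remedies in the comment follow directly: request containers larger than 8 GB, or disable the check (not recommended in production clusters).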
[jira] [Commented] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service doesn't work
[ https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506187#comment-16506187 ] Gour Saha commented on YARN-8276: - Thank you [~GergelyNovak] for the patch and [~sunilg] for reviewing & committing. > [UI2] After version field became mandatory, form-based submission of new YARN > service doesn't work > -- > > Key: YARN-8276 > URL: https://issues.apache.org/jira/browse/YARN-8276 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8276.001.patch > > > After version became mandatory in YARN service, one cannot create a new > service through UI, there is no way to specify the version field and the > service fails with the following message: > {code} > "Error: Adapter operation failed". > {code} > Checking through browser dev tools, the REST response is the following: > {code} > {"diagnostics":"Version of service sleeper-service is either empty or not > provided"} > {code} > Discovered by [~vinodkv]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
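Once the version field became mandatory, any client-side submission path (the UI2 form included) has to put `version` in the service spec it sends, or the server rejects it with the diagnostics quoted above. The sketch below shows a minimal payload with that field; the field names follow the YARN Service spec, but treat the helper itself as an illustration rather than part of any real client.

```python
# Hypothetical sketch: build a minimal YARN service spec payload with
# the now-mandatory "version" field, rejecting empty versions client-side
# the same way the server does.
import json

def build_service_spec(name, version, components):
    if not version:
        # Mirrors the server-side rejection quoted in the issue.
        raise ValueError(f"Version of service {name} is either empty "
                         "or not provided")
    return json.dumps({"name": name,
                       "version": version,
                       "components": components})
```

Validating on the client gives the form a usable error message instead of the opaque "Error: Adapter operation failed" that surfaced in UI2.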
[jira] [Commented] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497334#comment-16497334 ] Gour Saha commented on YARN-8308: - I uploaded patch 003 with the fixes. > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Attachments: YARN-8308.001.patch, YARN-8308.002.patch, > YARN-8308.003.patch > > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > 
org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, rene
[jira] [Commented] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497333#comment-16497333 ] Gour Saha commented on YARN-8308: - Thanks for reviewing the patch [~eyang]. I have updated the patch to ensure removeHdfsDelegationToken gets called for secure cluster only. The keytab and principal options are not mandatory in the CLI. Only service name is mandatory. > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Attachments: YARN-8308.001.patch, YARN-8308.002.patch, > YARN-8308.003.patch > > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at 
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > 
org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.s
[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8308: Attachment: YARN-8308.003.patch > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Attachments: YARN-8308.001.patch, YARN-8308.002.patch, > YARN-8308.003.patch > > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=152642
[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8308: Attachment: YARN-8308.002.patch > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Attachments: YARN-8308.001.patch, YARN-8308.002.patch > > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425
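The fix described in this thread — `removeHdfsDelegationToken` being called for secure clusters only, so the AM re-authenticates via its keytab instead of reusing a token that may be past `dfs.namenode.delegation.token.max-lifetime` — can be sketched as a credential-pruning step. The sketch is hypothetical (not the ServiceMaster/ServiceClient code) and models tokens as plain dicts for illustration.

```python
# Hypothetical sketch: drop cached HDFS delegation tokens from the AM's
# credentials, but only when security is enabled; with Kerberos, the AM
# can log in fresh from its keytab rather than trust an expired token.

def prune_credentials(tokens, security_enabled):
    """Return the tokens the AM should keep. HDFS delegation tokens are
    removed only in a secure cluster."""
    if not security_enabled:
        return dict(tokens)
    return {alias: t for alias, t in tokens.items()
            if t.get("kind") != "HDFS_DELEGATION_TOKEN"}
```

In an insecure cluster there is nothing to re-authenticate with, so the credentials pass through untouched — which is why the patch guards the removal on the security check.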
[jira] [Comment Edited] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work
[ https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496169#comment-16496169 ] Gour Saha edited comment on YARN-8276 at 5/31/18 6:38 AM: -- [~sunilg], can you please review this. This is critical for 3.1.1. was (Author: gsaha): [~sunilg], can you please review this. This is critical for 3.1.0. > [UI2] After version field became mandatory, form-based submission of new YARN > service through UI2 doesn't work > -- > > Key: YARN-8276 > URL: https://issues.apache.org/jira/browse/YARN-8276 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Critical > Attachments: YARN-8276.001.patch > > > After version became mandatory in YARN service, one cannot create a new > service through UI, there is no way to specify the version field and the > service fails with the following message: > {code} > "Error: Adapter operation failed". > {code} > Checking through browser dev tools, the REST response is the following: > {code} > {"diagnostics":"Version of service sleeper-service is either empty or not > provided"} > {code} > Discovered by [~vinodkv]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work
[ https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8276: Target Version/s: 3.1.1 > [UI2] After version field became mandatory, form-based submission of new YARN > service through UI2 doesn't work > -- > > Key: YARN-8276 > URL: https://issues.apache.org/jira/browse/YARN-8276 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Critical > Attachments: YARN-8276.001.patch > > > After version became mandatory in YARN service, one cannot create a new > service through UI, there is no way to specify the version field and the > service fails with the following message: > {code} > "Error: Adapter operation failed". > {code} > Checking through browser dev tools, the REST response is the following: > {code} > {"diagnostics":"Version of service sleeper-service is either empty or not > provided"} > {code} > Discovered by [~vinodkv]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work
[ https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8276: Priority: Critical (was: Major) > [UI2] After version field became mandatory, form-based submission of new YARN > service through UI2 doesn't work > -- > > Key: YARN-8276 > URL: https://issues.apache.org/jira/browse/YARN-8276 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Critical > Attachments: YARN-8276.001.patch > > > After version became mandatory in YARN service, one cannot create a new > service through UI, there is no way to specify the version field and the > service fails with the following message: > {code} > "Error: Adapter operation failed". > {code} > Checking through browser dev tools, the REST response is the following: > {code} > {"diagnostics":"Version of service sleeper-service is either empty or not > provided"} > {code} > Discovered by [~vinodkv]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8276) [UI2] After version field became mandatory, form-based submission of new YARN service through UI2 doesn't work
[ https://issues.apache.org/jira/browse/YARN-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496169#comment-16496169 ] Gour Saha commented on YARN-8276: - [~sunilg], can you please review this. This is critical for 3.1.0. > [UI2] After version field became mandatory, form-based submission of new YARN > service through UI2 doesn't work > -- > > Key: YARN-8276 > URL: https://issues.apache.org/jira/browse/YARN-8276 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Major > Attachments: YARN-8276.001.patch > > > After version became mandatory in YARN service, one cannot create a new > service through UI, there is no way to specify the version field and the > service fails with the following message: > {code} > "Error: Adapter operation failed". > {code} > Checking through browser dev tools, the REST response is the following: > {code} > {"diagnostics":"Version of service sleeper-service is either empty or not > provided"} > {code} > Discovered by [~vinodkv]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
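The REST response quoted above ({"diagnostics":"Version of service sleeper-service is either empty or not provided"}) implies a server-side check that rejects a spec without a version. A minimal Python sketch of that validation, assuming a dict-shaped spec; the helper name and exact behavior are illustrative, not the actual ServiceApiUtil code:

```python
# Sketch of the validation implied by the quoted diagnostics. The helper name
# validate_service_version is hypothetical; the error text mirrors the REST
# response seen in the browser dev tools.

def validate_service_version(spec: dict) -> None:
    """Raise if the service spec's 'version' field is missing or empty."""
    version = spec.get("version")
    if version is None or str(version).strip() == "":
        raise ValueError(
            "Version of service %s is either empty or not provided"
            % spec.get("name", "<unnamed>"))

# A spec submitted through the UI2 form, which has no way to set 'version':
try:
    validate_service_version({"name": "sleeper-service"})
except ValueError as e:
    print(e)  # Version of service sleeper-service is either empty or not provided

# The same spec with a version passes:
validate_service_version({"name": "sleeper-service", "version": "1.0.0"})
```

This is why the form-based submission fails while a JSON file that includes a version succeeds.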
[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8308: Attachment: YARN-8308.001.patch > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > Attachments: YARN-8308.001.patch > > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNu
[jira] [Commented] (YARN-8367) 2 components, one with placement constraint and one without causes NPE in SingleConstraintAppPlacementAllocator
[ https://issues.apache.org/jira/browse/YARN-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496069#comment-16496069 ] Gour Saha commented on YARN-8367: - I am not sure if the UT failure is related, but it succeeds locally. > 2 components, one with placement constraint and one without causes NPE in > SingleConstraintAppPlacementAllocator > --- > > Key: YARN-8367 > URL: https://issues.apache.org/jira/browse/YARN-8367 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 3.1.0 >Reporter: Gour Saha >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8367.001.patch > > > While testing the fix for YARN-8350, [~billie.rinaldi] encountered this NPE > in the AM log. Filing this on her behalf - > {noformat} > 2018-05-25 21:11:54,006 [AMRM Heartbeater thread] ERROR > impl.AMRMClientAsyncImpl - Exception on heartbeat > java.lang.NullPointerException: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest(SingleConstraintAppPlacementAllocator.java:245) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.internalUpdatePendingAsk(SingleConstraintAppPlacementAllocator.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.updatePendingAsk(SingleConstraintAppPlacementAllocator.java:207) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.addSchedulingRequests(AppSchedulingInfo.java:269) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateSchedulingRequests(AppSchedulingInfo.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateSchedulingRequests(SchedulerApplicationAttempt.java:469) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1154) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:278) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.
[jira] [Commented] (YARN-8367) 2 components, one with placement constraint and one without causes NPE in SingleConstraintAppPlacementAllocator
[ https://issues.apache.org/jira/browse/YARN-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495948#comment-16495948 ] Gour Saha commented on YARN-8367: - [~cheersyang] thank you for the patch. 001 looks good. I even tested it in my cluster, where I was getting the NPE, and your patch fixes the problem. So +1 for the 001 patch. I think [~billie.rinaldi] also successfully tested your patch while testing YARN-8350. > 2 components, one with placement constraint and one without causes NPE in > SingleConstraintAppPlacementAllocator > --- > > Key: YARN-8367 > URL: https://issues.apache.org/jira/browse/YARN-8367 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 3.1.0 >Reporter: Gour Saha >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8367.001.patch > > > While testing the fix for YARN-8350, [~billie.rinaldi] encountered this NPE > in the AM log. Filing this on her behalf - > {noformat} > 2018-05-25 21:11:54,006 [AMRM Heartbeater thread] ERROR > impl.AMRMClientAsyncImpl - Exception on heartbeat > java.lang.NullPointerException: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest(SingleConstraintAppPlacementAllocator.java:245) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.internalUpdatePendingAsk(SingleConstraintAppPlacementAllocator.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.updatePendingAsk(SingleConstraintAppPlacementAllocator.java:207) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.addSchedulingRequests(AppSchedulingInfo.java:269) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateSchedulingRequests(AppSchedulingInfo.java:240) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateSchedulingRequests(SchedulerApplicationAttempt.java:469) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1154) > at > org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:278) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) > at > org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) > at > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) > at > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.refl
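The failure mode in YARN-8367 is easiest to see in miniature: some components define a placement policy and some do not, and an unguarded dereference of the absent policy blows up. A Python analog, with illustrative dict shapes rather than the actual YARN Component/PlacementPolicy classes:

```python
# Python analog of the NPE: only one of two components has a placement policy,
# and the unguarded accessor dereferences the missing one. Field names are
# illustrative assumptions, not the real YARN data model.

components = [
    {"name": "comp-a", "placement_policy": {"constraints": ["NOTIN,NODE,comp-a"]}},
    {"name": "comp-b", "placement_policy": None},  # no placement constraint
]

def constraints_unguarded(comp):
    # Mirrors the dereference that threw NullPointerException on comp-b.
    return comp["placement_policy"]["constraints"]

def constraints_guarded(comp):
    # Guarded version: a component without a policy simply has no constraints.
    policy = comp.get("placement_policy")
    return policy["constraints"] if policy else []

print(constraints_guarded(components[0]))  # ['NOTIN,NODE,comp-a']
print(constraints_guarded(components[1]))  # []
```

Calling `constraints_unguarded(components[1])` raises (TypeError here, NPE in the Java code); the guarded form returns an empty list, which is the shape of fix the 001 patch validates.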
[jira] [Commented] (YARN-8350) NPE in service AM related to placement policy
[ https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495610#comment-16495610 ] Gour Saha commented on YARN-8350: - Thanks [~billie.rinaldi] for reviewing the patch. The missing space between "%s" and "in" is deliberate. I wrote a comment above the code to explain - {code} // Note: %sin is not a typo. Constraint name is optional so the error messages // below handle that scenario by adding a space if name is specified. {code} > NPE in service AM related to placement policy > - > > Key: YARN-8350 > URL: https://issues.apache.org/jira/browse/YARN-8350 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8350.01.patch, YARN-8350.02.patch > > > It seems like this NPE is happening in a service with more than one component > when one component has a placement policy and the other does not. It causes > the AM to crash. 
> {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
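The "%sin is not a typo" comment discussed above can be illustrated directly: because the constraint name is optional, the name is formatted with its own trailing space (or as an empty string), so "%sin" yields correct spacing either way. A Python sketch of the mechanism; the surrounding message wording is hypothetical, only the %s-plus-"in" trick comes from the quoted comment:

```python
# Demonstrates the "%sin" formatting trick: the optional constraint name
# carries its own trailing space, so the template needs none before "in".
# The error-message wording here is an assumption for illustration.

def constraint_error(name, detail):
    # name_part is "name " (with trailing space) when a name was given, else "".
    name_part = (name + " ") if name else ""
    return "Invalid constraint %sin the placement policy: %s" % (name_part, detail)

print(constraint_error("targetTags", "empty target set"))
# Invalid constraint targetTags in the placement policy: empty target set
print(constraint_error(None, "empty target set"))
# Invalid constraint in the placement policy: empty target set
```

With a naive "%s in" template, the nameless case would render a double space ("constraint  in"), which is exactly what the deliberate "%sin" avoids.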
[jira] [Updated] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8308: Target Version/s: 3.1.1 > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at 
com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > 
org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018
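The expired-token log above carries issueDate and maxDate in epoch milliseconds, and subtracting them shows why the service fails: on this cluster dfs.namenode.delegation.token.max-lifetime allowed the token only 30 minutes of life, so any service running past that hits the InvalidToken error on restart or file access. A quick check of the arithmetic:

```python
# Token lifetime implied by the InvalidToken log above (epoch milliseconds).

issue_date_ms = 1526423999164  # issueDate from the log
max_date_ms = 1526425799164    # maxDate from the log

lifetime_ms = max_date_ms - issue_date_ms
print(lifetime_ms)              # 1800000
print(lifetime_ms / 60000)      # 30.0 -> token max lifetime was 30 minutes
```

That is why the reproduction step is "run the service beyond dfs.namenode.delegation.token.max-lifetime": once maxDate passes, the token can no longer be renewed at all and the AM cannot read its own service definition from HDFS.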
[jira] [Resolved] (YARN-8309) Diagnostic message for yarn service app failure due to token renewal should be improved
[ https://issues.apache.org/jira/browse/YARN-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha resolved YARN-8309. - Resolution: Won't Do > Diagnostic message for yarn service app failure due to token renewal should be > improved > > > Key: YARN-8309 > URL: https://issues.apache.org/jira/browse/YARN-8309 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Major > > When a Yarn service application failed due to a token renewal issue, the > diagnostic message was unclear. > {code:java} > Application application_1526413043392_0002 failed 20 times due to AM > Container for appattempt_1526413043392_0002_20 exited with exitCode: 1 > Failing this attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from > container-launch. Container id: container_e04_1526413043392_0002_20_01 > Exit code: 1 Exception message: Launch container failed Shell output: main : > command provided 1 main : run as user is hbase main : requested yarn user is > hbase Getting exit code file... Creating script paths... Writing pid file... > Writing to tmp file > /grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp > Writing to cgroup task files... Creating local dirs... Launching > container... Getting exit code file... Creating script paths... [2018-05-15 > 23:15:28.806]Container exited with a non-zero exit code 1. Error file: > prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 > 23:15:28.807]Container exited with a non-zero exit code 1. Error file: > prelaunch.err. Last 4096 bytes of prelaunch.err : For more detailed output, > check the application tracking page: > https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on > links to logs of each attempt. . Failing the application.{code} > Here, the diagnostic message should be improved to specify that the AM is failing due > to token renewal issues. 
[jira] [Commented] (YARN-8309) Diagnostic message for yarn service app failure due to token renewal should be improved
[ https://issues.apache.org/jira/browse/YARN-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494326#comment-16494326 ] Gour Saha commented on YARN-8309: - Once a fix for YARN-8308 is provided, this diagnostics message fix won't be required. In fact, from the code perspective, at the phase where the token issue occurs, ATSv2 publisher initialization and RM registration cannot be done, so technically the diagnostic message cannot be enhanced by the AM. > Diagnostic message for yarn service app failure due to token renewal should be > improved > > > Key: YARN-8309 > URL: https://issues.apache.org/jira/browse/YARN-8309 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Priority: Major > > When a Yarn service application failed due to a token renewal issue, the > diagnostic message was unclear. > {code:java} > Application application_1526413043392_0002 failed 20 times due to AM > Container for appattempt_1526413043392_0002_20 exited with exitCode: 1 > Failing this attempt.Diagnostics: [2018-05-15 23:15:28.779]Exception from > container-launch. Container id: container_e04_1526413043392_0002_20_01 > Exit code: 1 Exception message: Launch container failed Shell output: main : > command provided 1 main : run as user is hbase main : requested yarn user is > hbase Getting exit code file... Creating script paths... Writing pid file... > Writing to tmp file > /grid/0/hadoop/yarn/local/nmPrivate/application_1526413043392_0002/container_e04_1526413043392_0002_20_01/container_e04_1526413043392_0002_20_01.pid.tmp > Writing to cgroup task files... Creating local dirs... Launching > container... Getting exit code file... Creating script paths... [2018-05-15 > 23:15:28.806]Container exited with a non-zero exit code 1. Error file: > prelaunch.err. Last 4096 bytes of prelaunch.err : [2018-05-15 > 23:15:28.807]Container exited with a non-zero exit code 1. Error file: > prelaunch.err. 
Last 4096 bytes of prelaunch.err : For more detailed output, > check the application tracking page: > https://xxx:8090/cluster/app/application_1526413043392_0002 Then click on > links to logs of each attempt. . Failing the application.{code} > Here, diagnostic message should be improved to specify that AM is failing due > to token renewal issues.
[jira] [Assigned] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha reassigned YARN-8308: --- Assignee: Gour Saha > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at 
com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > 
org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2
[jira] [Commented] (YARN-8308) Yarn service app fails due to issues with Renew Token
[ https://issues.apache.org/jira/browse/YARN-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494324#comment-16494324 ] Gour Saha commented on YARN-8308: - will provide a patch for this issue > Yarn service app fails due to issues with Renew Token > - > > Key: YARN-8308 > URL: https://issues.apache.org/jira/browse/YARN-8308 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Gour Saha >Priority: Major > > Run Yarn service application beyond > dfs.namenode.delegation.token.max-lifetime. > Here, yarn service application fails with below error. > {code} > 2018-05-15 23:14:35,652 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > 2018-05-15 23:14:35,654 [main] INFO service.AbstractService - Service > Service Master failed in state INITED > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164, > sequenceNumber=7, masterKeyId=8) is expired, current time: 2018-05-15 > 23:14:35,651+ expected renewal time: 2018-05-15 23:09:59,164+ > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491) > at org.apache.hadoop.ipc.Client.call(Client.java:1437) > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:883) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1654) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1569) > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1566) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1581) > at > org.apache.hadoop.yarn.service.utils.JsonSerDeser.load(JsonSerDeser.java:182) > at > org.apache.hadoop.yarn.service.utils.ServiceApiUtil.loadServiceFrom(ServiceApiUtil.java:337) > at > org.apache.hadoop.yarn.service.ServiceMaster.loadApplicationJson(ServiceMaster.java:242) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:91) > at > 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:316) > 2018-05-15 23:14:35,659 [main] INFO service.ServiceMaster - Stopping app > master > 2018-05-15 23:14:35,660 [main] ERROR service.ServiceMaster - Error starting > service master > org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (token for hbase: HDFS_DELEGATION_TOKEN owner=hbase, renewer=yarn, > realUser=rm/x...@example.com, issueDate=1526423999164, maxDate=1526425799164,
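The InvalidToken failure quoted above is a hard limit, not a renewal bug: an HDFS delegation token can be renewed repeatedly, but never past its maxDate, so any service running beyond dfs.namenode.delegation.token.max-lifetime needs new tokens rather than renewals. The sketch below is a self-contained illustration of that lifetime check using the exact timestamps from the stack trace; the class and method names are hypothetical, not Hadoop's actual SecretManager code.

```java
// Hedged sketch of a delegation-token lifetime check. Illustrative only; the
// real check lives in Hadoop's AbstractDelegationTokenSecretManager.
public class TokenLifetimeCheck {
    /** Returns true if the token is usable at the given time (millis since epoch). */
    public static boolean isValid(long issueDate, long maxDate, long now) {
        // Renewal can extend the renew deadline, but never beyond maxDate.
        return now >= issueDate && now <= maxDate;
    }

    public static void main(String[] args) {
        // Timestamps from the log: issueDate=1526423999164, maxDate=1526425799164
        // (maxDate is issueDate + 30 minutes, matching the "expected renewal time").
        long issueDate = 1526423999164L;
        long maxDate = 1526425799164L;
        long failureTime = 1526426075651L; // 2018-05-15 23:14:35,651 from the log
        System.out.println(isValid(issueDate, maxDate, failureTime)); // prints "false"
    }
}
```

Running this with the log's own timestamps reproduces the rejection: the failure time is about 4.5 minutes past maxDate, which is why the AM could not recover without fresh credentials.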
[jira] [Created] (YARN-8367) 2 components, one with placement constraint and one without causes NPE in SingleConstraintAppPlacementAllocator
Gour Saha created YARN-8367: --- Summary: 2 components, one with placement constraint and one without causes NPE in SingleConstraintAppPlacementAllocator Key: YARN-8367 URL: https://issues.apache.org/jira/browse/YARN-8367 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.1.0 Reporter: Gour Saha While testing the fix for YARN-8350, [~billie.rinaldi] encountered this NPE in AM log. Filing this on her behalf - {noformat} 2018-05-25 21:11:54,006 [AMRM Heartbeater thread] ERROR impl.AMRMClientAsyncImpl - Exception on heartbeat java.lang.NullPointerException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.validateAndSetSchedulingRequest(SingleConstraintAppPlacementAllocator.java:245) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.internalUpdatePendingAsk(SingleConstraintAppPlacementAllocator.java:193) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.updatePendingAsk(SingleConstraintAppPlacementAllocator.java:207) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.addSchedulingRequests(AppSchedulingInfo.java:269) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateSchedulingRequests(AppSchedulingInfo.java:240) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateSchedulingRequests(SchedulerApplicationAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:1154) at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:278) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.SchedulerPlacementProcessor.allocate(SchedulerPlacementProcessor.java:53) at
org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:433) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Pr
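The NPE above surfaces when a service mixes components with and without a placement policy: the allocator dereferences a constraint that is legitimately absent for one of the components. The self-contained sketch below shows the defensive null check this class of bug calls for; the types and field names are illustrative stand-ins, not YARN's real scheduler classes.

```java
import java.util.List;

// Hedged sketch of guarding a possibly-null placement constraint; hypothetical
// types, not the actual SingleConstraintAppPlacementAllocator code.
public class PlacementGuard {
    public static class SchedulingRequest {
        public final String component;
        public final List<String> antiAffinityTags; // null when no placement_policy was set

        public SchedulingRequest(String component, List<String> antiAffinityTags) {
            this.component = component;
            this.antiAffinityTags = antiAffinityTags;
        }
    }

    /** Validates a request without assuming a constraint is present. */
    public static boolean hasConstraint(SchedulingRequest req) {
        // Skipping this null check is what the quoted NPE corresponds to.
        return req.antiAffinityTags != null && !req.antiAffinityTags.isEmpty();
    }

    public static void main(String[] args) {
        SchedulingRequest ping = new SchedulingRequest("ping", java.util.Arrays.asList("ping"));
        SchedulingRequest plain = new SchedulingRequest("sleeper", null);
        System.out.println(hasConstraint(ping));  // prints "true"
        System.out.println(hasConstraint(plain)); // prints "false"
    }
}
```

The same pattern applies at every point where per-component metadata is optional: check for absence before validating the contents, so one unconstrained component cannot crash allocation for the whole service.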
[jira] [Commented] (YARN-8350) NPE in service AM related to placement policy
[ https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490809#comment-16490809 ] Gour Saha commented on YARN-8350: - Thanks [~billie.rinaldi]. Patch 02 has all the files. > NPE in service AM related to placement policy > - > > Key: YARN-8350 > URL: https://issues.apache.org/jira/browse/YARN-8350 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8350.01.patch, YARN-8350.02.patch > > > It seems like this NPE is happening in a service with more than one component > when one component has a placement policy and the other does not. It causes > the AM to crash. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > 
org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy
[ https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8350: Attachment: YARN-8350.02.patch > NPE in service AM related to placement policy > - > > Key: YARN-8350 > URL: https://issues.apache.org/jira/browse/YARN-8350 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8350.01.patch, YARN-8350.02.patch > > > It seems like this NPE is happening in a service with more than one component > when one component has a placement policy and the other does not. It causes > the AM to crash. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253) > at java.security.AccessController.doPrivileged(Native Method) > 
at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317) > {noformat}
[jira] [Commented] (YARN-8350) NPE in service AM related to placement policy
[ https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490802#comment-16490802 ] Gour Saha commented on YARN-8350: - Oops, good catch I missed attaching the file Component.java in the patch. Attaching it right now. > NPE in service AM related to placement policy > - > > Key: YARN-8350 > URL: https://issues.apache.org/jira/browse/YARN-8350 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8350.01.patch > > > It seems like this NPE is happening in a service with more than one component > when one component has a placement policy and the other does not. It causes > the AM to crash. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > 
org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317) > {noformat}
[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy
[ https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8350: Component/s: yarn-native-services > NPE in service AM related to placement policy > - > > Key: YARN-8350 > URL: https://issues.apache.org/jira/browse/YARN-8350 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8350.01.patch > > > It seems like this NPE is happening in a service with more than one component > when one component has a placement policy and the other does not. It causes > the AM to crash. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317) > {noformat}
[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy
[ https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8350: Target Version/s: 3.1.1 > NPE in service AM related to placement policy > - > > Key: YARN-8350 > URL: https://issues.apache.org/jira/browse/YARN-8350 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8350.01.patch > > > It seems like this NPE is happening in a service with more than one component > when one component has a placement policy and the other does not. It causes > the AM to crash. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317) > {noformat}
[jira] [Updated] (YARN-8350) NPE in service AM related to placement policy
[ https://issues.apache.org/jira/browse/YARN-8350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gour Saha updated YARN-8350: Attachment: YARN-8350.01.patch > NPE in service AM related to placement policy > - > > Key: YARN-8350 > URL: https://issues.apache.org/jira/browse/YARN-8350 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Billie Rinaldi >Assignee: Gour Saha >Priority: Critical > Attachments: YARN-8350.01.patch > > > It seems like this NPE is happening in a service with more than one component > when one component has a placement policy and the other does not. It causes > the AM to crash. > {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.yarn.service.component.Component.requestContainers(Component.java:644) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:310) > at > org.apache.hadoop.yarn.service.component.Component$FlexComponentTransition.transition(Component.java:293) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.service.component.Component.handle(Component.java:919) > at > org.apache.hadoop.yarn.service.ServiceScheduler.serviceStart(ServiceScheduler.java:344) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.service.ServiceMaster.lambda$serviceStart$0(ServiceMaster.java:253) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > 
at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.yarn.service.ServiceMaster.serviceStart(ServiceMaster.java:251) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:317) > {noformat}
[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480659#comment-16480659 ] Gour Saha commented on YARN-7530: - +1 for this change. [~csingh], I am assuming that after the git moves all UTs are still running fine. > hadoop-yarn-services-api should be part of hadoop-yarn-services > --- > > Key: YARN-7530 > URL: https://issues.apache.org/jira/browse/YARN-7530 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Chandni Singh >Priority: Trivial > Fix For: yarn-native-services > > Attachments: YARN-7530.001.patch, YARN-7530.002.patch > > > Hadoop-yarn-services-api is currently a parallel project to > hadoop-yarn-services project. It would be better if hadoop-yarn-services-api > is part of hadoop-yarn-services for correctness.
[jira] [Commented] (YARN-8243) Flex down should remove instance with largest component instance ID first
[ https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472742#comment-16472742 ]

Gour Saha commented on YARN-8243:
---------------------------------

Thanks [~billie.rinaldi] and [~suma.shivaprasad] for reviewing. Also thanks to [~billie.rinaldi] for committing the patch.

> Flex down should remove instance with largest component instance ID first
> --------------------------------------------------------------------------
>
>                 Key: YARN-8243
>                 URL: https://issues.apache.org/jira/browse/YARN-8243
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>    Affects Versions: 3.1.0
>            Reporter: Gour Saha
>            Assignee: Gour Saha
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8243.01.patch, YARN-8243.02.patch
>
>
> This is easy to test on a service with an anti-affinity component, to simulate pending container requests. It can also be simulated by other means (no resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components": [
>     {
>       "name": "ping",
>       "number_of_containers": 2,
>       "resource": {
>         "cpus": 1,
>         "memory": "256"
>       },
>       "launch_command": "sleep 9000",
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "ping"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above service to 1 container more than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the service AM log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component dispatcher] INFO  component.Component - [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_06]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component dispatcher] INFO  component.Component - [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_06]: Deleting registry path /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06
> 2018-05-03 20:17:38,476 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> 	at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
> 	at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
> 	at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> 	at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,480 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state
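The selection policy described in this issue (when flexing down, release the pending, never-allocated request before killing a healthy running container, and prefer the largest instance ID within each group) can be sketched as follows. This is an illustrative, hypothetical model only, not the actual YARN-8243 patch; the class `FlexDownOrder`, the `removalOrder` method, and the instance-to-pending map are all invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the flex-down ordering discussed in YARN-8243:
// cancel outstanding (pending) container requests first, then destroy
// running instances, picking the largest instance ID first in each group.
public class FlexDownOrder {

    // idToPending maps a component instance ID (e.g. 4 for "ping-4") to
    // true if its container request is still pending, false if running.
    static List<String> removalOrder(Map<Integer, Boolean> idToPending, int removeCount) {
        List<Integer> pending = new ArrayList<>();
        List<Integer> running = new ArrayList<>();
        for (Map.Entry<Integer, Boolean> e : idToPending.entrySet()) {
            (e.getValue() ? pending : running).add(e.getKey());
        }
        // Largest instance ID goes first within each group.
        pending.sort(Comparator.reverseOrder());
        running.sort(Comparator.reverseOrder());

        List<String> order = new ArrayList<>();
        for (int id : pending) {
            if (order.size() == removeCount) break;
            order.add("cancel-request ping-" + id);
        }
        for (int id : running) {
            if (order.size() == removeCount) break;
            order.add("destroy ping-" + id);
        }
        return order;
    }

    public static void main(String[] args) {
        // 6 desired instances on a 5-node anti-affinity cluster:
        // ping-0..ping-4 are running, ping-5 never got a container.
        Map<Integer, Boolean> instances = new LinkedHashMap<>();
        for (int id = 0; id <= 4; id++) instances.put(id, false); // running
        instances.put(5, true); // pending: no node left for anti-affinity

        // Flex from 6 down to 5: only the pending request should go.
        System.out.println(removalOrder(instances, 1)); // [cancel-request ping-5]
    }
}
```

With this ordering, the scenario in the description would simply cancel the un-allocated request for ping-5 instead of destroying the running ping-4 container, which is the behavior the patch aims for.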
[jira] [Updated] (YARN-8243) Flex down should first remove pending container requests (if any) and then kill running containers
[ https://issues.apache.org/jira/browse/YARN-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gour Saha updated YARN-8243:
----------------------------
    Priority: Critical  (was: Major)

> Flex down should first remove pending container requests (if any) and then kill running containers
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8243
>                 URL: https://issues.apache.org/jira/browse/YARN-8243
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn-native-services
>    Affects Versions: 3.1.0
>            Reporter: Gour Saha
>            Assignee: Gour Saha
>            Priority: Critical
>         Attachments: YARN-8243.01.patch, YARN-8243.02.patch
>
>
> This is easy to test on a service with an anti-affinity component, to simulate pending container requests. It can also be simulated by other means (no resources left in the cluster, etc.).
> Service yarnfile used to test this -
> {code:java}
> {
>   "name": "sleeper-service",
>   "version": "1",
>   "components": [
>     {
>       "name": "ping",
>       "number_of_containers": 2,
>       "resource": {
>         "cpus": 1,
>         "memory": "256"
>       },
>       "launch_command": "sleep 9000",
>       "placement_policy": {
>         "constraints": [
>           {
>             "type": "ANTI_AFFINITY",
>             "scope": "NODE",
>             "target_tags": [
>               "ping"
>             ]
>           }
>         ]
>       }
>     }
>   ]
> }
> {code}
> Launch a service with the above yarnfile as below -
> {code:java}
> yarn app -launch simple-aa-1 simple_AA.json
> {code}
> Let's assume there are only 5 nodes in this cluster. Now, flex the above service to 1 container more than the number of nodes (6 in my case).
> {code:java}
> yarn app -flex simple-aa-1 -component ping 6
> {code}
> Only 5 containers will be allocated and running for simple-aa-1. At this point, flex it down to 5 containers -
> {code:java}
> yarn app -flex simple-aa-1 -component ping 5
> {code}
> This is what is seen in the service AM log at this point -
> {noformat}
> 2018-05-03 20:17:38,469 [IPC Server handler 0 on 38124] INFO  service.ClientAMService - Flexing component ping to 5
> 2018-05-03 20:17:38,469 [Component dispatcher] INFO  component.Component - [FLEX DOWN COMPONENT ping]: scaling down from 6 to 5
> 2018-05-03 20:17:38,470 [Component dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_06]: Flexed down by user, destroying.
> 2018-05-03 20:17:38,473 [Component dispatcher] INFO  component.Component - [COMPONENT ping] Transitioned from FLEXING to STABLE on FLEX event.
> 2018-05-03 20:17:38,474 [pool-5-thread-8] INFO  registry.YarnRegistryViewForProviders - [COMPINSTANCE ping-4 : container_1525297086734_0013_01_06]: Deleting registry path /users/root/services/yarn-service/simple-aa-1/components/ctr-1525297086734-0013-01-06
> 2018-05-03 20:17:38,476 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> 	at org.apache.hadoop.yarn.service.component.Component.handle(Component.java:913)
> 	at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:574)
> 	at org.apache.hadoop.yarn.service.ServiceScheduler$ComponentEventHandler.handle(ServiceScheduler.java:563)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> 	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> 	at java.lang.Thread.run(Thread.java:745)
> 2018-05-03 20:17:38,480 [Component dispatcher] ERROR component.Component - [COMPONENT ping]: Invalid event CHECK_STABLE at STABLE
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: CHECK_STABLE at STABLE
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:388)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$Intern