[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-3040:
-------------------------------
    Fix Version/s: 2.9.0

> [Data Model] Make putEntities operation be aware of the app's context
> ---------------------------------------------------------------------
>
>                 Key: YARN-3040
>                 URL: https://issues.apache.org/jira/browse/YARN-3040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Zhijie Shen
>             Fix For: 2.9.0, 3.0.0-alpha1
>
>         Attachments: YARN-3040.1.patch, YARN-3040.2.patch, YARN-3040.3.patch,
>                      YARN-3040.4.patch, YARN-3040.5.patch, YARN-3040.6.patch
>
>
> Per design in YARN-2928, implement client-side API for handling *flows*.
> Frameworks should be able to define and pass in all attributes of flows and
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-3040:
-----------------------------
    Attachment: YARN-3040.6.patch
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3040:
------------------------------
    Attachment: YARN-3040.5.patch
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3040:
------------------------------
    Attachment: YARN-3040.4.patch
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3040:
------------------------------
    Attachment: YARN-3040.3.patch

Uploaded a new patch to address the comments so far. The notable changes in this patch are removing the timestamp suffix from the default cluster ID and adding a default for RM_CLUSTER_ID, so that the ID won't change across RM restarts or failover.
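To illustrate why dropping the timestamp suffix matters, here is a minimal sketch, not the code from the patch. The config key `example.cluster-id`, the default value, and the class and method names are all hypothetical; a plain `Map` stands in for the real configuration object.

```java
import java.util.HashMap;
import java.util.Map;

final class ClusterIdSketch {
    // Hypothetical key and default value, for illustration only.
    static final String CLUSTER_ID_KEY = "example.cluster-id";
    static final String DEFAULT_CLUSTER_ID = "example_default_cluster";

    // Stable scheme: use the configured ID if present, else a fixed default.
    static String resolveClusterId(Map<String, String> conf) {
        String configured = conf.get(CLUSTER_ID_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_CLUSTER_ID;
        }
        return configured.trim();
    }

    // Earlier scheme, shown for contrast: the suffix changes on every RM start.
    static String timestampedClusterId(String name, long rmStartTimeMillis) {
        return name + "_" + rmStartTimeMillis;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Same value no matter when or how often the RM is restarted.
        System.out.println(resolveClusterId(conf));
        // Two different start times yield two different IDs.
        System.out.println(timestampedClusterId("cluster", 1000L));
        System.out.println(timestampedClusterId("cluster", 2000L));
    }
}
```

With the timestamped form, data written before and after a restart or failover would land under different cluster IDs; the fixed default avoids that when no ID is configured.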
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3040:
------------------------------
    Attachment: YARN-3040.2.patch

The new patch changes the way the context information is passed in to the aggregator. Again, it's based on the assumption that the context won't change during the lifecycle of the app. Therefore, we don't need to specify the context info for every put-entity request, but can set it on the timeline collector when it starts. The backend and the context information to keep are not altered in the new patch.

In the new data flow of context information, clusterId is obtained from the configuration and appId is obtained when constructing the timeline collector. User, flow and flow run info will be passed to the collector at the starting stage via the collector-NM RPC interface. Among the three, user info is already available in the NM; flow and flow run need to be provided by the user via the tag field when submitting the application. This info will be passed to the NM via the env of the CLC when starting the AM container. The collector will then query the NM for this info.

The distributed shell has been updated to show how the client can pass flow and flow run info into the application. Test cases have been modified and added to verify that (1) the newly added RPC call works, and (2) the context info works e2e.

To answer Sangjin's questions:

bq. How can individual frameworks (MR, tez, ...) set these attributes and pass them to the RM at the time of the application launch? How does that information get passed to the TimelineClient and to the timeline collector?

See the description of the context information data flow above, and take a look at the DS app for reference.

bq. It sounds like each NM will need to have multiple timeline clients (one for each application).

That's correct.

bq. The RM will have its own collector, and it does not go through the TimelineClient API. How would that work?

The RM will have all the above context info. When constructing and starting the RM collector, we should make sure it is set up.

bq. flowId should be flowName (that's the standard terminology we're using)

Personally, I prefer to use ID, to be uniform across all the context properties. ID indicates that it can be used to identify a flow.

bq. flow version seems to be missing from this; while flow version is not part of the primary key of the entity, it is a necessary attribute
bq. I think flow run id can (and should) be a long; it doesn't have to be a generic string

I thought version was part of the flow id. I think we can revisit it once the schema is done and we have finalized the *generic* description of the flow structure and the notation. For now I'd like to keep it as it is. Thoughts?

bq. the default cluster id should be just the cluster name; I'm not sure why we need to add the cluster start timestamp;

It makes sense, but when the RM restarts we use the new start time of the RM to identify the app instead of the previous one. With the current approach, cluster_xyz will contain application_xyz_123. This was my rationale before. Also, this default cluster id construction is only used when the user didn't specify the cluster id in the config file. In production, the user should specify one. I'll think about the question again.

bq. hopefully isUnitTest can be removed with the changes I made in the previous commit

Right. It's not necessary.
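The "set the context once at collector start" flow described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's code: `AppContext`, `ContextSource`, `CollectorSketch`, and the `!`-joined key format are hypothetical, and `ContextSource` merely stands in for the collector-NM RPC lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Per-application context that is assumed not to change during the app's lifecycle. */
final class AppContext {
    final String clusterId, userId, flowId, flowRunId, appId;

    AppContext(String clusterId, String userId, String flowId,
               String flowRunId, String appId) {
        this.clusterId = Objects.requireNonNull(clusterId);
        this.userId = Objects.requireNonNull(userId);
        this.flowId = Objects.requireNonNull(flowId);
        this.flowRunId = Objects.requireNonNull(flowRunId);
        this.appId = Objects.requireNonNull(appId);
    }
}

/** Hypothetical stand-in for the NM-side lookup the collector performs at start. */
interface ContextSource {
    String user(String appId);
    String flowId(String appId);
    String flowRunId(String appId);
}

final class CollectorSketch {
    private final String clusterId;   // from configuration
    private final String appId;       // known when the collector is constructed
    private final ContextSource nm;   // queried once, at start
    private AppContext context;       // null until start() has run
    private final List<String> written = new ArrayList<>();

    CollectorSketch(String clusterId, String appId, ContextSource nm) {
        this.clusterId = clusterId;
        this.appId = appId;
        this.nm = nm;
    }

    /** Fetch user, flow and flow run once; later puts reuse the stored context. */
    void start() {
        context = new AppContext(clusterId, nm.user(appId),
                nm.flowId(appId), nm.flowRunId(appId), appId);
    }

    /** Callers pass only the entity; the context is attached by the collector. */
    void putEntity(String entityId) {
        if (context == null) {
            throw new IllegalStateException("collector not started");
        }
        written.add(String.join("!", context.clusterId, context.userId,
                context.flowId, context.flowRunId, context.appId, entityId));
    }

    List<String> written() {
        return written;
    }
}
```

The point of the design is that `putEntity` takes no context arguments: the collector resolves them once and attaches them to every write, so clients cannot send inconsistent context across requests.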
[jira] [Updated] (YARN-3040) [Data Model] Make putEntities operation be aware of the app's context
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3040:
------------------------------
    Summary: [Data Model] Make putEntities operation be aware of the app's context  (was: [Data Model] Implement client-side API for handling flows)