[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660952#comment-14660952 ]

Zhijie Shen commented on YARN-3049:
-----------------------------------

As the issue is not blocking the whole reader implementation, how about letting this patch in first, [~sjlee0]?

Some more comments about the issue:

1. ColumnHelper needs to be updated as well to return a byte[] column name instead of a String one.
2. I'm worried that Bytes.toString() doesn't store the long integer the way we want. If it isn't stored as 8 bytes, we cannot guarantee the order of event columns.
3. FlowRunId in the row key should be fine, because the row key is never converted to a String again. But it's good to double check.

> [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
> ------------------------------------------------------------------------------------------------
>
> Key: YARN-3049
> URL: https://issues.apache.org/jira/browse/YARN-3049
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Zhijie Shen
> Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, YARN-3049-YARN-2928.3.patch, YARN-3049-YARN-2928.4.patch, YARN-3049-YARN-2928.5.patch, YARN-3049-YARN-2928.6.patch, YARN-3049-YARN-2928.7.patch
>
> Implement existing ATS queries with the new ATS reader design.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660853#comment-14660853 ]

Zhijie Shen commented on YARN-3049:
-----------------------------------

Here's a quick example:

{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

@Test
public void test() {
  // imitate the process to write a long
  Long a = 1234567890L;
  byte[] b = Bytes.toBytes(a);
  String c = Bytes.toString(b);
  // imitate the process to read a long
  byte[] d = Bytes.toBytes(c);
  Long e = Bytes.toLong(d);
  // fails: the String round trip does not preserve the raw bytes
  assertEquals(a, e);
}
{code}

b and d are different bytes, then. Am I using Bytes the wrong way?
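A likely cause of the mismatch can be reproduced without HBase. The sketch below is only an illustration, not part of any patch: it assumes that Bytes.toString() decodes the array as UTF-8 (which is what the related comments on this issue suggest), and it substitutes plain JDK calls for the HBase helpers. The class and method names (LongRoundTrip, longToBytes, viaString) are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LongRoundTrip {
  /** Encodes a long as 8 big-endian bytes, the layout Bytes.toBytes(long) produces. */
  static byte[] longToBytes(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  /** Lossy path: raw bytes -> UTF-8 String -> bytes, mirroring the test above. */
  static byte[] viaString(byte[] raw) {
    String s = new String(raw, StandardCharsets.UTF_8);
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] b = longToBytes(1234567890L); // 00 00 00 00 49 96 02 D2
    byte[] d = viaString(b);
    // 0x96 and a trailing 0xD2 are not valid UTF-8 on their own, so the decoder
    // substitutes U+FFFD for them; encoding that character again yields EF BF BD.
    System.out.println(Arrays.equals(b, d));          // false
    System.out.println(ByteBuffer.wrap(b).getLong()); // 1234567890
  }
}
```

If this is indeed what happens, the fix is to keep the value as a byte[] end to end and never pass it through a String, which matches the suggestion elsewhere on this issue to have ColumnHelper return a byte[] column name.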
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3049:
------------------------------
    Attachment: YARN-3049-YARN-2928.7.patch

Attached a new patch:

1. Rebased against YARN-3984.
2. Addressed Sangjin's and Li's comments.

There's still a remaining issue: the timestamp is not serialized/deserialized correctly when UTF-8 is used. I haven't figured out the reason, but I ran an experiment in which the bytes were converted into a string and then back into bytes, and they came out different. I still need to investigate this problem further.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659183#comment-14659183 ]

Zhijie Shen commented on YARN-3984:
-----------------------------------

bq. If the info map is not empty, this record would be redundant and will take up storage space.

Makes sense. The patch looks good to me. Will commit it.

> Rethink event column key issue
> ------------------------------
>
> Key: YARN-3984
> URL: https://issues.apache.org/jira/browse/YARN-3984
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Vrushali C
> Fix For: YARN-2928
> Attachments: YARN-3984-YARN-2928.001.patch
>
> Currently, the event column key is event_id?info_key?timestamp, which is not so friendly to fetching all the events of an entity and sorting them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I opened this jira to continue the discussion about it that started in the comments on YARN-3908.
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3049:
------------------------------
    Attachment: YARN-3049-YARN-2928.6.patch

Uploaded a new patch which lets the HBase backend make the decision locally.
[jira] [Created] (YARN-4013) Publisher V2 should write the unmanaged AM flag too
Zhijie Shen created YARN-4013:
---------------------------------

Summary: Publisher V2 should write the unmanaged AM flag too
Key: YARN-4013
URL: https://issues.apache.org/jira/browse/YARN-4013
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen

Upon rebasing the branch, I found that we need to redo similar work for the V2 publisher: https://issues.apache.org/jira/browse/YARN-3543
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652806#comment-14652806 ]

Zhijie Shen commented on YARN-3049:
-----------------------------------

Okay, what will the timestamp be used for? If too much context info is required, I agree it's not elegant to expose it to the backend incrementally.

Taking a step back, I'm starting to understand that the real situation deviates from what I originally had in mind for the storage layer. When defining the data model, I defined a generic TimelineEntity and made the other first-class entities extend it, so that we process entities uniformly no matter what their type is. What we have discussed so far implies that we cannot treat the entities quite so generically. For the application entity, we may need to take an additional step to parse its start/finish events and write more records.
[jira] [Updated] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3993: -- Attachment: YARN-3993-YARN-2928.0001.patch Rename the patch to make it for the branch. > Change to use the AM flag in ContainerContext determine AM container > > > Key: YARN-3993 > URL: https://issues.apache.org/jira/browse/YARN-3993 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Zhijie Shen >Assignee: Sunil G > Labels: newbie > Attachments: 0001-YARN-3993-YARN-2928.patch, > YARN-3993-YARN-2928.0001.patch > > > After YARN-3116, we will have a flag in ContainerContext to determine if the > container is AM or not in aux service. We need to change accordingly to make > use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3049:
------------------------------
    Attachment: YARN-3049-YARN-2928.5.patch

I made a change in the new patch to reflect my last proposal. The user doesn't need to explicitly say that it's the start of a new app. Instead, I added "firstRequest" to the context of the app collector. The RM collector manager sets this flag to true upon adding a new app collector on the RM side.
[jira] [Commented] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652225#comment-14652225 ]

Zhijie Shen commented on YARN-3993:
-----------------------------------

+1 LGTM. Will commit after jenkins' comment.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652206#comment-14652206 ]

Zhijie Shen commented on YARN-3049:
-----------------------------------

Hi Sangjin,

Thanks for your comments. The proposed method will work for now and minimizes the change we have to make. In fact, I considered this method too. The reason I abandoned it is that it couples the business logic with the data storage, which increases the risk that a change in the business logic will break the storage layer. For example, suppose we rename app_created to app_started. That may still be easy to fix, but the maintenance difficulty is likely to increase as the logic grows more complex. That's why I think we should let the app collector tell the backend that it's the first request.

On the other hand, I agree the RM should be responsible for this too. Actually, this is also what I did in the current patch. If you are okay with my proposal of letting the app collector determine whether it is the first request, we can extend the RM app collector and implement this logic there.

Thanks,
Zhijie
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650106#comment-14650106 ]

Zhijie Shen commented on YARN-3049:
-----------------------------------

What I meant before is that HBaseTimelineWriterImpl is not aware of the life cycle/session of the application, so it's hard to detect the app creation event inside HBaseTimelineWriterImpl and make it transparent to the caller. The app collector, by contrast, can know whether it is sending the first put request for this app to the writer.
[jira] [Updated] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes
[ https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-4006:
------------------------------
    Assignee: Greg Senia

> YARN ATS Alternate Kerberos HTTP Authentication Changes
> -------------------------------------------------------
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: security, timelineserver
> Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
> Reporter: Greg Senia
> Assignee: Greg Senia
> Fix For: 2.8.0
> Attachments: YARN-4006-branch-trunk.patch, YARN-4006-branch2.6.0.patch
>
> The Hadoop Alternate Authentication classes do not exactly work with what was built with https://issues.apache.org/jira/browse/YARN-1935.
> I went ahead and made the following changes to support using a custom AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
>
>   String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
>   LOG.info("AuthType Configured: "+authType);
>   if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>     filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>         PseudoDelegationTokenAuthenticationHandler.class.getName());
>     LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
>   } else if (authType.equals(KerberosAuthenticationHandler.TYPE) ||
>       (UserGroupInformation.isSecurityEnabled() &&
>        conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE))) {
>     if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
>       filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>           authType);
>       LOG.info("AuthType: "+authType);
>     } else {
>       filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>           KerberosDelegationTokenAuthenticationHandler.class.getName());
>       LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>     }
>     // Resolve _HOST into bind address
>     String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>     String principal =
>         filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>     if (principal != null) {
>       try {
>         principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
>       } catch (IOException ex) {
>         throw new RuntimeException(
>             "Could not resolve Kerberos principal name: " + ex.toString(),
>             ex);
>       }
>       filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
>           principal);
>     }
>   }
[jira] [Commented] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes
[ https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650082#comment-14650082 ]

Zhijie Shen commented on YARN-4006:
-----------------------------------

Sure, take your time. I'll cancel the patch until the complete one is ready.
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650069#comment-14650069 ]

Zhijie Shen commented on YARN-3904:
-----------------------------------

bq. I'm not 100% sure if that's what we would like to do. Maybe we would like to decouple the offline aggregation module from our normal entity storage. Therefore, maybe it's also appealing to allow users specify if they need to create data schema in the offline aggregation process? Such as, setting one flag in the offline aggregator to create data schema?

Makes sense, but can we still keep table creation centralized? I think we can add an option to create the raw entity tables and the aggregation tables separately. Thoughts?

bq. After the changes in this JIRA, we will only have two types of TimelineWriters, one for FS (test only) and one for HBase. The setting on the offline storage should be independent from this setting, I assume?

Yeah, I meant that we currently have TIMELINE_SERVICE_READER|WRITER_CLASS pointing to a specific reader/writer implementation. However, it would be better to have a config such as "blah.blah.backend.type". When backend.type = hbase, the user can access HBase both directly and via Phoenix, and we allow aggregation. This may not need to be part of this jira; I'm just thinking out loud.

> Refactor timelineservice.storage to add support to online and offline aggregation writers
> -----------------------------------------------------------------------------------------
>
> Key: YARN-3904
> URL: https://issues.apache.org/jira/browse/YARN-3904
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch
>
> After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces.
[jira] [Commented] (YARN-4006) YARN ATS Alternate Kerberos HTTP Authentication Changes
[ https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650015#comment-14650015 ]

Zhijie Shen commented on YARN-4006:
-----------------------------------

[~gss2002], it seems that the patch doesn't come with any alt kerberos auth code. Is it a WIP patch?
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649915#comment-14649915 ]

Zhijie Shen commented on YARN-3049:
-----------------------------------

I uploaded a new patch to address Sangjin's comments, except the ones below:

bq. l.93: What does it mean to indicate newApp for a set of entities? What if the set of entities contains bunch of different applications?

I'm not worried about this, because the put requests sent to an app collector all relate to the same app.

bq. See comments above; rather than relying on the boolean flag in the arguments, can we detect the case of the application created event and do it?

See my comments above.
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3049:
------------------------------
    Attachment: YARN-3049-YARN-2928.4.patch
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649770#comment-14649770 ]

Zhijie Shen commented on YARN-3049:
-----------------------------------

[~sjlee0], yeah, I agree it's not a decent solution to let the user code trigger writing the app-to-flow mapping. The reason I did this before is that it lets us avoid a check-and-put for each individual entity put request, which would obviously slow down the write path.

Detecting the application created event sounds like a reasonable option. However, I'm afraid we cannot hide it inside the writer as an implementation detail, because the writer is not aware of the session of an application. One solution I can think of is handling the session start in the app collector: upon the first put request received by the app collector, we tell the writer to also write the app-to-flow mapping. What do you think?
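The first-request idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the patch: SketchWriter and SketchAppCollector are hypothetical stand-ins for the real writer and app collector, and the newApp parameter only mirrors the flag discussed on this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the storage writer; not the real TimelineWriter API. */
interface SketchWriter {
  void write(String appId, String entity, boolean newApp);
}

/** Hypothetical per-application collector that flags only its first put request. */
final class SketchAppCollector {
  private final String appId;
  private final SketchWriter writer;
  private final AtomicBoolean firstRequest = new AtomicBoolean(true);

  SketchAppCollector(String appId, SketchWriter writer) {
    this.appId = appId;
    this.writer = writer;
  }

  void putEntity(String entity) {
    // compareAndSet succeeds for exactly one caller, so only one put carries
    // newApp = true even if several threads race on the first request.
    boolean newApp = firstRequest.compareAndSet(true, false);
    writer.write(appId, entity, newApp);
  }
}

final class FirstRequestDemo {
  /** Records the newApp flag seen by the writer for three consecutive puts. */
  static List<Boolean> run() {
    List<Boolean> flags = new ArrayList<>();
    SketchAppCollector collector =
        new SketchAppCollector("app_1", (appId, entity, newApp) -> flags.add(newApp));
    collector.putEntity("e1");
    collector.putEntity("e2");
    collector.putEntity("e3");
    return flags;
  }

  public static void main(String[] args) {
    System.out.println(run()); // [true, false, false]
  }
}
```

One caveat of this shape: the flag is cleared before the write completes, so if the first write fails, the app-to-flow mapping would not be retried unless the caller resets the flag.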
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649598#comment-14649598 ]

Zhijie Shen commented on YARN-3984:
-----------------------------------

Sure, that's fine too. One question: do we make sure every event has such a column, or only events without info? Personally, I prefer the former option, which makes event processing uniform.
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649557#comment-14649557 ]

Zhijie Shen commented on YARN-3984:
-----------------------------------

Okay, "e! eventid # inverse_event_timestamp ? eventkey" sounds like a reasonable compromise. Secondly, how do we deal with an event without any info? How about creating a column "e! eventid # inverse_event_timestamp ? dummy_key : empty_value"?
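To make the role of the inverted timestamp concrete, here is a small sketch. It assumes the inversion is Long.MAX_VALUE minus the event timestamp, stored as 8 big-endian bytes; the thread does not spell out the exact encoding, so treat that as an assumption. The class InvertedTimestampKey and its methods are hypothetical names for the example.

```java
import java.nio.ByteBuffer;

final class InvertedTimestampKey {
  /** Newer timestamps map to smaller values, so they sort first. */
  static byte[] invert(long timestampMillis) {
    return ByteBuffer.allocate(Long.BYTES)
        .putLong(Long.MAX_VALUE - timestampMillis)
        .array();
  }

  /** Unsigned lexicographic byte comparison, the order HBase applies to keys. */
  static int compare(byte[] x, byte[] y) {
    int n = Math.min(x.length, y.length);
    for (int i = 0; i < n; i++) {
      int diff = (x[i] & 0xff) - (y[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return x.length - y.length;
  }

  public static void main(String[] args) {
    long older = 1438000000000L;
    long newer = 1438000060000L;
    // The newer event sorts before the older one.
    System.out.println(compare(invert(newer), invert(older)) < 0); // true
    // Decimal strings do not preserve numeric order: "9" sorts after "10".
    System.out.println("9".compareTo("10") > 0); // true
  }
}
```

The ordering only holds when the timestamp occupies a fixed 8 bytes, which is why the String round trip discussed on YARN-3049 matters for event columns.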
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648457#comment-14648457 ]

Zhijie Shen commented on YARN-3904:
-----------------------------------

[~gtCarrera9], thanks for the patch. Below are my comments:

bq. The two failed tests passed on my local machine, and the failures appeared to be irrelevant.

That said, we may still need to fix those intermittent test failures. Do we plan to fix them in this patch?

Some high-level comments:

1. As also mentioned in YARN-3049, how about refactoring the reader/writer method signatures in a separate jira to avoid conflicts?
2. I suggest moving the table creation stuff into TimelineSchemaCreator.
3. As the HBase backend is accessed both directly and via Phoenix, it would be good to clean up the configuration to say that we're using the HBase backend (as opposed to the FS backend) instead of specifically the HBase or Phoenix writer/reader.

Other patch details:

1. Make OfflineAggregationWriter extend Service, so that you don't need to define init.
2. Now we're working towards a production-standard patch. Would you please write some javadoc to explain the schema of the aggregation tables, like what we did for the HBase tables?
3. The connection config should be moved to YarnConfiguration.
4. Why is the info column family kept? I expect the aggregation table will only have metrics data.
5. Let's also have a default PhoenixOfflineAggregationWriterImpl constructor to be used in the production code.
6. {{Class.forName(DRIVER_CLASS_NAME);}} doesn't need to be invoked every time we get a connection.
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648218#comment-14648218 ] Zhijie Shen commented on YARN-3049: --- [~gtCarrera9], thanks for the review. I've addressed most of your comments in the new patch, except the following:
bq. However, I still incline to proceed the changes in this JIRA so that we can speed up consolidating our POC patches.
Exactly.
bq. Reader interface: use TimelineCollectorContext to package reader arguments?
Yeah, I can see the rationale behind it, but maybe it's not TimelineCollectorContext. As I see a lot of arguments for the reader interface (as well as the writer one) and potential signature changes in the future (e.g., adding newApp in this patch), I've started to think about grouping the primitive arguments, wrapping them in category objects such as EntityContext, EntityFilters, Opts and so on, and using these as the arguments of the interface instead. Therefore, if we want to add newApp here, we don't really need to change the method signature, but only add a getter/setter in Opts. Please let me know what you think about the idea. I can file another jira to deal with the issue.
bq. We're now performing filters by ourselves in memory. I'm wondering if it will be more efficient to translate some of our filter specifications into HBase filters?
That sounds like a good idea, which should potentially improve the read performance. Let me investigate how to map our filters onto HBase filters and push them to the backend. Given that it may be non-trivial work, can we get this patch in and follow up on the filter change in another jira, just in case?
bq. Add a specific test in TestHBaseTimelineWriterImpl for App2FlowTable?
In fact, it has been tested. I changed the write path by setting newApp = true, and checked that we can query the entity successfully without giving the flow/flowRun explicitly. However, I didn't add many assertions around the fields of the retrieved entities, because I'm considering deferring this work until we rewrite the whole HBase backend unit test. The current tests are too preliminary to capture the potential bugs around DB operations.
> [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, > YARN-3049-YARN-2928.3.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
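Editor's sketch of the argument-grouping idea from the comment above (EntityContext, Opts and so on). Every type, field and method here is a hypothetical illustration, since the thread only names the categories and does not define them. The point it tries to show is that a new option such as newApp becomes a getter/setter on the options object, so the reader method signature stays unchanged.

```java
// Hypothetical types throughout; none of them are taken from the patch.
final class EntityContext {
  final String clusterId;
  final String appId;
  final String entityType;
  final String entityId;

  EntityContext(String clusterId, String appId,
                String entityType, String entityId) {
    this.clusterId = clusterId;
    this.appId = appId;
    this.entityType = entityType;
    this.entityId = entityId;
  }
}

final class Opts {
  // An option added later: callers that ignore it are unaffected.
  private boolean newApp;

  boolean isNewApp() {
    return newApp;
  }

  Opts setNewApp(boolean newApp) {
    this.newApp = newApp;
    return this;
  }
}

interface EntityReaderSketch {
  // The signature takes category objects, so it does not change as options grow.
  // The String return type is a stand-in for a real entity type.
  String getEntity(EntityContext context, Opts opts);
}
```

An EntityFilters object could be added as a third parameter in the same style; it is left out here to keep the sketch short.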
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-YARN-2928.3.patch > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch, > YARN-3049-YARN-2928.3.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648048#comment-14648048 ] Zhijie Shen commented on YARN-3942: --- Yeah, I prefer creating a TimelineEntityFileClient to modifying the current TimelineClientImpl, because it should minimize the effect on the existing code path. However, I'm afraid that no matter which way we choose, we cannot make the change seamless to users. We cannot avoid the additional step at the client side to set the app/app-attempt ID, can we? At the Hive/Tez client (and other potential app clients), you also have to switch the context app/app-attempt ID once the client detects that a new YARN app/app-attempt has been created. Therefore, if some application wants to make use of it, it will also involve code changes in user land. BTW, why do you need the app-attempt ID? Is the log file on the basis of app or app-attempt? > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647217#comment-14647217 ] Zhijie Shen commented on YARN-3942: --- [~jlowe], thanks for sharing more information about the limitation. It sounds like a reasonable tradeoff, and only affects the cross-app queries. One concern is that the patch only contains the read path, and the write path only exists in TEZ. Therefore, it's not a complete solution from the perspective of YARN alone. Is it possible to generalize the write path in TEZ and promote it to YARN? > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3814: -- Attachment: YARN-3814.reference.patch I attached a patch for your reference that contains only the web services from the POC uber patch. The reason why I propose to have the cluster ID in the path is to make it more like a *REST* API, such that there's a hierarchical path from cluster to entityId. The reason why I only choose clusterId, appId, entityType and entityId on the path is that we said these are the 4 pieces that can uniquely identify an entity in the taxonomy (at least for now). I'm not too worried about the default cluster ID problem. The user can read it from the yarn configuration. When we create the client lib to wrap the REST API, we can load the default from there if the user doesn't supply the clusterId. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch, > YARN-3814-YARN-2928.02.patch, YARN-3814.reference.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646935#comment-14646935 ] Zhijie Shen commented on YARN-3984: --- In fact, metric has the same problem, but it may be still okay to ignore a metric without any data. > Rethink event column key issue > -- > > Key: YARN-3984 > URL: https://issues.apache.org/jira/browse/YARN-3984 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Fix For: YARN-2928 > > > Currently, the event column key is event_id?info_key?timestamp, which is not > so friendly to fetching all the events of an entity and sorting them in a > chronologic order. IMHO, timestamp?event_id?info_key may be a better key > schema. I open this jira to continue the discussion about it which was > commented on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646924#comment-14646924 ] Zhijie Shen commented on YARN-3984: --- [~vrushalic], thanks for picking it up. The aforementioned cases are definitely good to support, while the current query we want to support now (in YARN-3051 and YARN-3049) is to retrieve all events belonging to an entity (e.g. application, attempt, container and etc.). With this basic query, we can easily distill the details that happen to the entity, such as the diagnostic msg of the kill event. In this case, the most efficient way is to put timestamp even before the event ID, so that we don't need to order the events in memory. In addition to the key composition, I find another significant problem with the event store schema. If the event doesn't contain any info, it will be ignored then. And we cannot always guarantee user will put something into info. For example, user may define a KILL event without any diagnostic msg. > Rethink event column key issue > -- > > Key: YARN-3984 > URL: https://issues.apache.org/jira/browse/YARN-3984 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Fix For: YARN-2928 > > > Currently, the event column key is event_id?info_key?timestamp, which is not > so friendly to fetching all the events of an entity and sorting them in a > chronologic order. IMHO, timestamp?event_id?info_key may be a better key > schema. I open this jira to continue the discussion about it which was > commented on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3993: -- Labels: newbie (was: ) > Change to use the AM flag in ContainerContext determine AM container > > > Key: YARN-3993 > URL: https://issues.apache.org/jira/browse/YARN-3993 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen > Labels: newbie > > After YARN-3116, we will have a flag in ContainerContext to determine if the > container is AM or not in aux service. We need to change accordingly to make > use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646558#comment-14646558 ] Zhijie Shen edited comment on YARN-3993 at 7/29/15 6:26 PM: [~sunilg], thanks for your interest. It's not related to RM. In YARN-3116, we already build the channel to propagate the AM flag to aux service. What we need to do here is simply update the way that PerNodeTimelineCollectorsAuxService determine if the container is AM or not. Feel free to pick it up if you want to ramp up with TS v2. was (Author: zjshen): [~sunilg], thanks for your interest. It's not related to RM. In YARN-3116, we already build the channel to propagate the AM flag to aux service. What we need to do here is simply update the way that PerNodeTimelineCollectorsAuxService determine if the container is AM or not. > Change to use the AM flag in ContainerContext determine AM container > > > Key: YARN-3993 > URL: https://issues.apache.org/jira/browse/YARN-3993 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen > Labels: newbie > > After YARN-3116, we will have a flag in ContainerContext to determine if the > container is AM or not in aux service. We need to change accordingly to make > use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
[ https://issues.apache.org/jira/browse/YARN-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646558#comment-14646558 ] Zhijie Shen commented on YARN-3993: --- [~sunilg], thanks for your interest. It's not related to RM. In YARN-3116, we already build the channel to propagate the AM flag to aux service. What we need to do here is simply update the way that PerNodeTimelineCollectorsAuxService determine if the container is AM or not. > Change to use the AM flag in ContainerContext determine AM container > > > Key: YARN-3993 > URL: https://issues.apache.org/jira/browse/YARN-3993 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen > > After YARN-3116, we will have a flag in ContainerContext to determine if the > container is AM or not in aux service. We need to change accordingly to make > use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
Zhijie Shen created YARN-3993: - Summary: Change to use the AM flag in ContainerContext determine AM container Key: YARN-3993 URL: https://issues.apache.org/jira/browse/YARN-3993 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen After YARN-3116, we will have a flag in ContainerContext to determine if the container is AM or not in aux service. We need to change accordingly to make use of this feature instead of depending on container ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
[ https://issues.apache.org/jira/browse/YARN-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646443#comment-14646443 ] Zhijie Shen commented on YARN-3992: --- The problem was found with jenkins build on YARN-3049: https://builds.apache.org/job/PreCommit-YARN-Build/8701/testReport/ > TestApplicationPriority.testApplicationPriorityAllocation fails intermittently > -- > > Key: YARN-3992 > URL: https://issues.apache.org/jira/browse/YARN-3992 > Project: Hadoop YARN > Issue Type: Test >Reporter: Zhijie Shen > > {code} > java.lang.AssertionError: expected:<7> but was:<5> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
Zhijie Shen created YARN-3992: - Summary: TestApplicationPriority.testApplicationPriorityAllocation fails intermittently Key: YARN-3992 URL: https://issues.apache.org/jira/browse/YARN-3992 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen {code} java.lang.AssertionError: expected:<7> but was:<5> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646339#comment-14646339 ] Zhijie Shen commented on YARN-3049: --- TestApplicationPriority.testApplicationPriorityAllocation seems to have a race condition issue. I cannot reproduce it locally, either on trunk or on YARN-2928 with this patch. Anyway, it doesn't seem to be related to this jira. I will file a separate jira to track the test failure. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-YARN-2928.2.patch After YARN-3908, I updated the patch according to the HBase write fixes. I've decoupled the wire-up of the REST APIs and worked towards a review-ready HBase implementation patch. This patch still includes the implementation of writing and reading the app2flow table, because without it the reader may not work properly. Please let me know if you want to split it into two patches. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch, YARN-3049-YARN-2928.2.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643790#comment-14643790 ] Zhijie Shen commented on YARN-3942: --- Thanks for this work. I agree it's a good interim step between v1 and v2. I did a first scan of this patch, and am fine with the idea overall. As far as I can tell, the unsupported case is getting entities of the same type across applications. Other than that, the HDFS data path seems to work fine. [~jlowe], if you'd like to elaborate on the drawback a bit, it will be helpful. I will continue to review the patch and post more detailed comments. > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on a large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3908. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: YARN-2928 Committed the patch to branch YARN-2928. Thanks for the patch, Vrushali and Sangjin, as well as other folks for contributing your thoughts. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Fix For: YARN-2928 > > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3984) Rethink event column key issue
Zhijie Shen created YARN-3984: - Summary: Rethink event column key issue Key: YARN-3984 URL: https://issues.apache.org/jira/browse/YARN-3984 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Fix For: YARN-2928 Currently, the event column key is event_id?info_key?timestamp, which is not so friendly to fetching all the events of an entity and sorting them in a chronologic order. IMHO, timestamp?event_id?info_key may be a better key schema. I open this jira to continue the discussion about it which was commented on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643476#comment-14643476 ] Zhijie Shen commented on YARN-3908: --- Sure, as most folks are comfortable with the latest patch, let's get this in. I'll file a separate jira to track the discussion about event column key. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643267#comment-14643267 ] Zhijie Shen commented on YARN-3908: --- Okay, it's a fair point. It seems that the key design significantly depends on how we want to operate on the events. The current key design is most friendly to checking whether there exist events that match the given event ID and some given info key (and its value). But if you want to fetch everything that belongs to this event (our query needs to do this, as it's implicitly an atomic unit for now), it seems to be inevitable to scan through all the columns that have the given event ID (correct me if I'm wrong :-). If so, there seems to be little gain from this key design, while it complicates the event encapsulation logic. And after rethinking the current query to support (YARN-3051), I want to amend my suggestion. It seems to be more reasonable to use {{e!eventTimestamp?eventId?eventInfoKey}}, such that we can natively scan through the events of one entity one-by-one and return them in chronological order. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
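Editor's sketch of the scan that the timestamp-first key in the comment above is meant to enable. It is a plain-Java stand-in, not HBase code: a {{TreeMap}} plays the role of the sorted columns of one row, the class and method names are hypothetical, and the zero-padded decimal timestamp is an assumption made so that string order matches numeric order (a real schema would more likely use fixed-width bytes). With such keys, all columns of one event are adjacent and events come out in chronological order without sorting in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration; a TreeMap stands in for a row's sorted columns.
final class EventScanSketch {

  // Builds e!<timestamp>?<eventId>?<infoKey>. The timestamp is zero-padded to
  // 19 digits so lexicographic order equals numeric order (non-negative only).
  static String column(long timestamp, String eventId, String infoKey) {
    return String.format("e!%019d?%s?%s", timestamp, eventId, infoKey);
  }

  // Walks the sorted columns once and returns one "<timestamp>?<eventId>"
  // entry per event, oldest first. Assumes the info key contains no '?'.
  static List<String> eventsInOrder(TreeMap<String, String> columns) {
    List<String> events = new ArrayList<>();
    for (String qualifier : columns.keySet()) {
      String event = qualifier.substring(2, qualifier.lastIndexOf('?'));
      // Columns of the same event are adjacent, so comparing with the
      // previous entry is enough to detect the start of a new event.
      if (events.isEmpty() || !events.get(events.size() - 1).equals(event)) {
        events.add(event);
      }
    }
    return events;
  }
}
```

Restricting the result to a time window would then be a range over the key prefix rather than a filter over every column.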
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643113#comment-14643113 ] Zhijie Shen commented on YARN-3908: --- [~vrushalic], thanks for fixing the problem. W.R.T. the column key, shall we use:
{code}
e!eventId?eventTimestamp?eventInfoKey : eventInfoValue
{code}
Imagine we have two KILL events: one at TS1 and the other at TS2. IMHO, we want to scan through the two events' columns one-by-one instead of in an interleaved manner. This will make it much easier for the reader to parse multiple events and encapsulate them one after the other. It will be more useful in the future if we want to retrieve just part of the events of a big job (e.g. within a given time window, or the most recent events). Thoughts? > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643056#comment-14643056 ] Zhijie Shen commented on YARN-3981: --- Thanks for filing the jira. I'm going to pick this up. > support timeline clients not associated with an application > --- > > Key: YARN-3981 > URL: https://issues.apache.org/jira/browse/YARN-3981 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee > > In the current v.2 design, all timeline writes must belong in a > flow/application context (cluster + user + flow + flow run + application). > But there are use cases that require writing data outside the context of an > application. One such example is a higher level client (e.g. tez client or > hive/oozie/cascading client) writing flow-level data that spans multiple > applications. We need to find a way to support them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3981: - Assignee: Zhijie Shen > support timeline clients not associated with an application > --- > > Key: YARN-3981 > URL: https://issues.apache.org/jira/browse/YARN-3981 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > In the current v.2 design, all timeline writes must belong in a > flow/application context (cluster + user + flow + flow run + application). > But there are use cases that require writing data outside the context of an > application. One such example is a higher level client (e.g. tez client or > hive/oozie/cascading client) writing flow-level data that spans multiple > applications. We need to find a way to support them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3949) ensure timely flush of timeline writes
[ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640910#comment-14640910 ] Zhijie Shen commented on YARN-3949: --- IMHO, given write + flush, it's not necessary to have sync write and async write APIs at the writer level, since we already have the analogy at the app collector level. The app collector knows better than the writer whether it should flush after one write, two writes or more. The current approach seems good for now; I propose to go with it and unblock viewing the app timeline data after it gets finished. Thoughts? From my point of view, async write is more than the flush (e.g., queueing the entities in the collector, combining the updates of the same entity, etc.).
For the patch details:
1. "writer.flush.interval.seconds" -> "writer.flush-interval-seconds". The YARN convention is to use "." to separate namespaces (sub-components) and "-" to concatenate words. Please move the default to YarnConfiguration as well, which is part of the API, and add this config to yarn-default.xml.
2. Shall we use shutdown and then awaitTermination to gracefully stop the service? Otherwise, if there's a scheduled flush task running while the manager invokes writer.close(), will it cause any problem? Or is it thread safe, such that we don't need to worry about it?
> ensure timely flush of timeline writes > -- > > Key: YARN-3949 > URL: https://issues.apache.org/jira/browse/YARN-3949 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3949-YARN-2928.001.patch, > YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.002.patch > > > Currently flushing of timeline writes is not really handled. For example, > {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch > and write puts asynchronously. 
However, {{BufferedMutator}} may not flush > them to HBase unless the internal buffer fills up. > We do need a flush functionality first to ensure that data are written in a > reasonably timely manner, and to be able to ensure some critical writes are > done synchronously (e.g. key lifecycle events). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
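The second review point above (shutdown, then awaitTermination, before closing the writer) can be sketched with plain JDK classes. This is only an illustration under stated assumptions: CountingWriter and FlushManagerSketch are hypothetical names, not classes from the patch, and the real writer would be the HBase-backed one rather than a counter.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a timeline writer; it only counts flushes. */
class CountingWriter {
  final AtomicInteger flushes = new AtomicInteger();
  volatile boolean closed = false;

  void flush() {
    if (closed) {
      throw new IllegalStateException("flush after close");
    }
    flushes.incrementAndGet();
  }

  void close() {
    closed = true;
  }
}

/** Hypothetical manager: flushes on a fixed interval and stops gracefully. */
class FlushManagerSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final CountingWriter writer;

  FlushManagerSketch(CountingWriter writer, long intervalMillis) {
    this.writer = writer;
    scheduler.scheduleAtFixedRate(
        writer::flush, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  /** Returns true if the scheduled flush task terminated before the timeout. */
  boolean stop(long timeoutMillis) {
    scheduler.shutdown(); // no further runs are started; a running flush may finish
    boolean terminated = false;
    try {
      terminated = scheduler.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    if (!terminated) {
      scheduler.shutdownNow(); // give up waiting and interrupt the task
    }
    writer.flush(); // one last flush so buffered writes are not left behind
    writer.close();
    return terminated;
  }
}
```

With this ordering, close() is only reached after the scheduler has stopped starting flush tasks, which is the race the comment asks about. Whether the actual writer also needs its own synchronization is a separate question this sketch does not answer.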
[jira] [Commented] (YARN-3949) ensure timely flush of timeline writes
[ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638190#comment-14638190 ] Zhijie Shen commented on YARN-3949: --- The proposal looks good to me for now. We may need to revisit it if we'd like to support getting the real-time data later. One question about the buffer: if for some reason the app collector has crashed, will this written, but unflushed data be lost? > ensure timely flush of timeline writes > -- > > Key: YARN-3949 > URL: https://issues.apache.org/jira/browse/YARN-3949 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3949-YARN-2928.001.patch > > > Currently flushing of timeline writes is not really handled. For example, > {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch > and write puts asynchronously. However, {{BufferedMutator}} may not flush > them to HBase unless the internal buffer fills up. > We do need a flush functionality first to ensure that data are written in a > reasonably timely manner, and to be able to ensure some critical writes are > done synchronously (e.g. key lifecycle events). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637300#comment-14637300 ]

Zhijie Shen commented on YARN-3908:
---

bq. Is it the event id + timestamp? How about the event type? If you look at the equals() and the hashCode() implementations of TimelineEvent, it uses the timestamp, the event type, and even the info as a whole, but the id is not used for equality. How does that square with the stated intent that the event id and the timestamp form the identity?

There's no event type now. In v1 it was called type, but in v2 it is renamed to id. We want to use id + ts to identify an event object uniquely, to support the case where an event happens multiple times. This also lets us avoid combined IDs like "container_allocation_13421543243". Does this make sense?

bq. Is pretty much the only access pattern "give me all the events that belong to this entity"?

Yeah: get the events of one entity in chronological order, or get just part of them via filtering.

bq. Two TimelineEvents are equal only if the timestamp is equal AND the type is equal AND the entire info maps are equal. What would we query by event type, timestamp and event info key? Do users always have to specify the timestamp?

There's no type, only an ID. In the current reader API we cannot do sub-entity filtering, but in the future we can try to support, for example, getting the events in a given time window. If two events have the same id and ts, but different info, we may consider them the same event carrying different information. The later put will append more k/v pairs or update the existing ones.

bq. Do we need to store only the latest event for each timestamp, or all of them? It would almost sound like the key should be type and timestamp, but what about the entire event info map?

In the DB, I think the proper logic is: if we put and , we should have two separate records persisted; and if we put and again, we should update the same record and let k1=v1'.

> Bugs in HBaseTimelineWriterImpl
> ---
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch,
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch,
> YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch,
> YARN-3908-YARN-2928.005.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic
> fields of a timeline entity plus events. However, entity#info map is not
> stored at all.
> 2. event#timestamp is also not persisted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
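The identity rule discussed in this comment (id + ts identifies an event; a second put with the same pair updates the record instead of adding one) can be illustrated with a small in-memory sketch. EventKey and EventStoreSketch are hypothetical names used only here; the real storage is the HBase entity table, and this map only mimics the intended upsert behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical key: an event is identified by its id plus its timestamp. */
final class EventKey {
  final String id;
  final long timestamp;

  EventKey(String id, long timestamp) {
    this.id = id;
    this.timestamp = timestamp;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof EventKey)) {
      return false;
    }
    EventKey other = (EventKey) o;
    return timestamp == other.timestamp && id.equals(other.id);
  }

  @Override
  public int hashCode() {
    return Objects.hash(id, timestamp);
  }
}

/** Hypothetical in-memory stand-in for the persisted event records. */
class EventStoreSketch {
  private final Map<EventKey, Map<String, String>> records = new HashMap<>();

  /** Same (id, timestamp): merge info into the existing record; otherwise add one. */
  void put(String id, long timestamp, Map<String, String> info) {
    records.computeIfAbsent(new EventKey(id, timestamp), k -> new HashMap<>())
        .putAll(info);
  }

  int recordCount() {
    return records.size();
  }

  Map<String, String> info(String id, long timestamp) {
    return records.get(new EventKey(id, timestamp));
  }
}
```

Note that the info map is deliberately left out of equals() and hashCode(), which is the difference from the TimelineEvent implementation questioned in the quoted review.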
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634352#comment-14634352 ] Zhijie Shen commented on YARN-3049: --- [~sjlee0], yeah, for POC purposes, I temporarily flush upon each put. I suspect it will significantly impact the write performance. We may need to sync on this issue. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634200#comment-14634200 ] Zhijie Shen commented on YARN-3908: --- I set up hbase-1.0.1.1 as a single-node cluster on the local FS and submitted an MR job. After the job finished, I used the REST API (YARN-3049) to read the entity -> NOT FOUND, and I used the hbase shell to scan through the entity table -> NOT FOUND as well. We may want to rethink the buffer policy. It is not a good user experience if the entity is still unavailable to users after the app has finished. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-WIP.3.patch Uploaded a new WIP patch with some bug fixes, including the two mentioned in YARN-3908. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch, > YARN-3049-WIP.3.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633857#comment-14633857 ] Zhijie Shen commented on YARN-3908: --- I found two more issues upon debugging the reader POC: 1. The events have been written into metrics column family. 2. The entity is not accessible immediately after a single put operation. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, > YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, > YARN-3908-YARN-2928.005.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-WIP.2.patch [~sjlee0] and [~gtCarrera9], thanks for reviewing the patch. I'm currently targeting an E2E reader POC, and I'll try to address your comments a bit later. I've uploaded a new WIP patch, which basically makes the reader work E2E, though there are a couple of bugs. I'll spend some more time fixing them. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch, YARN-3049-WIP.2.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628708#comment-14628708 ] Zhijie Shen commented on YARN-3814: --- I didn't go beyond the current reader interface. You're safe:-) > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628647#comment-14628647 ] Zhijie Shen commented on YARN-3814: --- [~varun_saxena], thanks for putting up the patch. It seems that we have duplicated some work (I'm working on a POC for the reader (YARN-3049), which contains some REST API hooks too). I'll upload a POC patch a bit later. Let's consolidate them. > REST API implementation for getting raw entities in TimelineReader > -- > > Key: YARN-3814 > URL: https://issues.apache.org/jira/browse/YARN-3814 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-3814-YARN-2928.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625663#comment-14625663 ] Zhijie Shen commented on YARN-3116: --- Congrats on your first patch, [~giovanni.fumarola]! > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Fix For: 2.8.0 > > Attachments: YARN-3116.patch, YARN-3116.v10.patch, > YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, > YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, > YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625397#comment-14625397 ] Zhijie Shen commented on YARN-3908: --- Yeah, but the method based on metric value number is not guaranteed, are we okay with it? > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625369#comment-14625369 ] Zhijie Shen commented on YARN-3908: --- [~vrushalic] and [~sjlee0], thanks for helping fix the problems. I have two questions: 1. I'm wondering if we should put info and events into a separate column family, like what we did for configs/metrics? 2. We don't want to store the metric type, do we? > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > Attachments: YARN-3908-YARN-2928.001.patch, > YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch > > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-WIP.1.patch Attached a WIP patch so that the community can take a look, while I still need to add the app->flow mapping and some missing fields. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > Attachments: YARN-3049-WIP.1.patch > > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623176#comment-14623176 ] Zhijie Shen commented on YARN-3116: --- +1 for the last patch. Will commit it after jenkins comments. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v10.patch, > YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, > YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, > YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3914) Entity created time should be part of the row key of entity table
[ https://issues.apache.org/jira/browse/YARN-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623170#comment-14623170 ] Zhijie Shen commented on YARN-3914: --- This will not block the implementation of getEntities (YARN-3049), but the performance will be bad without it, especially when the number of entities per type per app becomes huge, i.e., there's a big job. > Entity created time should be part of the row key of entity table > - > > Key: YARN-3914 > URL: https://issues.apache.org/jira/browse/YARN-3914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > Entity created time should be part of the row key of entity table, between > entity type and entity Id. The reason to have it is to index the entities. > Though we cannot index the entities for all kinds of information, indexing > them according to the created time is very necessary. Without it, every query > for the latest entities that belong to an application and a type will scan > through all the entities that belong to them. For example, if we want to list > the 100 latest started containers in an YARN app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3914) Entity created time should be part of the row key of entity table
Zhijie Shen created YARN-3914: - Summary: Entity created time should be part of the row key of entity table Key: YARN-3914 URL: https://issues.apache.org/jira/browse/YARN-3914 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Entity created time should be part of the row key of entity table, between entity type and entity Id. The reason to have it is to index the entities. Though we cannot index the entities for all kinds of information, indexing them according to the created time is very necessary. Without it, every query for the latest entities that belong to an application and a type will scan through all the entities that belong to them. For example, if we want to list the 100 latest started containers in an YARN app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
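To make the indexing argument in YARN-3914 concrete, here is a rough sketch of a row key with the created time placed between the entity type and the entity id. Several things are assumptions made for illustration and are not stated in the issue: the "!" separator and the appId prefix, the inversion of the timestamp (Long.MAX_VALUE - createdTime) so that newer entities sort first, and the use of a TreeMap with an unsigned byte comparator as a stand-in for HBase's sorted rows. RowKeySketch is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class RowKeySketch {
  /** Unsigned lexicographic byte order, standing in for sorted row keys. */
  static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return a.length - b.length;
  };

  /** Assumed layout: appId ! entityType ! inverted created time ! entityId. */
  static byte[] rowKey(String appId, String type, long createdTime, String entityId) {
    byte[] prefix = (appId + "!" + type + "!").getBytes(StandardCharsets.UTF_8);
    byte[] suffix = ("!" + entityId).getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(prefix.length + Long.BYTES + suffix.length)
        .put(prefix)
        .putLong(Long.MAX_VALUE - createdTime) // newer entities get smaller keys
        .put(suffix)
        .array();
  }

  /** Reads the first `limit` rows in key order, i.e. the newest entities. */
  static List<String> latest(TreeMap<byte[], String> table, int limit) {
    List<String> out = new ArrayList<>();
    for (String value : table.values()) {
      if (out.size() == limit) {
        break;
      }
      out.add(value);
    }
    return out;
  }
}
```

With a key like this, "the 100 latest started containers" becomes a short scan from the start of the app/type prefix instead of a pass over every entity of that type.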
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622998#comment-14622998 ] Zhijie Shen commented on YARN-3116: --- one nit: can we move ContainerType to server/api? > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622859#comment-14622859 ] Zhijie Shen commented on YARN-3116: --- Sure, I'll review the latest patch this afternoon. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621592#comment-14621592 ] Zhijie Shen commented on YARN-3908: --- 1. TimelineEvent has a timestamp associated with it. It tells us when the event happened. We should have this information persisted, but unfortunately it seems it is not. 2. Metric doesn't have a timestamp because the timestamp is associated with each individual value. 3. I also realized that the metric type is not persisted either. For now, in the reader implementation, I just assume that size(metric) > 1 => time series, else => single value. But this is not guaranteed to hold. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
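The size-based guess in point 3 can be written down in a few lines, which also shows where it breaks: a time series that has reported only one value so far is indistinguishable from a single-value metric. MetricTypeSketch, its Type enum, and the resolve() fallback are hypothetical names for illustration, not the reader's actual code.

```java
import java.util.Map;

class MetricTypeSketch {
  /** Hypothetical metric kinds, named after the two cases in the comment. */
  enum Type { SINGLE_VALUE, TIME_SERIES }

  /** The size-based guess: more than one stored value means a time series. */
  static Type inferFromSize(Map<Long, Long> valuesByTimestamp) {
    return valuesByTimestamp.size() > 1 ? Type.TIME_SERIES : Type.SINGLE_VALUE;
  }

  /** Prefers a persisted type when there is one; otherwise falls back to the guess. */
  static Type resolve(Type persisted, Map<Long, Long> valuesByTimestamp) {
    return persisted != null ? persisted : inferFromSize(valuesByTimestamp);
  }
}
```

Persisting the type alongside the values would remove the ambiguity, at the cost of one more column per metric.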
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621581#comment-14621581 ] Zhijie Shen commented on YARN-3116: --- [~kkaranasos], I haven't looked into the details of YARN-2884, but it seems to be an API change that needs to be exposed to users. In that case, a user-facing object, i.e., ContainerLaunchContext, is the better choice for you. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621545#comment-14621545 ] Zhijie Shen commented on YARN-3836: --- +1 LGTM > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch, > YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch, > YARN-3836-YARN-2928.004.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621480#comment-14621480 ] Zhijie Shen edited comment on YARN-3908 at 7/10/15 12:23 AM: - It's blocking the reader interface implementation now. Assign it to [~vrushalic] by default. Please feel free to rebalance the workload. was (Author: zjshen): Assign it to [~vrushalic] by default. Please feel free to rebalance the workload. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621480#comment-14621480 ] Zhijie Shen commented on YARN-3908: --- Assign it to [~vrushalic] by default. Please feel free to rebalance the workload. > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3908: -- Issue Type: Sub-task (was: Bug) Parent: YARN-2928 > Bugs in HBaseTimelineWriterImpl > --- > > Key: YARN-3908 > URL: https://issues.apache.org/jira/browse/YARN-3908 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Vrushali C > > 1. In HBaseTimelineWriterImpl, the info column family contains the basic > fields of a timeline entity plus events. However, entity#info map is not > stored at all. > 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3908) Bugs in HBaseTimelineWriterImpl
Zhijie Shen created YARN-3908: - Summary: Bugs in HBaseTimelineWriterImpl Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621209#comment-14621209 ] Zhijie Shen commented on YARN-3836: --- bq. l.550: It sounds like now the type takes precedence over the created time in the sort order in this version. Is this intended? If not (timestamp is supposed to be first), it might be a good idea to have Identifier implement Comparable as well and use that in TimelineEntity.compareTo(). Currently getEntities only supports returning the entities of a single entity type, so the ordering among them won't be affected by the entity type. In general, it seems more natural to put entities of the same type close to each other. For example, we can merge the collections of entities returned from multiple getEntities queries to imitate fetching entities of multiple entity types. If we have a specific use case (e.g., we want to order entities globally across types), it should be fine and not expensive to define a customized comparator for it. > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch, > YARN-3836-YARN-2928.002.patch, YARN-3836-YARN-2928.003.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
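The two orderings discussed in this comment (type first, then created time, versus a customized comparator for a global order across types) can be sketched as follows. EntitySketch is a hypothetical, trimmed-down class rather than TimelineEntity itself, and the newest-first direction for created time is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, trimmed-down entity with only the fields used for ordering. */
final class EntitySketch {
  final String type;
  final long createdTime;
  final String id;

  EntitySketch(String type, long createdTime, String id) {
    this.type = type;
    this.createdTime = createdTime;
    this.id = id;
  }

  /** Groups entities of the same type together. */
  static final Comparator<EntitySketch> BY_TYPE =
      (a, b) -> a.type.compareTo(b.type);

  /** Customized comparator for a global order across types (newest first). */
  static final Comparator<EntitySketch> NEWEST_FIRST =
      (a, b) -> Long.compare(b.createdTime, a.createdTime);

  /** Type takes precedence; created time only breaks ties within a type. */
  static final Comparator<EntitySketch> TYPE_THEN_NEWEST =
      BY_TYPE.thenComparing(NEWEST_FIRST);

  /** Returns the ids of the given entities sorted by the given order. */
  static List<String> sortedIds(List<EntitySketch> entities, Comparator<EntitySketch> order) {
    List<EntitySketch> copy = new ArrayList<>(entities);
    copy.sort(order);
    List<String> ids = new ArrayList<>();
    for (EntitySketch e : copy) {
      ids.add(e.id);
    }
    return ids;
  }
}
```

Merging the results of several single-type getEntities calls and sorting with TYPE_THEN_NEWEST keeps each type contiguous, while NEWEST_FIRST covers the cross-type use case without changing the natural ordering.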
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621051#comment-14621051 ] Zhijie Shen commented on YARN-3901: --- Yeah, the reader depends on this table. If nobody is working on it, I can take care of it. > Populate flow run data in the flow_run table > > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
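The quoted description says metric values are summed on read and that only cells of completed applications (tag type 2) may be collapsed. A rough sketch of that rule follows; the Cell class, the constants and the method names are all hypothetical, and the real design runs inside an HBase coprocessor rather than over an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlowRunMetricCells {
  // Tag types as listed in the description: not set (0), running (1), complete (2).
  static final int TAG_NOT_SET = 0;
  static final int TAG_RUNNING = 1;
  static final int TAG_COMPLETE = 2;

  /** Hypothetical in-memory model of one tagged cell of a metric column. */
  static final class Cell {
    final String appId; // null for the collapsed, untagged cell
    final int tagType;
    final long value;

    Cell(String appId, int tagType, long value) {
      this.appId = appId;
      this.tagType = tagType;
      this.value = value;
    }
  }

  /** Read path: the flow-level value is the sum over all cells. */
  static long sumOnRead(List<Cell> cells) {
    long total = 0;
    for (Cell c : cells) {
      total += c.value;
    }
    return total;
  }

  /** Compaction: fold completed and untagged cells into one untagged cell. */
  static List<Cell> collapse(List<Cell> cells) {
    long collapsed = 0;
    List<Cell> kept = new ArrayList<>();
    for (Cell c : cells) {
      if (c.tagType == TAG_RUNNING) {
        kept.add(c); // still changing, so it must stay separate
      } else {
        collapsed += c.value;
      }
    }
    kept.add(new Cell(null, TAG_NOT_SET, collapsed));
    return kept;
  }

  public static void main(String[] args) {
    List<Cell> cells = Arrays.asList(
        new Cell("app_1", TAG_COMPLETE, 5L),
        new Cell("app_2", TAG_RUNNING, 7L),
        new Cell("app_3", TAG_COMPLETE, 3L));
    System.out.println(sumOnRead(cells) + " == " + sumOnRead(collapse(cells)));
  }
}
```

The point of the sketch is the invariant: collapsing must not change what a read returns. The separate column that records already-aggregated application ids is left out here.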
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621034#comment-14621034 ] Zhijie Shen commented on YARN-3116: --- [~kkaranasos], thanks for notifying us of YARN-2882. I took a quick look at the jira. Our approaches seem to be similar, but it seems that we're on parallel tracks. While YARN-2882 defines two container types in the container-related API so as to differentiate container requests to the RM or NM, the label we want to put on a container here aims to let the NM know whether the container hosts the AM or not. This is completely internal information: users are blind to this type and are not able to set or change it. This is why we propose to pass this information via ContainerTokenIdentifier instead of ContainerLaunchContext. Thoughts? > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619759#comment-14619759 ] Zhijie Shen commented on YARN-3901: --- [~vrushalic], just want to confirm with you that this jira won't cover the app_flow table, right? I need the flow mapping to implement the reader APIs against the HBase backend. If it's not covered here, I can help to implement it in the scope of YARN-3049. > Populate flow run data in the flow_run table > > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value.
The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model
[ https://issues.apache.org/jira/browse/YARN-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619729#comment-14619729 ] Zhijie Shen commented on YARN-3836: --- bq. I see that we're implementing the Comparable interface for all 3 types. I'm wondering if it makes sense for them. What would it mean to order TimelineEntity instances? Does it mean much? Where would it be useful? Do we need to implement it? The same questions go for the other 2 types... For example, compareTo of TimelineEntity is used to order the entities in the return set of a getEntities query. It would be better to return the entities ordered by timestamp instead of randomly. bq. This is an open question. Is the id alone the identity or does the timestamp together form the identity? Do we expect users of TimelineEvent always be able to provide the timestamp? Honestly I'm not 100% sure what the contract is, and we probably want to make it explicit (and add it to the javadoc). Thoughts? In ATS v1, we actually use id + timestamp to uniquely identify an event. One merit of doing this is that it lets the app put the same event multiple times. For example, a job can request resources many times. Every time, it can put a RESOURCE_REQUEST event with a unique timestamp and fill in the resource information. > add equals and hashCode to TimelineEntity and other classes in the data model > - > > Key: YARN-3836 > URL: https://issues.apache.org/jira/browse/YARN-3836 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Li Lu > Attachments: YARN-3836-YARN-2928.001.patch > > > Classes in the data model API (e.g. {{TimelineEntity}}, > {{TimelineEntity.Identifer}}, etc.) do not override {{equals()}} or > {{hashCode()}}. This can cause problems when these objects are used in a > collection such as a {{HashSet}}. We should implement these methods wherever > appropriate. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
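The id + timestamp identity described in the comment above could look roughly like the sketch below. The Event class is a hypothetical stand-in and is not the TimelineEvent class from the patch; it only illustrates why two RESOURCE_REQUEST events with different timestamps stay distinct in a HashSet.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class EventIdentity {

  /** Hypothetical event whose identity is the pair (id, timestamp). */
  static final class Event {
    final String id;
    final long timestamp;

    Event(String id, long timestamp) {
      this.id = id;
      this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Event)) {
        return false;
      }
      Event other = (Event) o;
      return timestamp == other.timestamp && id.equals(other.id);
    }

    @Override
    public int hashCode() {
      return Objects.hash(id, timestamp);
    }
  }

  public static void main(String[] args) {
    Set<Event> events = new HashSet<>();
    events.add(new Event("RESOURCE_REQUEST", 100L));
    events.add(new Event("RESOURCE_REQUEST", 200L)); // same id, later request
    events.add(new Event("RESOURCE_REQUEST", 100L)); // exact duplicate, ignored
    System.out.println(events.size());
  }
}
```

If the id alone were the identity, the second request would silently replace or be dropped against the first, which is the case the comment wants to avoid.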
[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3116: -- Attachment: YARN-3116.v8.patch Fixed TestAppRunnability as well in the new patch. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, > YARN-3116.v7.patch, YARN-3116.v8.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619564#comment-14619564 ] Zhijie Shen commented on YARN-3116: --- Xuan, thanks for your comment. I think this is a good point. To be forward compatible, it's better to use an enum here instead of a boolean flag. That way, we can add more enum values, such as SystemContainer, in the future without adding a new flag and breaking compatibility. [~giovanni.fumarola], [~subru], what do you think? > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
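A small sketch of the enum-over-boolean argument made above. The enum constants and the shouldStartCollector helper are illustrative names chosen for this example and are not taken from the patch.

```java
final class ContainerTypeSketch {

  /** Illustrative container types; more values can be appended later. */
  enum ContainerType {
    APPLICATION_MASTER,
    TASK
  }

  /** The NM-side check only cares whether the container hosts the AM. */
  static boolean shouldStartCollector(ContainerType type) {
    return type == ContainerType.APPLICATION_MASTER;
  }

  public static void main(String[] args) {
    for (ContainerType type : ContainerType.values()) {
      System.out.println(type + " -> " + shouldStartCollector(type));
    }
  }
}
```

With a boolean isAM flag, a third kind of container would need a second flag or a changed field. With the enum, a new constant can be added and the check above keeps treating every non-AM value the same way.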
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619518#comment-14619518 ] Zhijie Shen commented on YARN-3116: --- Is the TestAppRunnability failure related to this patch? The normal practice is to check whether the test failure is related to the code change in this jira. If not, you can go ahead and file a separate jira to tackle it. Thanks for fixing TestPrivilegedOperationExecutor. It seems to be straightforward, so let's keep it here. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619440#comment-14619440 ] Zhijie Shen commented on YARN-3047: --- Thanks for kicking another jenkins build. IAC, the patch looks good to me. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047-YARN-2928.13.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619347#comment-14619347 ] Zhijie Shen commented on YARN-3049: --- Updated the title accordingly to describe the scope of this jira more accurately. > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Summary: [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend (was: [Storage Implementation] Implement the storage reader interface to fetch raw data) > [Storage Implementation] Implement storage reader interface to fetch raw data > from HBase backend > > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement the storage reader interface to fetch raw data
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Summary: [Storage Implementation] Implement the storage reader interface to fetch raw data (was: [Compatiblity] Implement existing ATS queries in the new ATS design) > [Storage Implementation] Implement the storage reader interface to fetch raw > data > - > > Key: YARN-3049 > URL: https://issues.apache.org/jira/browse/YARN-3049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Zhijie Shen > > Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3116: -- Attachment: YARN-3116.v6.patch Fixed the test failure in the new patch. Otherwise, the previous patch looks good to me. As I'm touching the patch also, I need a second committer to take a look. [~jianhe], would you mind doing me a favor? > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > such that we worked around to considered the container with ID "_01" as > the AM container. Unfortunately, this is neither necessary or sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add flag to the container object or create an API to > do the judgement. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617553#comment-14617553 ] Zhijie Shen commented on YARN-3047: --- It looks good to me overall, except the config. Please let me know if I've missed something: the new configuration name and the old configuration default value are used together. Why do we want the combination?
{code}
if (YarnConfiguration.useHttps(conf)) {
  return conf.get(YarnConfiguration.TIMELINE_READER_WEBAPP_HTTPS_ADDRESS,
      YarnConfiguration.DEFAULT_TIMELINE_SERVICE_WEBAPP_HTTPS_ADDRESS);
} else {
  return conf.get(YarnConfiguration.TIMELINE_READER_WEBAPP_ADDRESS,
      YarnConfiguration.DEFAULT_TIMELINE_SERVICE_WEBAPP_ADDRESS);
}
{code}
Can't we just reuse the existing config "timeline_service_webapp" instead of creating a new one? In fact, users are blind to the writer. They just know timeline_service_webapp is where they can access the data. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047.001.patch, YARN-3047.003.patch, > YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, > YARN-3047.02.patch, YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617307#comment-14617307 ] Zhijie Shen commented on YARN-3047: --- Would you please hold a while? I plan to take a look this afternoon. > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047-YARN-2928.12.patch, YARN-3047.001.patch, YARN-3047.003.patch, > YARN-3047.005.patch, YARN-3047.006.patch, YARN-3047.007.patch, > YARN-3047.02.patch, YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616031#comment-14616031 ] Zhijie Shen commented on YARN-3047: --- YARN-3051 has been committed. Would you please update the jira? > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3047: -- Labels: (was: BB2015-05-TBR) > [Data Serving] Set up ATS reader with basic request serving structure and > lifecycle > --- > > Key: YARN-3047 > URL: https://issues.apache.org/jira/browse/YARN-3047 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: Timeline_Reader(draft).pdf, > YARN-3047-YARN-2928.08.patch, YARN-3047-YARN-2928.09.patch, > YARN-3047-YARN-2928.10.patch, YARN-3047-YARN-2928.11.patch, > YARN-3047.001.patch, YARN-3047.003.patch, YARN-3047.005.patch, > YARN-3047.006.patch, YARN-3047.007.patch, YARN-3047.02.patch, > YARN-3047.04.patch > > > Per design in YARN-2938, set up the ATS reader as a service and implement the > basic structure as a service. It includes lifecycle management, request > serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615930#comment-14615930 ] Zhijie Shen commented on YARN-3051: --- Will commit the patch late today if no more comments. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.003.patch, > YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, > YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, > YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, > YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, > YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, > YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, > YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615839#comment-14615839 ] Zhijie Shen commented on YARN-3051: --- Okay, then it seems to be fine. I didn't notice it's a per-cluster mapping file. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.003.patch, > YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, > YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, > YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, > YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, > YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, > YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, > YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615792#comment-14615792 ] Zhijie Shen commented on YARN-3051: --- bq. The current FS implementation had cluster as part of the path. So there will a app_flow_mapping.csv for each cluster. So in a way it is part of the primary key even though its not there in app_flow_mapping.csv I hope that is what your concern was. The problem is about the write path. Suppose we unfortunately have a duplicate appId: one is clusterId1/appId and the other is clusterId2/appId. When the former entity is written, you have added appId into the mapping file. How do you write the mapping file upon clusterId2/appId? Overwrite the row of appId? Append one more row of appId? Both will cause trouble when finding the right flow info for a query that uses default values. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.003.patch, > YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, > YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, > YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, > YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, > YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, > YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, > YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
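The duplicate-appId concern above goes away if the lookup is keyed by clusterId and appId together. A minimal in-memory sketch follows; the AppFlowMapping and AppKey names are hypothetical, and the actual patch keeps this mapping in a per-cluster CSV file rather than a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class AppFlowMapping {

  /** Composite key: the same appId in two clusters yields two distinct keys. */
  static final class AppKey {
    final String clusterId;
    final String appId;

    AppKey(String clusterId, String appId) {
      this.clusterId = clusterId;
      this.appId = appId;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof AppKey)) {
        return false;
      }
      AppKey other = (AppKey) o;
      return clusterId.equals(other.clusterId) && appId.equals(other.appId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(clusterId, appId);
    }
  }

  private final Map<AppKey, String> flowByApp = new HashMap<>();

  void put(String clusterId, String appId, String flow) {
    flowByApp.put(new AppKey(clusterId, appId), flow);
  }

  /** Returns the flow for this cluster/app pair, or null if none is recorded. */
  String get(String clusterId, String appId) {
    return flowByApp.get(new AppKey(clusterId, appId));
  }

  public static void main(String[] args) {
    AppFlowMapping mapping = new AppFlowMapping();
    mapping.put("clusterId1", "appId", "flowA");
    mapping.put("clusterId2", "appId", "flowB"); // does not overwrite clusterId1
    System.out.println(mapping.get("clusterId1", "appId"));
    System.out.println(mapping.get("clusterId2", "appId"));
  }
}
```

Keeping one mapping file per cluster directory, as the quoted reply describes, is the file-system equivalent of this composite key.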
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615747#comment-14615747 ] Zhijie Shen commented on YARN-3051: --- Hi Varun, thanks for updating the patch. I have only one remaining issue about this patch: according to https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, it seems that we have chosen clusterId + appId to globally find a unique flow run. I think we should do something similar here by adding clusterId, which is a mandatory field. /cc [~sjlee0]. Some other improvements are required in the future for robustness and performance. Let's make sure we have a jira to improve the reader later. 1. Maybe we want to cache the mapping instead of reading it from the file for every query. 2. The limit should be pushed down into the for loop. If we just want to retrieve 10 entities, it's unnecessary to go through 1000 qualified candidates and finally pick the top 10. 3. We'd better avoid hard-coding "/" as the path separator, and we should use the FileSystem interface to operate on the files, such that the impl can also work with HDFS.
> [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.003.patch, > YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, > YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, > YARN-3051-YARN-2928.07.patch, YARN-3051-YARN-2928.08.patch, > YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, > YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, > YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, > YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
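One possible reading of item 2 in the comment above (pushing the limit into the loop) is to keep only the best "limit" candidates while iterating, instead of collecting every qualified entity and trimming afterwards. The sketch below works on bare created-time values and uses a bounded min-heap; both choices are assumptions made for illustration and are not taken from the patch. Note that it still visits each candidate once; it only bounds what is held and sorted. Stopping early would additionally require the candidates to arrive in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class LimitPushDown {

  /** Returns at most {@code limit} of the largest created times, newest first. */
  static List<Long> newest(Iterable<Long> createdTimes, int limit) {
    if (limit <= 0) {
      return new ArrayList<>();
    }
    // Min-heap: the oldest of the currently kept candidates sits on top.
    PriorityQueue<Long> heap = new PriorityQueue<>();
    for (long t : createdTimes) {
      if (heap.size() < limit) {
        heap.add(t);
      } else if (t > heap.peek()) {
        heap.poll(); // evict the oldest kept candidate
        heap.add(t);
      }
    }
    List<Long> result = new ArrayList<>(heap);
    result.sort(Comparator.reverseOrder());
    return result;
  }

  public static void main(String[] args) {
    System.out.println(newest(Arrays.asList(5L, 1L, 9L, 3L, 7L), 2));
  }
}
```

Memory and sorting cost are then proportional to the limit rather than to the number of qualified candidates.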
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615387#comment-14615387 ] Zhijie Shen commented on YARN-3051: --- How about using the Apache Commons CSV library to handle the lookup file? http://commons.apache.org/proper/commons-csv/index.html > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.003.patch, > YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, > YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, > YARN-3051-YARN-2928.07.patch, YARN-3051.Reader_API.patch, > YARN-3051.Reader_API_1.patch, YARN-3051.Reader_API_2.patch, > YARN-3051.Reader_API_3.patch, YARN-3051.Reader_API_4.patch, > YARN-3051.wip.02.YARN-2928.patch, YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3881) Writing RM cluster-level metrics
[ https://issues.apache.org/jira/browse/YARN-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612638#comment-14612638 ] Zhijie Shen commented on YARN-3881: --- Once the metrics are ready, we can build a built-in web UI for the YARN/timeline service to show this information, as well as expose it via an API, such that third-party monitoring tools like Ambari can integrate with it. I think it should be quite flexible. > Writing RM cluster-level metrics > > > Key: YARN-3881 > URL: https://issues.apache.org/jira/browse/YARN-3881 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: metrics.json > > > RM has a bunch of metrics that we may want to write into the timeline backend > too. I attached the metrics.json that I've crawled via > {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to > three groups of metrics: > 1. QueueMetrics > 2. JvmMetrics > 3. ClusterMetrics > The problem is that, unlike other metrics, which belong to a single application, > these belong to the RM or are cluster-wide. Therefore, the current write path is > not going to work for these metrics because they don't have the associated > user/flow/app context info. We need to rethink the modeling of cross-app metrics > and the API to handle them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3881) Writing RM cluster-level metrics
[ https://issues.apache.org/jira/browse/YARN-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612429#comment-14612429 ] Zhijie Shen commented on YARN-3881: --- IMHO, we need to add an additional API to directly write the cross-app metrics (or already-aggregated metrics, if you think of them as the aggregated data of individual apps, such as the counters of submitted/pending/running apps) to the backend, in separate tables such as cluster/queue/user tables. This data doesn't need to be aggregated any more. > Writing RM cluster-level metrics > > > Key: YARN-3881 > URL: https://issues.apache.org/jira/browse/YARN-3881 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: metrics.json > > > RM has a bunch of metrics that we may want to write into the timeline backend > too. I attached the metrics.json that I've crawled via > {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to > three groups of metrics: > 1. QueueMetrics > 2. JvmMetrics > 3. ClusterMetrics > The problem is that, unlike other metrics, which belong to a single application, > these belong to the RM or are cluster-wide. Therefore, the current write path is > not going to work for these metrics because they don't have the associated > user/flow/app context info. We need to rethink the modeling of cross-app metrics > and the API to handle them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
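As a rough illustration of the separate write API suggested in the YARN-3881 comment above, the sketch below routes already-aggregated values straight into per-scope tables with no user/flow/app context. Every name in it (MetricScope, AggregatedMetricsWriter, the in-memory tables, the row-key layout) is hypothetical and only stands in for whatever the real backend schema would be.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// All names below are hypothetical; none of these types exist in YARN.
enum MetricScope { CLUSTER, QUEUE, USER }

interface AggregatedMetricsWriter {
  // Writes a value that is already aggregated, so no app context is required.
  void write(MetricScope scope, String scopeId, String metric,
      long timestamp, long value);
}

// In-memory stand-in for the separate cluster/queue/user tables.
final class InMemoryAggregatedMetricsWriter implements AggregatedMetricsWriter {
  private final Map<MetricScope, Map<String, Long>> tables =
      new EnumMap<>(MetricScope.class);

  // Assumed row-key layout for the sketch: scopeId!metric!timestamp
  private static String rowKey(String scopeId, String metric, long timestamp) {
    return scopeId + "!" + metric + "!" + timestamp;
  }

  @Override
  public void write(MetricScope scope, String scopeId, String metric,
      long timestamp, long value) {
    tables.computeIfAbsent(scope, s -> new HashMap<>())
        .put(rowKey(scopeId, metric, timestamp), value);
  }

  // Returns the stored value, or null if nothing was written for that key.
  Long read(MetricScope scope, String scopeId, String metric, long timestamp) {
    Map<String, Long> table = tables.get(scope);
    return table == null ? null : table.get(rowKey(scopeId, metric, timestamp));
  }
}
```

A caller would write, for example, a queue-level counter with `write(MetricScope.QUEUE, "default", "appsRunning", ts, 7)`; the queue name and metric name here are made up for the example.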
[jira] [Updated] (YARN-3881) Writing RM cluster-level metrics
[ https://issues.apache.org/jira/browse/YARN-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3881: -- Attachment: metrics.json > Writing RM cluster-level metrics > > > Key: YARN-3881 > URL: https://issues.apache.org/jira/browse/YARN-3881 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: metrics.json > > > RM has a bunch of metrics that we may want to write into the timeline backend > too. I attached the metrics.json that I've crawled via > {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to > three groups of metrics: > 1. QueueMetrics > 2. JvmMetrics > 3. ClusterMetrics > The problem is that, unlike other metrics, which belong to a single application, > these belong to the RM or are cluster-wide. Therefore, the current write path is > not going to work for these metrics because they don't have the associated > user/flow/app context info. We need to rethink the modeling of cross-app metrics > and the API to handle them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3881) Writing RM cluster-level metrics
Zhijie Shen created YARN-3881: - Summary: Writing RM cluster-level metrics Key: YARN-3881 URL: https://issues.apache.org/jira/browse/YARN-3881 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen RM has a bunch of metrics that we may want to write into the timeline backend too. I attached the metrics.json that I've crawled via {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to three groups of metrics: 1. QueueMetrics 2. JvmMetrics 3. ClusterMetrics The problem is that, unlike other metrics, which belong to a single application, these belong to the RM or are cluster-wide. Therefore, the current write path is not going to work for these metrics because they don't have the associated user/flow/app context info. We need to rethink the modeling of cross-app metrics and the API to handle them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3880) Writing more RM side app-level metrics
Zhijie Shen created YARN-3880: - Summary: Writing more RM side app-level metrics Key: YARN-3880 URL: https://issues.apache.org/jira/browse/YARN-3880 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-3044, we implemented an analog of the metrics publisher for ATS v1. While it helps to write app/attempt/container life cycle events, it doesn't write many of the app-level system metrics that the RM now has. Here are the metrics that I found missing: * runningContainers * memorySeconds * vcoreSeconds * preemptedResourceMB * preemptedResourceVCores * numNonAMContainerPreempted * numAMContainerPreempted Please feel free to add more to the list if you find something that isn't covered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612113#comment-14612113 ] Zhijie Shen commented on YARN-3051: --- 2. I meant that we store it in a CSV file. Thoughts? 3. I think FS-impl-related config shouldn't be put in the API, as the impl is not supposed to be used by the public, only for test purposes. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051-YARN-2928.003.patch, > YARN-3051-YARN-2928.03.patch, YARN-3051-YARN-2928.04.patch, > YARN-3051-YARN-2928.05.patch, YARN-3051-YARN-2928.06.patch, > YARN-3051.Reader_API.patch, YARN-3051.Reader_API_1.patch, > YARN-3051.Reader_API_2.patch, YARN-3051.Reader_API_3.patch, > YARN-3051.Reader_API_4.patch, YARN-3051.wip.02.YARN-2928.patch, > YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611445#comment-14611445 ] Zhijie Shen commented on YARN-3116: --- 1. I think normal execution won't have a null attempt, but the tests have omitted it. You probably want to fix the test code instead, for example by mocking the currentAttempt and fixing Application#submmit to add the attempt to the rmapp. > [Collector wireup] We need an assured way to determine if a container is an > AM container on NM > -- > > Key: YARN-3116 > URL: https://issues.apache.org/jira/browse/YARN-3116 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, timelineserver >Reporter: Zhijie Shen >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-3116.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, > YARN-3116.v4.patch > > > In YARN-3030, to start the per-app aggregator only for a started AM > container, we need to determine if the container is an AM container or not > from the context in NM (we can do it on RM). This information is missing, > so we worked around it by considering the container with ID "_01" as > the AM container. Unfortunately, this is neither a necessary nor a sufficient > condition. We need to have a way to determine if a container is an AM > container on NM. We can add a flag to the container object or create an API to > make the determination. Perhaps the distributed AM information may also be useful > to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
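The YARN-3116 description above contrasts guessing from the container ID with carrying an explicit flag. The sketch below shows why the ID-based guess is neither necessary nor sufficient, and what the flag option might look like. ContainerRecord and AmContainerCheck are hypothetical types, and the container IDs used with them are made up; this is not the NM's actual container object.

```java
// Hypothetical types for illustration; not the NodeManager's real classes.
final class ContainerRecord {
  private final String containerId;
  private final boolean amContainer; // explicit flag, assumed to be set by the RM

  ContainerRecord(String containerId, boolean amContainer) {
    this.containerId = containerId;
    this.amContainer = amContainer;
  }

  String getContainerId() {
    return containerId;
  }

  boolean isAmContainer() {
    return amContainer;
  }
}

final class AmContainerCheck {
  private AmContainerCheck() {}

  // The workaround: guess from the ID suffix. An AM container can carry a
  // different ID, and a non-AM container can carry the matching one.
  static boolean guessFromId(ContainerRecord c, String amSuffix) {
    return c.getContainerId().endsWith(amSuffix);
  }

  // The proposed alternative: trust the flag carried on the container object.
  static boolean fromFlag(ContainerRecord c) {
    return c.isAmContainer();
  }
}
```

With the flag, the NM no longer depends on any ID numbering convention; the cost is that the flag has to be propagated to the NM along with the container.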