[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-4074:
Fix Version/s: 2.9.0

> [timeline reader] implement support for querying for flows and flow runs
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Fix For: 2.9.0
>
> Attachments: YARN-4074-YARN-2928.007.patch, YARN-4074-YARN-2928.008.patch, YARN-4074-YARN-2928.POC.001.patch, YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch, YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch, YARN-4074-YARN-2928.POC.006.patch
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as implementation of the API.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
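The description asks for the most recent N flows. One way to keep such a top-N window is sketched below with a hypothetical, simplified FlowActivity class (not the real FlowActivityEntity): order entries newest-first in compareTo(), add each candidate to a TreeSet, and call pollLast() whenever the set grows past N. The field names and the tie-breaking order are assumptions made for illustration, not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical, simplified flow activity entry; not the real FlowActivityEntity.
final class FlowActivity implements Comparable<FlowActivity> {
  final long dayTimestamp; // day on which the flow was active (epoch millis)
  final String user;
  final String flowName;

  FlowActivity(long dayTimestamp, String user, String flowName) {
    this.dayTimestamp = dayTimestamp;
    this.user = user;
    this.flowName = flowName;
  }

  // Newest first; ties are broken by user and then by flow name.
  @Override
  public int compareTo(FlowActivity other) {
    int c = Long.compare(other.dayTimestamp, this.dayTimestamp); // descending time
    if (c != 0) {
      return c;
    }
    c = user.compareTo(other.user);
    if (c != 0) {
      return c;
    }
    return flowName.compareTo(other.flowName);
  }
}

final class RecentFlows {
  // Returns at most n entries, newest first. Entries that compare as equal
  // (same day, user and flow) are kept only once by the TreeSet.
  static List<FlowActivity> mostRecent(Iterable<FlowActivity> all, int n) {
    if (n < 0) {
      throw new IllegalArgumentException("n must be >= 0");
    }
    NavigableSet<FlowActivity> top = new TreeSet<>();
    for (FlowActivity a : all) {
      top.add(a);           // add the candidate first...
      if (top.size() > n) {
        top.pollLast();     // ...then evict the oldest entry in the set
      }
    }
    return new ArrayList<>(top);
  }
}
```

In this sketch, adding before polling matters: when the set is already full and the candidate is older than every retained entry, add-then-pollLast evicts the candidate itself, whereas polling first would drop a newer entry and keep the older one.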
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.008.patch

v.8 patch posted. Fixed the checkstyle and findbugs issues.
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.007.patch

v.7 patch posted. It is now based on the YARN-2928 branch, since YARN-3901 has been resolved. Otherwise, there are no real changes from the v.6 patch.
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.POC.006.patch

v.6 POC patch posted. Renamed {{TimelineEntityReader.createTable()}} to {{TimelineEntityReader.getTable()}}, which now reuses the same instance for a given table.
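The rename from createTable() to getTable() reflects a switch from building a new table object on each call to reusing one instance per table. A minimal sketch of that lazy-caching pattern follows; TableHandle and TableCache are hypothetical stand-ins, not the real TimelineEntityReader or HBase table classes, and keying the cache by a table-name string is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a table wrapper; not a real YARN or HBase class.
final class TableHandle {
  private final String name;

  TableHandle(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }
}

// Sketch of "get" semantics: one instance per table name, created lazily
// on first use and returned again on every later call.
final class TableCache {
  private final Map<String, TableHandle> tables = new ConcurrentHashMap<>();
  private final Function<String, TableHandle> factory;
  private final AtomicInteger created = new AtomicInteger(); // for illustration only

  TableCache(Function<String, TableHandle> factory) {
    this.factory = factory;
  }

  TableHandle getTable(String tableName) {
    // computeIfAbsent runs the factory at most once per key.
    return tables.computeIfAbsent(tableName, name -> {
      created.incrementAndGet();
      return factory.apply(name);
    });
  }

  int createdCount() {
    return created.get();
  }
}
```

Usage: `new TableCache(TableHandle::new).getTable("flowrun")` returns the same object on repeated calls with the same name; the table names here are placeholders.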
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.POC.005.patch

POC v.5 patch posted. It is mostly a rebase onto the v.8 patch for YARN-3901 and should apply cleanly on top of it. Again, your comments are greatly appreciated.
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.POC.004.patch

The v.4 POC patch posted:
- added the XmlElement annotation for flow runs in the flow activity entity
- rebased against the v.5 patch for YARN-3901
- added more unit tests
- made sure the IDs are set correctly on flow run entities and flow activity entities
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.POC.003.patch

POC v.3 patch posted. Key changes include:
- switched from Get.setMaxResultSize() to PageFilter (more on that below)
- major refactoring of HBaseTimelineReaderImpl
-- introduced TimelineEntityReader and a hierarchy of classes to isolate the reading logic for each entity type
- added unit tests for HBaseTimelineReaderImpl covering flow activity and flow runs
- fixed an issue with FlowScanner where cells were returned in the wrong order, which broke Column.readResult()
- made the *RowKey classes real object classes, and added a parseRowKey method that returns an instance of the RowKey
- fixed the order of the add and pollLast calls
- renamed FlowEntity to FlowRunEntity
- added a compareTo() method to FlowActivityEntity
- passed the type into the FlowActivityEntity constructor
- set configs for FlowActivityEntity and FlowRunEntity to null
- improved the way string values are read from info for FlowActivityEntity and FlowRunEntity
- added getNumberOfRuns() to FlowActivityEntity

It is actually pretty close to being ready, but since YARN-3901 is still outstanding, I'm not making it an official patch yet.

As for the PageFilter issue, I concluded that setMaxResultSize() is not the right API for limiting the number of rows, and that PageFilter is the right thing to use. I also added counting logic so that the right number of records is returned even as the result iterator advances.

As for the FlowScanner issue mentioned above, [~vrushalic] and [~jrottinghuis] debugged it and tracked it down to a bug in YARN-3901. As such, this change will likely be made in the final YARN-3901 patch; I included it here for completeness and to make the unit tests pass.

You should be able to apply the YARN-3901 v.3 patch and then this patch cleanly. Let me know if you have any questions.
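For context on the PageFilter paragraph: HBase evaluates a PageFilter separately on each region server, so a scan can still return more rows than the page size, and the client has to stop reading once it has the rows it wants (for comparison, Scan.setMaxResultSize() caps the result size in bytes, not rows). A minimal sketch of that client-side limit, written against a plain Iterator so it runs without HBase; LimitedReader and readAtMost are hypothetical names, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class LimitedReader {
  // Collects at most 'limit' items, even if the source (for example a
  // scanner with a server-side PageFilter) would yield more. The size check
  // runs first, so the iterator is never advanced past the limit.
  static <T> List<T> readAtMost(Iterator<T> results, int limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit must be >= 0");
    }
    List<T> out = new ArrayList<>();
    while (out.size() < limit && results.hasNext()) {
      out.add(results.next());
    }
    return out;
  }
}
```

Because HBase's ResultScanner is Iterable&lt;Result&gt;, a reader could pass scanner.iterator() to a helper like this while keeping the PageFilter to reduce how much each region server returns.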
I'd greatly appreciate review feedback. I understand it's a lot of code...
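The v.3 list mentions turning the *RowKey classes into real object classes with a parseRowKey method that returns an instance. A minimal sketch of that shape, assuming a hypothetical FlowRunRowKey made of four '!'-separated string fields; the actual row keys are stored as bytes and their encoding is not described in this thread.

```java
import java.util.Objects;

// Hypothetical flow run row key; the field layout and '!' separator are assumptions.
final class FlowRunRowKey {
  private static final String SEP = "!";
  private final String clusterId;
  private final String userId;
  private final String flowName;
  private final long flowRunId;

  FlowRunRowKey(String clusterId, String userId, String flowName, long flowRunId) {
    this.clusterId = clusterId;
    this.userId = userId;
    this.flowName = flowName;
    this.flowRunId = flowRunId;
  }

  String getClusterId() { return clusterId; }
  String getUserId() { return userId; }
  String getFlowName() { return flowName; }
  long getFlowRunId() { return flowRunId; }

  // Encodes the key as a string; assumes no field contains the separator.
  String toRowKey() {
    return String.join(SEP, clusterId, userId, flowName, Long.toString(flowRunId));
  }

  // Parses a key produced by toRowKey() back into an instance.
  static FlowRunRowKey parseRowKey(String rowKey) {
    String[] parts = rowKey.split(SEP, -1);
    if (parts.length != 4) {
      throw new IllegalArgumentException("not a flow run row key: " + rowKey);
    }
    return new FlowRunRowKey(parts[0], parts[1], parts[2], Long.parseLong(parts[3]));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof FlowRunRowKey)) return false;
    FlowRunRowKey k = (FlowRunRowKey) o;
    return flowRunId == k.flowRunId && clusterId.equals(k.clusterId)
        && userId.equals(k.userId) && flowName.equals(k.flowName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(clusterId, userId, flowName, flowRunId);
  }
}
```

Returning an object instead of a raw array of parts lets callers read fields by name and compare keys with equals().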
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.POC.002.patch

Posting a v.2 POC patch. This adds the flow run query.

As for [~djp]'s comments, yes, I agree that the reader code needs more serious refactoring, in both the API and the implementation. I believe [~varun_saxena] is looking into cleaning up the filters and so on in YARN-3863, so improving the API would be taken up by Varun. Varun?

I'd also like to restructure the implementation further. This POC patch is by no means an indication of the final form of this patch; I just wanted to get it out there so we can verify its correctness and discuss the approach taken here. I hope that clarifies things a bit.
[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-4074:
Attachment: YARN-4074-YARN-2928.POC.001.patch

Posting a v.1 POC patch. This implements the first query (the flow activity query). I'll follow up tomorrow with another one that implements the second query too. The goal is to get the design choices and correctness reviewed first.

It does the following:
- includes the flow activity query as part of getEntities()
- creates a data container for the flow activity table called FlowActivityEntity

It probably needs a fair amount of refactoring to make the reader code more manageable. I also need to add unit tests; they will come later.
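The v.1 patch serves the flow activity query through getEntities(), and the v.3 patch later isolates per-type reading in a TimelineEntityReader class hierarchy. A minimal sketch of that kind of type-based dispatch follows; the reader classes, the entity type strings and the table names are hypothetical placeholders, not the patch's actual classes or the TimelineReader API.

```java
import java.util.Objects;

// Hypothetical reader hierarchy: one subclass per kind of entity query.
abstract class EntityReader {
  // Name of the table this reader would scan (illustrative names only).
  abstract String tableName();
}

final class FlowActivityEntityReader extends EntityReader {
  @Override
  String tableName() {
    return "flowactivity";
  }
}

final class FlowRunEntityReader extends EntityReader {
  @Override
  String tableName() {
    return "flowrun";
  }
}

final class GenericEntityReader extends EntityReader {
  @Override
  String tableName() {
    return "entity";
  }
}

final class EntityReaderFactory {
  // Routes a getEntities()-style request to the reader for its entity type;
  // any type without a dedicated reader falls back to the generic one.
  static EntityReader createReader(String entityType) {
    Objects.requireNonNull(entityType, "entityType");
    switch (entityType) {
      case "YARN_FLOW_ACTIVITY":
        return new FlowActivityEntityReader();
      case "YARN_FLOW_RUN":
        return new FlowRunEntityReader();
      default:
        return new GenericEntityReader();
    }
  }
}
```

Keeping the choice in one factory leaves the getEntities() entry point type-agnostic; supporting another table-specific query then means adding a subclass and a case.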