[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369713#comment-15369713 ] Hudson commented on YARN-4074: --

SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10074/])
YARN-4074. [timeline reader] implement support for querying for flows (sjlee: rev 10fa6da7d8a6013698767c6136ae20f0e04415e9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunRowKey.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineEntityReaderFactory.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/GenericEntityReader.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/TestHBaseStorageFlowActivity.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/TimelineEntityType.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/TestHBaseStorageFlowRun.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/apptoflow/AppToFlowRowKey.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/FlowActivityEntity.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/FlowEntity.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/TimelineEntityReader.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowScanner.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestHBaseTimelineStorage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/timelineservice/TestTimelineServiceClientIntegration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/application/ApplicationRowKey.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/common/BaseTable.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/TestFlowDataGenerator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowActivityRowKey.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineReaderImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timelineservice/TestTimelineServiceRecords.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/FlowRunEntityReader.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/FlowActivityEntityReader.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/ApplicationEntityReader.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityRowKey.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timelineservice/FlowRunEntity.java

> [timeline reader] implement support for querying for flows and flow runs
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903246#comment-14903246 ] Vrushali C commented on YARN-4074: -- Chatted with Li offline and decided to file https://issues.apache.org/jira/browse/YARN-4200 to deal with the refactoring of package names and proceed with this patch.

> [timeline reader] implement support for querying for flows and flow runs
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.007.patch,
> YARN-4074-YARN-2928.008.patch, YARN-4074-YARN-2928.POC.001.patch,
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch,
> YARN-4074-YARN-2928.POC.004.patch, YARN-4074-YARN-2928.POC.005.patch,
> YARN-4074-YARN-2928.POC.006.patch
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as
> implementation of the API.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903284#comment-14903284 ] Li Lu commented on YARN-4074: - Sure, please go ahead with the current patch. Thanks for the work folks!
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903158#comment-14903158 ] Li Lu commented on YARN-4074: - Sorry I missed your message yesterday... I was thinking about putting those HBase reader classes (like ApplicationEntityReader) into a subdirectory to indicate that they only work with HBase. It's also fine to commit the patch as-is if that's troublesome. I'm OK with either.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903411#comment-14903411 ] Vrushali C commented on YARN-4074: -- Committed patch v8. Thanks [~sjlee0] for the contribution and everyone for the review!
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901019#comment-14901019 ] Vrushali C commented on YARN-4074: -- Thanks everyone for the review. I will commit this patch today.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901067#comment-14901067 ] Li Lu commented on YARN-4074: - Hi [~sjlee0] [~vrushalic], thanks for the work and sorry I could not get back earlier. Overall the patch LGTM. I like the refactor here and it's almost a must to put it in soon. One nit on naming and code organization: we're putting all derived readers in the storage package, but inevitably associating them with our (specific) HBase storage. If it's quick and easy, maybe we can put them in a package inside storage? If I'm missing anything here and it's hard, let's proceed with this patch. Your call.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901299#comment-14901299 ] Vrushali C commented on YARN-4074: -- Hi [~gtCarrera9] To confirm my understanding, did you mean putting all reader classes into a package like org.apache.hadoop.yarn.server.timelineservice.storage.reader? There is an org.apache.hadoop.yarn.server.timelineservice.reader package, but that is for the web services related code. Thanks, Vrushali
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876931#comment-14876931 ] Varun Saxena commented on YARN-4074: LGTM.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876185#comment-14876185 ] Vrushali C commented on YARN-4074: -- Patch v8 looks good to me. Thanks for updating the test cases to use the reader objects. +1
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14875864#comment-14875864 ] Sangjin Lee commented on YARN-4074: --- I would greatly appreciate your review. Thanks!
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803384#comment-14803384 ] Sangjin Lee commented on YARN-4074: ---

{quote}
In TimelineEntityReader#readMetrics it seems safe to assume that if we have more than one value, this is a TimelineMetric.Type.TIME_SERIES. Conversely, it doesn't have to be true though, right? I guess we'll just assume that for timelines we'd never have just one value? I can't quite oversee the impact of incorrectly assuming TimelineMetric.Type.SINGLE_VALUE if only one value has been written to HBase yet.
{quote}

That's right. We discussed this some time ago, and we think it'd be safer if the metric type (single value vs. time series) were stored/persisted. But there are other dimensions of metrics we may need to store (e.g. long vs. float, whether to aggregate, etc.). Also, there is a question of what happens if users write inconsistent data. So, at that time we went with the simple decision that's currently there (the code you see in {{TimelineEntityReader}} is refactored out of {{HBaseTimelineReaderImpl}}, so it's not new code). We should come to a conclusion on how to store/encode the various dimensions of metrics, but not as part of this JIRA.

{quote}
Wrt. ApplicationRowKey: at some point (perhaps not this jira) we should consider making the app_id a compound object that is stored with a ? separator. The prefix (which in most cases in YARN right now would be "application_") would be separate, and the RM start time and the final numeric part would be stored as numerical values with a separate Bytes.to... conversion. Otherwise we'll end up getting incorrect order for rowkeys when the application id wraps to 10K and each power of ten after that. For example, lexically application_1442351767756_1 < application_1442351767756_ If we just access the application by specific key this doesn't matter, but if we do a row-scan and count on ordering to set an appropriate stop on the scan, we'll break things. This happens on all rowkeys with the app_id in it.
{quote}

That's a good point. We need to fix this, or we'll have incorrect orders/results happening with queries. This impacts anywhere we rely on the app id order (as string). I'll file a separate JIRA to address this issue.
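The ordering problem raised in the quoted comment can be reproduced in a few lines. The sketch below is only an illustration: the class name {{AppIdOrdering}}, the two sample application ids, and the 12-byte fixed-width encoding (8-byte cluster timestamp followed by a 4-byte sequence number) are assumptions made for this example, not the encoding used by the timeline service or by the follow-up JIRA. It also assumes both numeric parts are non-negative, so that an unsigned big-endian byte comparison agrees with numeric order.

```java
import java.nio.ByteBuffer;

public class AppIdOrdering {

  /**
   * Hypothetical fixed-width encoding of an id of the form
   * application_<clusterTimestamp>_<sequence>: 8 bytes + 4 bytes, big-endian.
   */
  static byte[] encode(String appId) {
    String[] parts = appId.split("_");
    if (parts.length != 3) {
      throw new IllegalArgumentException("unexpected app id: " + appId);
    }
    long clusterTimestamp = Long.parseLong(parts[1]);
    int sequence = Integer.parseInt(parts[2]);
    return ByteBuffer.allocate(12).putLong(clusterTimestamp).putInt(sequence).array();
  }

  /** Unsigned lexicographic comparison of byte arrays, the way sorted row keys are compared. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    // Sample ids chosen for illustration only.
    String earlier = "application_1442351767756_9999";
    String later = "application_1442351767756_10000";

    // As strings, the later application sorts BEFORE the earlier one.
    System.out.println("string order keeps submission order: "
        + (earlier.compareTo(later) < 0));

    // With numeric parts encoded at fixed width, byte order matches submission order.
    System.out.println("encoded order keeps submission order: "
        + (compareBytes(encode(earlier), encode(later)) < 0));
  }
}
```

With string keys, a scan that relies on the stop row being "after" the start row would silently skip or misorder applications once the sequence number gains a digit; a fixed-width numeric encoding avoids that at the cost of keys that are no longer human-readable.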
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804328#comment-14804328 ] Hadoop QA commented on YARN-4074: -

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 29m 14s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac | 11m 49s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 14m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 13s | The applied patch generated 1 new checkstyle issues (total was 31, now 32). |
| {color:green}+1{color} | whitespace | 0m 35s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 50s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 49s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 5m 39s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 2m 13s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 2m 8s | Tests passed in hadoop-yarn-server-tests. |
| {color:green}+1{color} | yarn tests | 2m 32s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 75m 28s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-api |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12757127/YARN-4074-YARN-2928.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 4b37985 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9194/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9194/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9194/console |

This message was automatically generated.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804597#comment-14804597 ] Hadoop QA commented on YARN-4074: -

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 56s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 5s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 32s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 46s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 2m 29s | Tests passed in hadoop-yarn-server-tests. |
| {color:green}+1{color} | yarn tests | 2m 46s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 53m 56s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12757157/YARN-4074-YARN-2928.008.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 4b37985 |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-tests test log | https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9199/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9199/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9199/console |

This message was automatically generated.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14803340#comment-14803340 ] Sangjin Lee commented on YARN-4074: --- That's correct. In other words, those are used to do {{getEntity()}}. That's why I said "a reader for *single-entity reads*" (plural "reads"), as opposed to "a reader for a single entity read".
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790830#comment-14790830 ] Varun Saxena commented on YARN-4074: The patch looks fine to me. I tested some parts related to flow as well. I see that in flow activity table, the row key is cluster id followed by an inverted timestamp. This I guess is to retrieve entities by a certain time range. I havent added this in REST related JIRA and dont see support even here. Will handle it after PoC I guess. Correct ? Also I had added user as an optional query param in REST API code. I think querying by user wont really be a good idea looking at the row key. Will remove it. The major code added in this patch is about the different readers based on table and a factory. It looks fine. However, number of parameters to the methods are quite a lot just like Reader API. As you mentioned elsewhere, maybe I can refactor this later. We should club together some things into logical things like context, filters, etc. That will reduce number of params. In the factory class, we have a sequence of if-else statements. Although its a matter of perspective, sequence of if-else look a little inelegant. But we may not have too many great options here. Thought of enums i.e. having create methods with implementation tied to each enum but entity type enum is not HBase specific. Any other option ? I guess if-else should be fine for now because not too many tables should be added in future, if any. In xxxEntityReader classes, maybe createTable should be renamed to getTable because we are not really creating any table here. We are just getting/creating a table object. Also if I am not wrong, no need to create this object again and again as well. All it really holds is static information such as table name and conf. 
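The row key layout mentioned in the comment above (cluster ID followed by an inverted timestamp) is a common way to make the most recent rows sort first in a store that orders keys lexicographically. The following is only a sketch of that idea, assuming the inversion is Long.MAX_VALUE minus the timestamp and an 8-byte big-endian encoding; the class and method names are hypothetical and are not taken from the patch.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of an inverted-timestamp row key component. */
final class InvertedTimestampSketch {

  private InvertedTimestampSketch() {}

  /** Inverts a non-negative timestamp so that newer (larger) values become smaller. */
  static long invert(long timestamp) {
    return Long.MAX_VALUE - timestamp;
  }

  /** Encodes a long as 8 big-endian bytes; for non-negative values byte order matches numeric order. */
  static byte[] encode(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }
}
```

With such a layout a scan from the start of a cluster's rows sees the newest day first. It also shows why a cluster + user query is awkward: the time component sits between the cluster and the user in the key, so there is no contiguous key range for one user.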
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790948#comment-14790948 ] Sangjin Lee commented on YARN-4074: ---

Thanks for your comments [~varun_saxena]!

bq. This I guess is to retrieve entities by a certain time range. I haven't added this in the REST-related JIRA and don't see support even here. Will handle it after the PoC I guess. Correct?

The flow activity table is a time-based set of data. The timestamp (really a day marker) is there to order the activity in time. It is feasible to query the flow activity table based on time (e.g. "give me all the activity in the past 3 days"). I didn't get around to it, but it should be pretty straightforward to support that after the POC. I'll file a JIRA for adding that support.

bq. Also, I had added user as an optional query param in the REST API code. I think querying by user won't really be a good idea looking at the row key. Will remove it.

Yes, the way the data is laid out, cluster + user will not be an efficient query, as time is the key component that comes before the user.

{quote} The major code added in this patch is about the different readers based on table and a factory. It looks fine. However, the number of parameters to the methods is quite large, just like the Reader API. As you mentioned elsewhere, maybe I can refactor this later. We should club some things together into logical groups like context, filters, etc. That will reduce the number of params. {quote}

That is spot on. I really didn't like having to repeat the long list of arguments. But since you're looking into a better way of capturing the filters and predicates, I'm not changing things as part of this JIRA. I hope that is consistent with your understanding.

{quote} In the factory class, we have a sequence of if-else statements. Although it's a matter of perspective, a sequence of if-else statements looks a little inelegant. But we may not have too many great options here. I thought of enums, i.e. having create methods with the implementation tied to each enum constant, but the entity type enum is not HBase specific. Any other option? I guess if-else should be fine for now because not too many tables should be added in the future, if any. {quote}

I agree. I wanted to use switch-case statements, but the main issue was that the input is a string, not an enum. If it were an enum, it would have been trivial...

{quote} In the xxxEntityReader classes, maybe createTable should be renamed to getTable because we are not really creating any table here. We are just getting/creating a table object. Also, if I am not wrong, there is no need to create this object again and again. All it really holds is static information such as table name and conf. {quote}

Those are good suggestions. Yes, the {{BaseTable}} instances are thread-safe, and I think they can be reused. I'll update the patch to make those changes.
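One possible alternative to the if-else chain discussed in this exchange is a lookup table keyed by the entity type string. The sketch below only illustrates that technique: the EntityReader interface, the TableReader class, the type strings and the table names are all hypothetical stand-ins, not the classes or values in the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for the per-table reader types. */
interface EntityReader {
  String tableName();
}

/** Hypothetical reader that only remembers which table it would read from. */
final class TableReader implements EntityReader {
  private final String table;

  TableReader(String table) {
    this.table = table;
  }

  @Override
  public String tableName() {
    return table;
  }
}

/** Factory that dispatches on the entity type string through a map instead of if-else. */
final class ReaderFactorySketch {
  // Maps an entity type string to a supplier of the matching reader (illustrative entries).
  private static final Map<String, Supplier<EntityReader>> READERS = new HashMap<>();

  static {
    READERS.put("YARN_FLOW_ACTIVITY", () -> new TableReader("flow_activity"));
    READERS.put("YARN_FLOW_RUN", () -> new TableReader("flow_run"));
    READERS.put("YARN_APPLICATION", () -> new TableReader("application"));
  }

  private ReaderFactorySketch() {}

  /** Returns the reader registered for the type, or a generic entity reader as the fallback. */
  static EntityReader create(String entityType) {
    Supplier<EntityReader> supplier = READERS.get(entityType);
    return supplier != null ? supplier.get() : new TableReader("entity");
  }
}
```

The map keeps the string-to-reader association in one place without tying the storage-specific readers to the entity type enum. Since Java 7, switch statements also accept strings, which may be another option if the chain stays short.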
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791493#comment-14791493 ] Joep Rottinghuis commented on YARN-4074: Wrt. the javadoc comment and method names: "Instantiates a reader for single-entity reads." refers to the number of entities returned in each query, and not the number of times that reader can be used to issue a read, right?
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791504#comment-14791504 ] Joep Rottinghuis commented on YARN-4074:

In TimelineEntityReader#readMetrics it seems safe to assume that if we have more than one value, the metric is a TimelineMetric.Type.TIME_SERIES. The converse doesn't have to be true though, right? I guess we'll just assume that for time series we'd never have just one value? I can't quite foresee the impact of incorrectly assuming TimelineMetric.Type.SINGLE_VALUE if only one value has been written to HBase so far.

Wrt. ApplicationRowKey: at some point (perhaps not in this JIRA) we should consider making the app_id a compound object that is stored with a ? separator. The prefix (in most cases in YARN right now this would be "application_") would be separate, and the RM start time and the final numeric part would be stored as numerical values with a separate Bytes.to... conversion. Otherwise we'll end up getting an incorrect order for row keys when the application ID wraps to 10K and at each power of ten after that. For example, lexically application_1442351767756_1 < application_1442351767756_ If we just access the application by a specific key this doesn't matter, but if we do a row scan and count on ordering to set an appropriate stop on the scan, we'll break things. This happens on all row keys with the app_id in them.
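The ordering concern raised above can be reproduced with plain string comparison. The sketch below is an illustration under stated assumptions: the two application IDs use sequence numbers 9999 and 10000 picked for the example, and the compound encoding (two big-endian longs, prefix dropped) is hypothetical rather than the encoding proposed or used in the patch.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of string versus numeric ordering of application IDs. */
final class AppIdOrderingSketch {

  private AppIdOrderingSketch() {}

  /**
   * Encodes "application_<clusterTimestamp>_<sequence>" as two big-endian longs.
   * Assumes both numeric parts are non-negative.
   */
  static byte[] encode(String appId) {
    String[] parts = appId.split("_"); // ["application", clusterTimestamp, sequence]
    long clusterTimestamp = Long.parseLong(parts[1]);
    long sequence = Long.parseLong(parts[2]);
    return ByteBuffer.allocate(2 * Long.BYTES)
        .putLong(clusterTimestamp)
        .putLong(sequence)
        .array();
  }

  /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }
}
```

As strings, the ID ending in 10000 sorts before the one ending in 9999 because '1' is less than '9'. With the numeric parts encoded as fixed-width values, the byte order agrees with the numeric order, which is what a row scan with a stop key relies on.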
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745672#comment-14745672 ] Varun Saxena commented on YARN-4074: Thanks [~sjlee0] for updating the patch. Will have a look at it.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735545#comment-14735545 ] Sangjin Lee commented on YARN-4074: --- It'd be great if you could take a look at the latest patch and let me know your feedback. Thanks!
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729556#comment-14729556 ] Sangjin Lee commented on YARN-4074: --- Just to be clear, the current POC patch already handles the null case, and I'm going to update it to check for negative values. Is that reasonable?
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729515#comment-14729515 ] Varun Saxena commented on YARN-4074: I guess I can handle the 2nd point in YARN-4075 as well. I can verify the limit and, if it's 0 or negative, forward null to the storage layer. If it's null, DEFAULT_LIMIT will be applied.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729558#comment-14729558 ] Varun Saxena commented on YARN-4074: [~sjlee0], yeah, it handles the null case. I meant we can handle negative values even at the REST layer (as part of YARN-4075), i.e. if the limit is negative I can forward null to the storage layer, which would mean the default limit being applied.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729560#comment-14729560 ] Varun Saxena commented on YARN-4074: It's fine though if you are handling negatives as part of YARN-4074.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727402#comment-14727402 ] Varun Saxena commented on YARN-4074:

Few more comments.
* {{Scan#setMaxResultSize}} only limits how much data is fetched from the server to the client in a single call. If more rows are available, they are still fetched when {{ResultScanner#next}} is invoked. This leads to more entities than the limit being returned. setMaxResultSize works similarly to JDBC's ResultSet#setFetchSize. So to apply limits in getFlowActivityEntities, we need a check for the limit in the for loop as well, in conjunction with using setMaxResultSize.
* How do we handle the case of the limit being 0 or negative? In the FS-based impl, I had changed the limit to DEFAULT_LIMIT in both cases. Do the same here?
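The two points above (enforce the limit while iterating, and fall back to a default for null, zero or negative limits) can be sketched without any HBase dependency. In the sketch below a plain Iterator stands in for the scanner, and the DEFAULT_LIMIT value of 100 is an assumption made for the example; the actual value is not given in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of applying a result limit on the client side. */
final class LimitSketch {
  /** Assumed default for the example; the real DEFAULT_LIMIT is not stated in the thread. */
  static final long DEFAULT_LIMIT = 100;

  private LimitSketch() {}

  /** Treats null, zero and negative limits as "use the default". */
  static long effectiveLimit(Long limit) {
    return (limit == null || limit <= 0) ? DEFAULT_LIMIT : limit;
  }

  /** Reads rows until the source is exhausted or the effective limit is reached. */
  static <T> List<T> readUpTo(Iterator<T> rows, Long limit) {
    long max = effectiveLimit(limit);
    List<T> out = new ArrayList<>();
    // The size check is what actually caps the result; a fetch-size style
    // setting on the scan only affects how data is transferred per call.
    while (rows.hasNext() && out.size() < max) {
      out.add(rows.next());
    }
    return out;
  }
}
```

Whether the normalization lives in the REST layer or the storage layer, as discussed in the follow-up comments, the loop check is still needed wherever the rows are consumed.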
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725914#comment-14725914 ] Varun Saxena commented on YARN-4074:

Few comments:
# As a TreeSet has been used to sort FlowActivityEntity objects in {{HbaseTimelineReaderImpl#getFlowActivityEntities}}, we should either set the fields used in {{TimelineEntity#compareTo}} or provide a compareTo method in the FlowActivityEntity class itself. Otherwise I guess entities need to be sorted by date. In which case the latter needs to be done.
# In the parameterized constructor {{FlowActivityEntity(String cluster, long time, String user, String flowName)}}, is it better to pass the type into the superclass constructor? The type can be useful in the JSON response.
# Things like configs have no meaning for flows. So should we explicitly set the things that have no meaning in FlowEntity and FlowActivityEntity to null? No need to send them in JSON (even if empty) I guess.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725918#comment-14725918 ] Varun Saxena commented on YARN-4074: Correction to point 1: As a TreeSet has been used to sort FlowActivityEntity objects in HbaseTimelineReaderImpl#getFlowActivityEntities, we should either set the fields used in TimelineEntity#compareTo or provide a compareTo method in the FlowActivityEntity class itself. Otherwise *it leads to NPE*. I guess entities need to be sorted by date. In which case the latter needs to be done.
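The second option in the comment above, giving the entity its own compareTo so a TreeSet can order it by date, might look roughly like the following. This is a simplified, hypothetical stand-in for FlowActivityEntity: the field names follow the constructor quoted in the thread, but the ordering (most recent date first, then cluster, user and flow name as tie-breakers) is an assumption, not the ordering chosen in the patch.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for FlowActivityEntity with its own ordering. */
final class FlowActivitySketch implements Comparable<FlowActivitySketch> {
  final String cluster;
  final long date;
  final String user;
  final String flowName;

  FlowActivitySketch(String cluster, long date, String user, String flowName) {
    this.cluster = Objects.requireNonNull(cluster);
    this.date = date;
    this.user = Objects.requireNonNull(user);
    this.flowName = Objects.requireNonNull(flowName);
  }

  @Override
  public int compareTo(FlowActivitySketch other) {
    // Most recent date first; the remaining fields break ties so that a
    // TreeSet does not collapse distinct flows that share a date.
    int cmp = Long.compare(other.date, date);
    if (cmp == 0) {
      cmp = cluster.compareTo(other.cluster);
    }
    if (cmp == 0) {
      cmp = user.compareTo(other.user);
    }
    if (cmp == 0) {
      cmp = flowName.compareTo(other.flowName);
    }
    return cmp;
  }

  @Override
  public boolean equals(Object o) {
    // Kept consistent with compareTo, as sorted collections expect.
    return o instanceof FlowActivitySketch && compareTo((FlowActivitySketch) o) == 0;
  }

  @Override
  public int hashCode() {
    return Objects.hash(cluster, date, user, flowName);
  }
}
```

Because every field used in the comparison is required to be non-null at construction time, the comparison itself cannot hit the kind of NullPointerException described in the correction.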
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724856#comment-14724856 ] Varun Saxena commented on YARN-4074: Agree...+1 to changing the name.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726295#comment-14726295 ] Sangjin Lee commented on YARN-4074: --- I'm addressing [~varun_saxena]'s latest comments. Thanks for those.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723646#comment-14723646 ] Varun Saxena commented on YARN-4074: Moreover, do we return metrics at all times for a flow run ? Or make returning on metric field conditional ? Should be fine as it is a single flow run. Just confirming because then I would not need fields as a query parameter in REST.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723678#comment-14723678 ] Vrushali C commented on YARN-4074: --

bq. Should we support filtering on the basis of flow start time and probably end time as well ?

Going forward, yes, we do need to have a time range parameter for queries. But for the PoC we work with what is being fetched/returned.

bq. Moreover, do we return metrics at all times for a flow run ? Or make returning on metric field conditional ? Just confirming because then I would not need fields as a query parameter in REST.

For the PoC we can return everything that is being fetched, or whatever is easier. But in general we do need to filter metrics, return subsets of metrics, etc., so these query parameters would need to be worked out. We have to think about how we can allow for filtering of metrics, but we would need a basic API that returns everything for a flow run anyway, so I think further enhancements can be added later, after the PoC. What do you think?
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723733#comment-14723733 ] Varun Saxena commented on YARN-4074: Ok...For PoC, this should be fine.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723773#comment-14723773 ] Sangjin Lee commented on YARN-4074: --- Just to clarify on the flow run "end time". Note that there is no formal definition of the "end time" of a flow run, in the absence of a formal flow API in YARN. The "end time" in the flow run is the latest end time of an application that is part of that flow run. Just so that we're clear on that definition. On a related note, there is no formal definition of the state of a flow run (i.e. we cannot say with certainty whether a flow run is ended). The only definitive thing we can say about this is if a flow run is still running, which can be determined by having a running app.
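The working definitions in the comment above (end time is the latest end time among the flow run's applications, and a flow run is known to be running if any of its applications is running) can be written down as a small sketch. The AppSketch and FlowRunSketch types are hypothetical and exist only for this illustration; skipping still-running applications when computing the end time is also an assumption of the sketch.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical view of an application that belongs to a flow run. */
final class AppSketch {
  final long endTime;
  final boolean running;

  AppSketch(long endTime, boolean running) {
    this.endTime = endTime;
    this.running = running;
  }
}

/** Hypothetical helpers that derive flow run properties from its applications. */
final class FlowRunSketch {

  private FlowRunSketch() {}

  /** Latest end time among finished applications; empty if none has finished. */
  static OptionalLong latestEndTime(List<AppSketch> apps) {
    return apps.stream()
        .filter(app -> !app.running)
        .mapToLong(app -> app.endTime)
        .max();
  }

  /** A flow run is definitely still running if at least one application is running. */
  static boolean isRunning(List<AppSketch> apps) {
    return apps.stream().anyMatch(app -> app.running);
  }
}
```

Note the asymmetry the comment points out: a false result from isRunning does not prove the flow run has ended, since another application could still be submitted to it later.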
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723790#comment-14723790 ] Sangjin Lee commented on YARN-4074: --- Also, while we're at it, I find the name {{FlowEntity}} quite confusing as it really encapsulates a flow run. I find myself having to comment that it is a flow run constantly. Would there be an appetite for renaming this class to {{FlowRunEntity}} as part of this? The impact should be minimal. Let me know your thoughts.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724393#comment-14724393 ] Li Lu commented on YARN-4074: - bq. Would there be an appetite for renaming this class to FlowRunEntity as part of this? The impact should be minimal. Let me know your thoughts. LGTM. IMO we only have flow run objects, but no actual flow objects? Even in flow-based offline aggregation we do not instantiate "flow" objects. This JIRA appears to be the right place to fix it.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724534#comment-14724534 ] Li Lu commented on YARN-4074: -

bq. As a general question, since we're returning our timeline entities as JSON in our web service, we need to somehow "rebuild" those entities on the JS client side, right? If this is the case, we need to provide some JS object model that is consistent with our TimelineEntity object model? I'm not a front-end expert so I'd like to learn the typical practice on this problem.

bq. I'm not intimately familiar with that either. I hope someone who's familiar could comment?

I was told by the HDFS community that they are using Dust (github.com/linkedin/dustjs) templates to do this. They have code available in our codebase as well. I'm planning to look into this framework in our POC (YARN-4097).
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723307#comment-14723307 ] Varun Saxena commented on YARN-4074:

Just had a cursory glance at the patch. A couple of points.
# We will be returning a set of {{FlowActivityEntity}} to the user. Should this class be in hadoop-yarn-api instead then ? Because client will have to parse JSON as a set of FlowActivityEntity objects.
# Should we support filtering on the basis of flow start time and probably end time as well ?
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718120#comment-14718120 ]

Varun Saxena commented on YARN-4074:

bq. So improving the API would be taken up by Varun. Varun?

Yes
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720246#comment-14720246 ]

Li Lu commented on YARN-4074:

bq. One thing I forgot to mention is that the current POC patch is a diff against the patch for YARN-3901, to be able to isolate the changes for this JIRA.

Thanks for the reminder! I'll take a look at it shortly.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717073#comment-14717073 ]

Varun Saxena commented on YARN-4074:

OK, will have a look. Do we not need to support a query such as listing all the flow runs for a flow?
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717146#comment-14717146 ]

Sangjin Lee commented on YARN-4074:

cc [~gtCarrera9] and [~vrushalic] also for their thoughts.

There are some options for this, each with pros and cons. I'm leaning towards the current proposal ((1) below) for now, but we could enhance this later as the UI jells more.
# do a specific entity query for each of the flow runs obtained from the flow activity entity
# return all flow runs (possibly with limits and time windows) for the given flow
# do a single query for all flow runs specified as a list of flow run ids

One interesting thing to note is that a flow activity entity (record) is an activity of that flow *for a given day*. In other words, there can be multiple flow activity entities for the same flow. The flow runs that are returned in the flow activity entity are only for that given day. Then the question is, when I click that flow activity record, what flow runs do I expect to see? It's a bit ambiguous, but I think it might make more sense to return only the flow runs that are referenced on that particular day if we're using the flow activity to render the landing page.

If we assume that, then (2) is probably not needed for this. That leaves us with (1) or (3). The benefit of (1) is that it fits easily into the existing reader API ({{getEntity()}}). The downside is that you may need to make multiple reader calls to retrieve flow runs. But normally the number of flow runs in a day for a given flow should be very small, so it might not be a big deal.

One hybrid approach may be that the REST API supports URLs based on the list, but the web service code makes multiple reader {{getEntity()}} calls. We'd still need to define the form of the URLs to support that type of query.

Thoughts?
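The hybrid approach mentioned in the comment (a list-style request at the REST layer, served by one single-entity reader call per flow run) could look roughly like the sketch below. This is only an illustration under stated assumptions: `FlowRun`, `FlowRunReader`, and `FlowRunFanOut` are hypothetical names standing in for the real entity classes and the reader's `getEntity()` path, and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a flow run entity.
final class FlowRun {
  final String flowName;
  final long runId;

  FlowRun(String flowName, long runId) {
    this.flowName = flowName;
    this.runId = runId;
  }
}

// Hypothetical stand-in for the single-entity read path (getEntity()).
interface FlowRunReader {
  FlowRun getFlowRun(String cluster, String user, String flowName, long runId);
}

final class FlowRunFanOut {
  // Serves a "list of run ids" request by issuing one reader call per id,
  // i.e. option (1) hidden behind a list-based web service method.
  static List<FlowRun> getFlowRuns(FlowRunReader reader, String cluster,
      String user, String flowName, List<Long> runIds) {
    List<FlowRun> result = new ArrayList<>();
    for (long runId : runIds) {
      FlowRun run = reader.getFlowRun(cluster, user, flowName, runId);
      if (run != null) { // skip ids the store does not know about
        result.add(run);
      }
    }
    return result;
  }
}
```

Because the number of flow runs per flow per day is expected to be small, the per-id loop keeps the reader interface unchanged at the cost of a few extra calls.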
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717257#comment-14717257 ]

Junping Du commented on YARN-4074:

Thanks for uploading a patch, [~sjlee0]! Sorry for coming late to this, but I have a critical question on the TimelineReader interface:

bq. Currently I am not planning to add new flow-specific methods to the TimelineReader interface.

If so, how do we query the latest N records with the existing {{getEntities()}} API? Actually, I think we should refactor the existing {{getEntities()}} API before things get worse. It includes too many parameters, and most of them are optional. This is unwieldy, bug-prone, and very hard to extend in the future. Instead, we should define something like an EntityFilter class to hold most of these optional fields (including time range, topN, info/config/metric sub-filters, etc.), which can also be extended easily for other filters in the future. Thoughts?

I'm still walking through your POC patch; more comments will follow.
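To make the EntityFilter suggestion concrete, here is a minimal sketch of what such a filter object might look like. It is not from any patch: the field names (`createdTimeBegin`, `createdTimeEnd`, `limit`, `infoFilters`) and the builder shape are assumptions chosen only to show how optional query parameters could be grouped into one object instead of a long parameter list.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical filter object; all names here are illustrative only.
final class EntityFilter {
  final Long createdTimeBegin; // null means "no lower bound"
  final Long createdTimeEnd;   // null means "no upper bound"
  final Long limit;            // null means "use the reader's default"
  final Map<String, Object> infoFilters;

  private EntityFilter(Builder b) {
    this.createdTimeBegin = b.createdTimeBegin;
    this.createdTimeEnd = b.createdTimeEnd;
    this.limit = b.limit;
    this.infoFilters =
        Collections.unmodifiableMap(new HashMap<>(b.infoFilters));
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private Long createdTimeBegin;
    private Long createdTimeEnd;
    private Long limit;
    private final Map<String, Object> infoFilters = new HashMap<>();

    Builder timeRange(long begin, long end) {
      if (begin > end) {
        throw new IllegalArgumentException("begin must not exceed end");
      }
      this.createdTimeBegin = begin;
      this.createdTimeEnd = end;
      return this;
    }

    Builder limit(long n) {
      if (n <= 0) {
        throw new IllegalArgumentException("limit must be positive");
      }
      this.limit = n;
      return this;
    }

    Builder infoFilter(String key, Object value) {
      infoFilters.put(key, value);
      return this;
    }

    EntityFilter build() {
      return new EntityFilter(this);
    }
  }
}
```

A caller would set only the filters it needs, for example `EntityFilter.builder().limit(10).build()` for a "latest N" style query; adding a new kind of filter later means adding a field and a builder method rather than changing every reader method signature.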
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717758#comment-14717758 ]

Li Lu commented on YARN-4074:

Thanks [~sjlee0]! I looked at the current POC patch and have some comments:
# In general, I'm OK with this approach. I think the current FlowEntity design should provide sufficient information for the web UI POC.
# As a general question, since we're returning our timeline entities as JSON in our web service, we need to somehow "rebuild" those entities on the JS client side, right? If this is the case, do we need to provide some JS object model that is consistent with our TimelineEntity object model? I'm not a front-end expert, so I'd like to learn the typical practice for this problem.
# Please make sure, in the final patch, to change the timeline schema creator so that we're consistent with the list of tables. Maybe we'd like to find some better way to keep all these tables consistent across the writer, reader and schema creator in the future.
# I agree with all of you that we may want to refactor the current implementation. For example, we may not want to dispatch an incoming timeline entity to different tables through a list of if-statements (deciding which table to go to has already caused me some confusion when working on the offline aggregator patch rebase). Also, I believe the parsing logic can be easily isolated.
# Are some changes in files like FlowActivityRowKey.java not included in this patch?
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717681#comment-14717681 ]

Li Lu commented on YARN-4074:

Hi [~sjlee0], so far the first option looks good to me. The upside is that it fits our web UI POC requirements fine, and it's relatively clean to maintain. The downside is that, in order to support some complex use cases, we need to compose several queries. For the current stage I think it's fine, and we can use it to bootstrap our web UI renderers.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717721#comment-14717721 ]

Junping Du commented on YARN-4074:

OK. Having a separate JIRA to track this refactoring work should be fine. Thanks for pointing to that JIRA.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717855#comment-14717855 ]

Sangjin Lee commented on YARN-4074:

Thanks [~gtCarrera9] for your comments.

{quote}
As a general question, since we're returning our timeline entities as JSON in our web service, we need to somehow "rebuild" those entities on the JS client side, right? If this is the case, do we need to provide some JS object model that is consistent with our TimelineEntity object model? I'm not a front-end expert, so I'd like to learn the typical practice for this problem.
{quote}

I'm not intimately familiar with that either. I hope someone who's familiar could comment?

I'm going to do some refactoring to move away from the if-else branch (yuck). There are aspects such as input validation, getting results from HBase, and creating the entity objects that can be isolated more clearly. I need to give some more thought to how to encapsulate that more clearly. This has some bearing on the filter-related work that Varun is doing, so I'll try not to touch that area in this JIRA.

One thing I forgot to mention is that the current POC patch is a diff against the patch for YARN-3901, to be able to isolate the changes for this JIRA. The patch for YARN-3901 needs to be reviewed and committed before this one can be. That's why this patch is missing what's included in the YARN-3901 patch.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715208#comment-14715208 ]

Vrushali C commented on YARN-4074:

My take is that we can make things as generic as possible, but we should have separate APIs for flows and flow runs. I had put up an initial proposal for flow-based queries in ATS when we started off on this at https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx

I believe for the two queries you have listed above [~sjlee0], there would be two REST APIs:

1) Get all flows
Path: /listFlows/cluster/
Returns: paginated list of apps with aggregated stats (to populate the flows list tab on the UI)
Sample URL: http://timelineservice.example.com/ws/v2/listFlows/clusterid?limit=2&startTime=20140510&endTime=20140601
This would be a UI-related aggregation query.

2) Get a specific flow's runs
Path: /flow/cluster/user/flowName/[version]
Returns: list of flows
Sample URL: http://timelineservice.example.com/ws/v2/flow/clusterid/userName/someFlowName_idenitying_a_flow?limit=2&startTime=1390939248000&endTime=139361764800
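As a small illustration of the URL shapes proposed in this comment, the sketch below assembles the "list flows" and "flow runs" URLs with the query parameters joined by `&`. The class and method names (`FlowQueryUrls`, `withQuery`, `listFlows`, `flowRuns`) are hypothetical, and the sketch assumes parameter values that need no percent-encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not part of any Hadoop API.
final class FlowQueryUrls {
  // Appends query parameters in insertion order, separated by '&'.
  // Assumes keys and values need no percent-encoding.
  static String withQuery(String path, Map<String, String> params) {
    if (params.isEmpty()) {
      return path;
    }
    StringJoiner query = new StringJoiner("&", "?", "");
    for (Map.Entry<String, String> e : params.entrySet()) {
      query.add(e.getKey() + "=" + e.getValue());
    }
    return path + query;
  }

  // Builds .../listFlows/<cluster>?limit=..&startTime=..&endTime=..
  static String listFlows(String base, String clusterId, int limit,
      String startTime, String endTime) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("limit", Integer.toString(limit));
    params.put("startTime", startTime);
    params.put("endTime", endTime);
    return withQuery(base + "/listFlows/" + clusterId, params);
  }

  // Builds .../flow/<cluster>/<user>/<flowName>?limit=..
  static String flowRuns(String base, String clusterId, String user,
      String flowName, int limit) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("limit", Integer.toString(limit));
    return withQuery(
        base + "/flow/" + clusterId + "/" + user + "/" + flowName, params);
  }
}
```

A `LinkedHashMap` is used so the parameters appear in the same order they were added, which keeps generated URLs stable and easy to compare in tests.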
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715765#comment-14715765 ]

Sangjin Lee commented on YARN-4074:

I am about 90% done with the POC patch for this. I'm shooting for some time tomorrow to be able to post the patch.

In the meantime, in order to enable [~varun_saxena] and others to make progress, the following is the proposal that I'm implementing. Please *do* let me know if you have any questions or issues with the proposal so we can adjust accordingly.

(REST API)
In order to support the POC UI, we will implement 2 new queries:
# given the cluster, return the N most recent flows from the flow activity table
# given the cluster, user, flow id, and flow run id, return the flow run (with metrics) from the flow run table

At the REST level, they can be represented as follows, for example:
# /listFlows/clusterId?limit=100
# /flow/clusterId/userId/flowName/flowRun

(UI)
With these URLs, the UI can invoke the first URL to render the landing page with the table. The REST output contains the flow activity records along with all the flow runs that were active during the day. If the user drills down on a single flow, then the client side can generate the second query against each of the flow runs for that flow to fetch the metrics at the flow run level. If the user further drills down into a single flow run, then it can do an (existing) query to retrieve all applications for a given flow run to get the application entities.

(reader interface)
Currently I am *not* planning to add new flow-specific methods to the {{TimelineReader}} interface. Instead, you can use the existing {{getEntities()}} and {{getEntity()}} methods to perform the above new queries:
# {{getEntities()}} with cluster specified and entity type = YARN_FLOW_ACTIVITY (a new timeline entity type)
# {{getEntity()}} with cluster, user, flow id, flow run id specified and entity type = YARN_FLOW
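The mapping from the two proposed REST paths to the two reader calls can be sketched as below. This is a rough illustration rather than the web service code: `FlowEntityType`, `ReaderCall`, and `FlowPaths` are hypothetical names, and only the entity type names YARN_FLOW_ACTIVITY and YARN_FLOW come from the proposal itself.

```java
import java.util.Arrays;
import java.util.List;

// Entity type names follow the proposal; the enum itself is hypothetical.
enum FlowEntityType { YARN_FLOW_ACTIVITY, YARN_FLOW }

// Hypothetical description of which reader method a request maps to.
final class ReaderCall {
  final boolean single;       // true -> getEntity(), false -> getEntities()
  final FlowEntityType type;
  final List<String> context; // cluster[, user, flow name, flow run]

  ReaderCall(boolean single, FlowEntityType type, List<String> context) {
    this.single = single;
    this.type = type;
    this.context = context;
  }
}

final class FlowPaths {
  // Resolves a request path (without query string) to a reader call.
  static ReaderCall resolve(String path) {
    String[] parts = path.replaceAll("^/+", "").split("/");
    if (parts.length == 2 && parts[0].equals("listFlows")) {
      // /listFlows/clusterId -> getEntities() on flow activity
      return new ReaderCall(false, FlowEntityType.YARN_FLOW_ACTIVITY,
          Arrays.asList(parts[1]));
    }
    if (parts.length == 5 && parts[0].equals("flow")) {
      // /flow/clusterId/userId/flowName/flowRun -> getEntity() on flow run
      return new ReaderCall(true, FlowEntityType.YARN_FLOW,
          Arrays.asList(parts[1], parts[2], parts[3], parts[4]));
    }
    throw new IllegalArgumentException("unsupported path: " + path);
  }
}
```

Keeping the path-to-call mapping in one place like this leaves the reader interface untouched, which is the point of reusing `getEntities()` and `getEntity()` with new entity types.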
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709839#comment-14709839 ]

Sangjin Lee commented on YARN-4074:

The queries we will need to support are as follows (let me know if you believe this is not accurate):
- given cluster, query the most recent N flows (from the flow activity table)
- (optionally) given cluster, user, flow id, query all flow runs

In terms of the implementation, there are two approaches. We can either define and implement specific methods for querying for flows and flow runs, or reuse the {{getEntities()}} method to implement them. With the former approach, we might end up with a proliferation of type-specific methods. With the latter, the API may remain clean, but the implementation would become messier with more if-else type of code.

Personally I'm slightly leaning towards the latter, but I'd love others' opinions.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710102#comment-14710102 ]

Li Lu commented on YARN-4074:

I'm inclined to use the latter approach to retrieve flows and flow runs, since we don't actually differentiate them on the backend. I'm also inclined to keep the RESTful API layer simple, and to wrap it with a JS native library for web UIs. In this way we can separate the process of TimelineEntity retrieval from the context of the timeline entities (e.g. is it a flow, an application, or a DAG of applications?). It's also much easier to maintain this interface IMO.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710142#comment-14710142 ]

Li Lu commented on YARN-4074:

bq. This also implies that the canonical stores for the flows and the flow runs are the flow activity table and the flow run table respectively...

Ah right... This makes the unified interface less appealing, since we may need to branch a lot within the getEntities method. However, if we proceed in this direction, maybe we'd like to do a deeper refactor of that code.
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710133#comment-14710133 ]

Sangjin Lee commented on YARN-4074:

Actually the backend will need to differentiate the queries for flows and flow runs from those for other entities, right? For the HBase backend, queries for the flows will need to be sent to the flow activity table, and those for the flow runs will be sent to the flow run table.

This also implies that the canonical stores for the flows and the flow runs are the flow activity table and the flow run table respectively... We've already gone that way a little bit with the application table, and we need to be comfortable with that for us to implement the latter approach.
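The routing described in this comment (flows go to the flow activity table, flow runs to the flow run table, applications to the application table, everything else to the generic entity table) can be expressed as a lookup rather than a chain of if-else branches. The sketch below is only an illustration: `QueryTarget`, `TableRouter`, and the table name strings are hypothetical placeholders, not the names used by the HBase schema.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical classification of a query by what it asks for.
enum QueryTarget { FLOW_ACTIVITY, FLOW_RUN, APPLICATION, GENERIC_ENTITY }

final class TableRouter {
  // Placeholder table names; the real schema may name them differently.
  private static final Map<QueryTarget, String> TABLES =
      new EnumMap<>(QueryTarget.class);

  static {
    TABLES.put(QueryTarget.FLOW_ACTIVITY, "flow_activity_table");
    TABLES.put(QueryTarget.FLOW_RUN, "flow_run_table");
    TABLES.put(QueryTarget.APPLICATION, "application_table");
    TABLES.put(QueryTarget.GENERIC_ENTITY, "entity_table");
  }

  // Returns the canonical store for the given kind of query.
  static String tableFor(QueryTarget target) {
    String table = TABLES.get(target);
    if (table == null) {
      throw new IllegalArgumentException("no table for " + target);
    }
    return table;
  }
}
```

With a table-driven lookup, adding a new canonical store means adding one map entry, so the branching stays out of the `getEntities()` implementation itself.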