[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4074:
------------------------------
    Attachment: YARN-4074-YARN-2928.POC.003.patch

POC v.3 patch posted.

Key changes include:
- switched from Get.setMaxResultSize() to PageFilter (more on that below)
- major refactoring of HBaseTimelineReaderImpl
-- introduced TimelineEntityReader and the hierarchy of classes to isolate 
proper reading per type
- added unit tests to test HBaseTimelineReaderImpl for flow activity and flow 
runs
- fixed an issue with FlowScanner where the cells were returned in the wrong 
order, which broke Column.readResult()
- made *RowKey classes real object classes, and added the parseRowKey method 
that returns an instance of the RowKey
- fixed the order of the add and pollLast
- renamed FlowEntity to FlowRunEntity
- added the compareTo() method for FlowActivityEntity
- passed the type into the FlowActivityEntity constructor
- set configs for FlowActivityEntity and FlowRunEntity to null
- improved the way we get string values from info for FlowActivityEntity and 
FlowRunEntity
- added getNumberOfRuns() to FlowActivityEntity
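
To illustrate the add/pollLast and compareTo() items above, here is a rough 
sketch of keeping only the most recent N entries in a sorted set. It is not 
the code in the patch: Activity and TopN are hypothetical stand-ins, and it 
assumes the intended order is "add first, then pollLast once the set exceeds 
the limit", with compareTo() sorting the most recent entries first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a flow activity record; NOT the real FlowActivityEntity.
final class Activity implements Comparable<Activity> {
  final long day;     // day timestamp of the activity (assumed field)
  final String flow;  // flow name, used as a tie-breaker (assumed field)

  Activity(long day, String flow) {
    this.day = day;
    this.flow = flow;
  }

  // Most recent day first; ties are broken by flow name so that two distinct
  // entries never compare as equal and get silently dropped by the TreeSet.
  @Override
  public int compareTo(Activity other) {
    int c = Long.compare(other.day, this.day);
    return c != 0 ? c : this.flow.compareTo(other.flow);
  }
}

// Hypothetical helper showing the bounded sorted-set pattern.
final class TopN {
  static List<String> mostRecent(List<Activity> input, int limit) {
    TreeSet<Activity> sorted = new TreeSet<>();
    for (Activity a : input) {
      sorted.add(a);              // add the candidate first...
      if (sorted.size() > limit) {
        sorted.pollLast();        // ...then evict the oldest entry in the set
      }
    }
    List<String> names = new ArrayList<>();
    for (Activity a : sorted) {
      names.add(a.flow);
    }
    return names;
  }
}
```

Polling before adding would evict an entry without comparing it against the 
incoming one, so a newer entry could be lost; adding first lets the set decide 
which element is actually last.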

It is actually pretty close to being ready, but since YARN-3901 is still 
outstanding, I'm not making it an official patch yet.

As for the PageFilter issue, I concluded that setMaxResultSize() is not the 
right API for limiting the number of rows. I believe PageFilter is the right 
thing to use. I also added counting logic so that we return the right number 
of records even if the result iterator advances.
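
For illustration only, the counting idea can be sketched in plain Java without 
any HBase dependency. The assumption behind it is that PageFilter is applied 
separately on each region server, so a scan that spans regions can hand the 
client more rows than the page size, and the client has to enforce the limit 
itself. LimitedReader and readUpTo are hypothetical names, not classes from 
the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; stands in for the reader-side counting logic.
final class LimitedReader {
  // Collects at most `limit` rows from an iterator that may yield more.
  // The size check comes before hasNext() so that the iterator is not
  // advanced once the limit has been reached.
  static <T> List<T> readUpTo(Iterator<T> rows, long limit) {
    List<T> out = new ArrayList<>();
    while (out.size() < limit && rows.hasNext()) {
      out.add(rows.next());
    }
    return out;
  }
}
```

With a real scan, the iterator would come from the result scanner, and the 
same limit would also be passed to the PageFilter so that each region server 
stops early.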

As for the FlowScanner issue mentioned above, [~vrushalic] and [~jrottinghuis] 
debugged this and tracked it down to a bug in YARN-3901. As such, this change 
will likely be made in the final YARN-3901 patch. I just included it here for 
completeness and to make the unit tests pass.

You should be able to apply the YARN-3901 v.3 patch and then this patch 
cleanly. Let me know if you have any questions.

I'd greatly appreciate review feedback. I understand it's a lot of code...

> [timeline reader] implement support for querying for flows and flow runs
> ------------------------------------------------------------------------
>
>                 Key: YARN-4074
>                 URL: https://issues.apache.org/jira/browse/YARN-4074
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch, YARN-4074-YARN-2928.POC.003.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
