[ 
https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349365#comment-14349365
 ] 

Vrushali C commented on YARN-3134:
----------------------------------

Hi [~swagle]
bq. Do the responses to these API calls return any timeseries data?: 
_GetFlowByAppId_ and _GetAppDetails_
No, not for these two. Each call is keyed by a single Hadoop application id, 
which is unique, so the response covers only that one application. 

bq.  The set of access patterns do not cover query directly by a metricName. Is 
there a use case for this? (Note: General use case for driving graphs)
In hRaven, we usually fetch everything for a given flow and time range, and 
the UI then lets the user filter or search for a particular metric.  


bq. Do you use the hbase native timestamp for querying? This is an obvious 
optimization for timeseries data.
No, we don’t use that one at all. We store the submit time of a flow as the 
run id in the row key (as well as in columns). 
The row key for job history is
{code} 
cluster ! user ! flow name ! run id ! app id
{code}
where run id is the submit time of the flow. It is stored as an inverted long, 
which keeps the rows sorted so that the most recent runs of a flow come 
first.  When querying for a time series or a time range, having this inverted 
long in the row key lets us set the start and stop rows of the scan so that 
the scan is time bound.  

Eg: 
https://github.com/twitter/hraven/blob/master/hraven-core/src/main/java/com/twitter/hraven/datasource/JobHistoryService.java#L277
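
To illustrate how an inverted run id turns a time range into scan bounds, here 
is a minimal sketch. It is not the hRaven code: the class and method names, the 
use of Long.MAX_VALUE - timestamp as the inversion, and the exact byte layout 
are assumptions made for this example, and it uses plain byte arrays rather 
than the HBase client API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only. All names and the byte layout are assumptions
// for this example, not hRaven's actual implementation.
class RunIdRowKeySketch {

    // Assumed inversion: newer (larger) timestamps become smaller values,
    // so they sort first in byte order.
    static long invert(long timestampMs) {
        return Long.MAX_VALUE - timestampMs;
    }

    // "cluster!user!flow!" as bytes; the trailing separator keeps a flow
    // from matching other flows whose names merely start with its name.
    static byte[] flowPrefix(String cluster, String user, String flow) {
        return (cluster + "!" + user + "!" + flow + "!")
                .getBytes(StandardCharsets.UTF_8);
    }

    // Appends the run id as 8 big-endian bytes.
    static byte[] withRunId(byte[] prefix, long invertedRunId) {
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix).putLong(invertedRunId).array();
    }

    // Full row key: cluster ! user ! flow name ! run id ! app id
    static byte[] rowKey(String cluster, String user, String flow,
                         long runIdMs, String appId) {
        byte[] head = withRunId(flowPrefix(cluster, user, flow), invert(runIdMs));
        byte[] tail = ("!" + appId).getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(head.length + tail.length)
                .put(head).put(tail).array();
    }

    // Bounds for runs submitted in [startMs, endMs].
    // Element 0 is the inclusive start row, element 1 the exclusive stop row.
    static byte[][] scanBounds(String cluster, String user, String flow,
                               long startMs, long endMs) {
        byte[] prefix = flowPrefix(cluster, user, flow);
        // The newest run in the range has the smallest inverted value.
        byte[] startRow = withRunId(prefix, invert(endMs));
        // +1 so that runs submitted exactly at startMs fall before the stop row.
        byte[] stopRow = withRunId(prefix, invert(startMs) + 1);
        return new byte[][] { startRow, stopRow };
    }

    // Unsigned lexicographic byte comparison, the ordering assumed for row keys.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    static boolean inRange(byte[] row, byte[][] bounds) {
        return compare(row, bounds[0]) >= 0 && compare(row, bounds[1]) < 0;
    }

    public static void main(String[] args) {
        byte[][] bounds = scanBounds("cluster1", "alice", "daily_flow", 1000L, 2000L);
        byte[] inside = rowKey("cluster1", "alice", "daily_flow", 1500L, "app_1");
        byte[] tooOld = rowKey("cluster1", "alice", "daily_flow", 999L, "app_2");
        System.out.println("1500 in range: " + inRange(inside, bounds));
        System.out.println("999 in range:  " + inRange(tooOld, bounds));
    }
}
```

With a real HBase client, the two arrays returned by scanBounds would be used 
as the scan's start and stop rows, so rows outside the time range fall outside 
the scanned key range and do not need to be filtered afterwards.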

bq. however how do you handle out of band data
Sorry, I didn’t follow this one: what do you mean by out of band data?


> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify how our implementation reads/writes data from/to HBase, and 
> makes it easy to build indexes and compose complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
