[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172304#comment-15172304
 ] 

Sangjin Lee commented on YARN-4700:
-----------------------------------

I may have misread the comments in haste last Friday. If the comments meant 
that we would use the event timestamps instead of the current time and 
calculate the top-of-the-day timestamps from them, then I concur. If they meant 
that we would use the actual event timestamps *as is* for the row key, I'm not 
as sure.

My main concern is that it might make some of the queries we want to run 
against this table in the future harder to write or slower to execute. For 
example, we could do a query like "return all flow activities in the last 7 
days". With top-of-the-day timestamps, it would be a simple partial row key 
match. With variable timestamps, it would become more of a range query. Are 
my concerns overblown?
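
To make the difference concrete, here is a minimal sketch of why fixed 
day boundaries keep the "last 7 days" query simple. It is not the actual 
storage code: the class name, the method names, and the assumption that 
timestamps are non-negative epoch milliseconds truncated at midnight UTC are 
all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the ATS v2 code base. */
final class FlowActivityDayBuckets {
  static final long MILLIS_ONE_DAY = 24L * 60 * 60 * 1000;

  /**
   * Truncates an epoch-millisecond timestamp to the start of its UTC day.
   * Assumes a non-negative timestamp.
   */
  static long topOfTheDay(long ts) {
    return ts - (ts % MILLIS_ONE_DAY);
  }

  /**
   * Returns the day-boundary timestamps for the last {@code days} days,
   * newest first. Each value is a fixed, known key component, so a query
   * can match on it exactly instead of scanning a timestamp range.
   */
  static List<Long> lastDays(long now, int days) {
    List<Long> buckets = new ArrayList<>();
    long today = topOfTheDay(now);
    for (int i = 0; i < days; i++) {
      buckets.add(today - i * MILLIS_ONE_DAY);
    }
    return buckets;
  }

  public static void main(String[] args) {
    // Prints the seven day-boundary values for the week ending now.
    for (long dayTs : lastDays(System.currentTimeMillis(), 7)) {
      System.out.println(dayTs);
    }
  }
}
```

With raw event timestamps there is no such fixed set of values, so the same 
query would have to be bounded by a start and stop timestamp instead.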

If the solution we're discussing is the former, then I think it's quite 
straightforward. It needs only a small change in 
{{FlowActivityRowKey.getRowKey()}}, which should apply 
{{TimelineStorageUtils.getTopOfTheDayTimestamp()}} to the provided timestamp.
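
A rough sketch of what that change could look like follows. The class below 
is a hypothetical stand-in, not the real {{FlowActivityRowKey}}: the method 
signature, the "!" separator, the field order, and the use of a string key 
(the real table stores bytes) are all assumptions made for illustration.

```java
/** Hypothetical stand-in for the row key logic; not the actual class. */
final class FlowActivityRowKeySketch {
  private static final long MILLIS_ONE_DAY = 24L * 60 * 60 * 1000;
  private static final String SEPARATOR = "!"; // assumed separator

  /**
   * Truncates an epoch-millisecond timestamp to the start of its UTC day.
   * Assumes a non-negative timestamp.
   */
  static long getTopOfTheDayTimestamp(long ts) {
    return ts - (ts % MILLIS_ONE_DAY);
  }

  /**
   * Builds a row key from the event timestamp, truncated to the top of
   * the day, rather than from the current time or the raw timestamp.
   */
  static String getRowKey(String clusterId, long eventTs, String userId,
      String flowName) {
    long dayTs = getTopOfTheDayTimestamp(eventTs);
    return String.join(SEPARATOR, clusterId, Long.toString(dayTs), userId,
        flowName);
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Events from the same flow on the same UTC day share one row key.
    System.out.println(getRowKey("cluster1", now, "user1", "flow1"));
  }
}
```

The point of the sketch is only that the truncation happens inside the key 
construction, so callers can pass event timestamps through unchanged.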

> ATS storage has one extra record each time the RM got restarted
> ---------------------------------------------------------------
>
>                 Key: YARN-4700
>                 URL: https://issues.apache.org/jira/browse/YARN-4700
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Li Lu
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



