[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2017-10-21 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3904:
---
Fix Version/s: 2.9.0

> Refactor timelineservice.storage to add support to online and offline 
> aggregation writers
> -
>
> Key: YARN-3904
> URL: https://issues.apache.org/jira/browse/YARN-3904
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-3904-YARN-2928.001.patch, 
> YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
> YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
> YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, 
> YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch
>
>
> After we finished the design for time-based aggregation, we can adopt our 
> existing Phoenix storage into the storage of the aggregated data. In this 
> JIRA, I'm proposing to refactor writers to add support to aggregation 
> writers. Offline aggregation writers typically have less contextual 
> information. We can distinguish these writers by special naming. We can also 
> use CollectorContexts to model all contextual information and use it in our 
> writer interfaces. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-08-14 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.009.patch

Fixed the typo raised by [~vrushalic]. 

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
 YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
 YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, 
 YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-08-06 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.008.patch

v008 patch, rebased onto the latest YARN-2928 branch. 

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
 YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
 YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, 
 YARN-3904-YARN-2928.008.patch




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-08-03 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.007.patch

Thanks [~zjshen] and [~sjlee0]! I've uploaded v007 patch to address your 
comments. Specifically:

bq. However, it's better to have config such as blah.blah.backend.type. When 
backend.type = hbase, the user can access HBase both directly and via Phoenix, 
and we allow aggregation. This may not need to be part of this jira, but just 
thinking out loud.
Yes, this proposal makes sense. It significantly simplifies deployment, since 
users no longer need to know the exact class name of the backend. However, if 
we decide to go in this direction, there is some foreseeable nontrivial work, 
such as changing class loading strategies. To keep the current workload 
separate, I propose to address this issue in a separate JIRA (we may not 
address it immediately).

bq. Make sense, but can we still make table creation centralized? I think we 
can make some option to create raw entity tables and aggregation tables 
separately. 
bq. As for createTables(), I'm also of the opinion that it might be better if 
we moved it to a dedicated creator class.
I agree it is appealing to centralize table creation. After giving it some 
thought, I think what we really want is a centralized _workflow_ for storage 
schema creation. That is to say, when setting up a v2 timeline server, users 
can simply run the data schema creator once to create the necessary data 
storage schemas. With this in mind, I added Phoenix schema creation to the 
existing data schema creator, behind a separate option {{-p}}. However, I'm 
keeping the SQL statements for table creation inside the writer file so that 
we also have a centralized place for the Phoenix storage schema. 

bq. Actually I'd like to ask whether this needs to be a service. Note that it 
is possible (or likely) that the writer will be executed in a mapreduce task.
We implement offline writers as {{AbstractService}}s to reuse the existing 
logic for service initialization, start, and stop. This pattern matches the 
use cases of our offline writers nicely. I admit it sounds a little awkward to 
call something inside a mapreduce job a service. However, the hadoop 
{{Service}} is just a lightweight package for service lifecycle management. It 
is not strongly tied to server-side or non-application use cases. Therefore I 
changed the writer to an {{AbstractService}} per [~zjshen]'s suggestion. 
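
To make the lifecycle argument concrete, here is a minimal sketch of a writer that borrows the init/start/stop pattern. {{LifecycleSketch}} and {{SketchOfflineWriter}} are names invented for this example; {{LifecycleSketch}} is a simplified stand-in, not Hadoop's {{AbstractService}} (which, among other differences, passes a {{Configuration}} to {{serviceInit}}). The in-memory list stands in for a storage connection.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a service lifecycle base class (not Hadoop's AbstractService). */
abstract class LifecycleSketch {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  State getState() {
    return state;
  }

  final void init() {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("cannot init from " + state);
    }
    serviceInit();
    state = State.INITED;
  }

  final void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("cannot start from " + state);
    }
    serviceStart();
    state = State.STARTED;
  }

  final void stop() {
    if (state == State.STOPPED) {
      return; // stopping twice is a no-op
    }
    serviceStop();
    state = State.STOPPED;
  }

  // Subclasses override only the hooks they need.
  protected void serviceInit() { }

  protected void serviceStart() { }

  protected void serviceStop() { }
}

/** Hypothetical offline writer: acquires its resource on start, releases it on stop. */
class SketchOfflineWriter extends LifecycleSketch {
  private List<String> buffer; // stands in for a storage connection

  @Override
  protected void serviceStart() {
    buffer = new ArrayList<>();
  }

  @Override
  protected void serviceStop() {
    buffer = null;
  }

  void write(String record) {
    if (getState() != State.STARTED) {
      throw new IllegalStateException("writer not started");
    }
    buffer.add(record);
  }

  int pendingRecords() {
    return buffer == null ? 0 : buffer.size();
  }
}
```

Nothing in this pattern assumes a long-running server process, so a mapreduce task could drive the same init/start/stop sequence around its writes.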

bq. For the user aggregation tables, I believe the cluster needs to be included 
in the row key.
Yes. Fixed. 

bq. l.156: My JDBC knowledge is a bit outdated, but do you want to prepare the 
statement every time a write is done? Don't you want to prepare it once and 
reuse it? Will that optimization follow later?
Nice catch. We can definitely reuse this PreparedStatement (as well as the 
connections) once we integrate the aggregation writer with the aggregator. My 
plan is to use this (relatively) stable writer to unblock the future patch on 
flow and user level offline aggregation. After we have the whole workflow, we 
can gradually add optimizations. Thoughts? 
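
For reference, the prepare-once optimization could look roughly like the sketch below. It uses only the standard {{java.sql}} interfaces; the class name {{ReusableStatementWriter}}, the three-column parameter layout, and the SQL passed in are assumptions for illustration, not what the patch does.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical writer that prepares its statement once and reuses it for every write. */
class ReusableStatementWriter implements AutoCloseable {
  private final Connection conn;
  private final String sql;
  private PreparedStatement stmt; // prepared lazily on the first write

  ReusableStatementWriter(Connection conn, String sql) {
    this.conn = conn;
    this.sql = sql;
  }

  /** Binds new parameter values to the cached statement and executes it. */
  void write(String user, String cluster, long value) throws SQLException {
    if (stmt == null) {
      stmt = conn.prepareStatement(sql);
    }
    stmt.setString(1, user);
    stmt.setString(2, cluster);
    stmt.setLong(3, value);
    stmt.executeUpdate();
  }

  /** Closes the cached statement; the caller still owns the connection. */
  @Override
  public void close() throws SQLException {
    if (stmt != null) {
      stmt.close();
      stmt = null;
    }
  }
}
```

With Phoenix the SQL would be an {{UPSERT INTO ... VALUES (?, ?, ?)}} statement. The sketch is not thread-safe, so each writer instance would need to stay confined to one thread.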

bq. I would enforce the notion that this is a read-only object by making the 
members final
Yes. Fixed. 

bq. Should the primary key be user and then cluster, or cluster and user? I 
think it might be better if it is cluster and user, although that differs from 
the entity table. Vrushali C?
I'm OK with either. Any suggestions [~vrushalic]? Anyway, we can decide this 
the same way as for TimelineEvents in the HBase storage, so I don't think this 
is blocking the JIRA? 

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
 YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
 YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-31 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.006.patch

Uploaded the 006 version of the patch. In this patch I addressed most of 
[~zjshen]'s review comments. I think the following two points need some 
discussion:
bq. moving the table creation stuff into TimelineSchemaCreator.
I'm not 100% sure that's what we want to do. We may want to decouple the 
offline aggregation module from our normal entity storage. If so, it may also 
be appealing to let users specify whether the data schema should be created 
during the offline aggregation process, for example by setting a flag in the 
offline aggregator.

bq. As HBase backend is accessed both directly and via Phoenix, it's good for 
us to clean up the configuration to say we're using the HBase backend (as 
opposed to the FS backend) instead of specifically the HBase or Phoenix 
writer/reader.
After the changes in this JIRA, we will only have two types of TimelineWriters, 
one for FS (test only) and one for HBase. The setting for the offline storage 
should be independent of this setting, I assume? 

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
 YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
 YARN-3904-YARN-2928.006.patch




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-29 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.005.patch

Refreshed my patch according to [~sjlee0]'s comments. Specifically, I set up a 
new interface (OfflineAggregationWriter) for aggregation writers. With this new 
interface I decoupled PhoenixOfflineAggregationWriter from TimelineWriter. 
Having a separate offline writer interface also gives us more freedom to design 
the aggregation storage interface. Now in the new writer API the type of the 
offline aggregation is specified by the incoming {{OfflineAggregationInfo}}. 

I also considered combining the reader and writer interfaces into an 
OfflineAggregationStorage interface, but it turned out that we may have some 
reader-only implementations (such as reading app level aggregations from 
HBase). Separating offline readers and writers gives us more freedom in this 
case. 
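
A rough sketch of the shape described above, with the aggregation type passed in on each call and the read and write sides kept as separate interfaces. All type names, method signatures, and table names here are invented for illustration and are simplified stand-ins for {{OfflineAggregationWriter}} and {{OfflineAggregationInfo}}, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for OfflineAggregationInfo: selects the aggregation type. */
enum AggregationInfoSketch {
  FLOW("flow_aggregation"), // table names are made up for this sketch
  USER("user_aggregation");

  final String tableName;

  AggregationInfoSketch(String tableName) {
    this.tableName = tableName;
  }
}

/** Write side only; the aggregation type arrives with each call. */
interface OfflineWriterSketch {
  void writeAggregated(AggregationInfoSketch info, String metric, long value);
}

/** Read side only, so a backend can be read-only. */
interface OfflineReaderSketch {
  List<String> readAggregated(AggregationInfoSketch info);
}

/** A backend that supports both directions implements both interfaces. */
class InMemoryAggregationStore implements OfflineWriterSketch, OfflineReaderSketch {
  private final Map<AggregationInfoSketch, List<String>> rows =
      new EnumMap<>(AggregationInfoSketch.class);

  @Override
  public void writeAggregated(AggregationInfoSketch info, String metric, long value) {
    rows.computeIfAbsent(info, k -> new ArrayList<>())
        .add(info.tableName + ":" + metric + "=" + value);
  }

  @Override
  public List<String> readAggregated(AggregationInfoSketch info) {
    // Return a copy so callers cannot modify the stored rows.
    return new ArrayList<>(rows.getOrDefault(info, Collections.emptyList()));
  }
}

/** A reader-only backend never has to stub out a write method. */
class ReadOnlyAppLevelSource implements OfflineReaderSketch {
  @Override
  public List<String> readAggregated(AggregationInfoSketch info) {
    return Collections.singletonList(info.tableName + ":precomputed");
  }
}
```

Had the two sides been merged into one storage interface, {{ReadOnlyAppLevelSource}} would need a write method that can only throw.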

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
 YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.004.patch

Uploaded the 004 version of the patch. This patch addresses the following two 
major issues:
# Rebuild the current Phoenix writer into an offline aggregation writer. 
Specifically, the writer writes info and metric data into the newly created 
Phoenix offline aggregation table. 
# Simplify writer interface by using TimelineCollectorContext. In this way both 
normal writers and offline aggregation writers can use the same interface to 
write data. 
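
As an illustration of the second point, the sketch below passes one context object through a single write signature, and each writer reads only the fields it has. {{ContextSketch}} is a trimmed-down, hypothetical stand-in for TimelineCollectorContext, and the writer names, the returned string, and the field order in it are all assumptions made for this example.

```java
/** Hypothetical, trimmed-down stand-in for TimelineCollectorContext. */
final class ContextSketch {
  final String clusterId;
  final String userId;
  final String flowName; // may be null when a writer has no flow information
  final String appId;    // may be null for offline aggregation

  ContextSketch(String clusterId, String userId, String flowName, String appId) {
    this.clusterId = clusterId;
    this.userId = userId;
    this.flowName = flowName;
    this.appId = appId;
  }
}

/** One write signature shared by online and offline writers. */
interface ContextWriterSketch {
  /** Returns a description of what would be written, for illustration only. */
  String write(ContextSketch context, String payload);
}

/** Online writer: uses the full context. */
class OnlineWriterSketch implements ContextWriterSketch {
  @Override
  public String write(ContextSketch c, String payload) {
    return String.join("/", c.clusterId, c.userId, c.flowName, c.appId) + " <- " + payload;
  }
}

/** Offline user-level writer: has less context and reads only cluster and user. */
class UserLevelWriterSketch implements ContextWriterSketch {
  @Override
  public String write(ContextSketch c, String payload) {
    // The field order is arbitrary here and implies nothing about row key layout.
    return String.join("/", c.clusterId, c.userId) + " <- " + payload;
  }
}
```

The interface stays the same when a writer needs fewer fields, which is what lets normal and offline aggregation writers share it.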

One thing pending discussion is the {{aggregation}} method. I feel this method 
is a little outdated. Could anyone remind me of the intended use case for it? 
Is it meant for real-time aggregation only? 

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
 YARN-3904-YARN-2928.004.patch




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-24 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Description: After we finished the design for time-based aggregation, we 
can adopt our existing Phoenix storage into the storage of the aggregated data. 
In this JIRA, I'm proposing to refactor writers to add support to aggregation 
writers. Offline aggregation writers typically have less contextual information. 
We can distinguish these writers by special naming. We can also use 
CollectorContexts to model all contextual information and use it in our writer 
interfaces.   (was: After we finished the design for time-based aggregation, we 
can adopt our existing Phoenix storage into the storage of the aggregated data. 
This JIRA proposes to move the Phoenix storage implementation from 
o.a.h.yarn.server.timelineservice.storage to 
o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully 
devoted writer for time-based aggregation. )

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch




[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-24 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Summary: Refactor timelineservice.storage to add support to online and 
offline aggregation writers  (was: Adopt PhoenixTimelineWriter into time-based 
aggregation storage)

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch


 After we finished the design for time-based aggregation, we can adopt our 
 existing Phoenix storage into the storage of the aggregated data. This JIRA 
 proposes to move the Phoenix storage implementation from 
 o.a.h.yarn.server.timelineservice.storage to 
 o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully 
 devoted writer for time-based aggregation. 


