[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2017-10-21 Thread Varun Saxena (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3904:
---
Fix Version/s: 2.9.0

> Refactor timelineservice.storage to add support to online and offline 
> aggregation writers
> -
>
> Key: YARN-3904
> URL: https://issues.apache.org/jira/browse/YARN-3904
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Li Lu
> Assignee: Li Lu
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-3904-YARN-2928.001.patch, 
> YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, 
> YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, 
> YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, 
> YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch
>
>
> Now that we have finished the design for time-based aggregation, we can adopt
> our existing Phoenix storage as the storage for the aggregated data. In this
> JIRA, I'm proposing to refactor the writers to add support for aggregation
> writers. Offline aggregation writers typically have less contextual
> information. We can distinguish these writers by special naming. We can also
> use CollectorContexts to model all contextual information and use them in our
> writer interfaces.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-08-14 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.009.patch

Fixed the typo raised by [~vrushalic]. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-08-06 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.008.patch

v008 patch, rebased onto the latest YARN-2928 branch.


[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-08-03 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.007.patch

Thanks [~zjshen] and [~sjlee0]! I've uploaded v007 patch to address your
comments. Specifically:

bq. However, it's better to have a config such as blah.blah.backend.type. When
backend.type = hbase, the user can access HBase both directly and via Phoenix,
and we allow aggregation. This may not need to be part of this JIRA; I'm just
thinking out loud.
Yes, this proposal makes sense. It significantly simplifies deployment, since
users no longer need to know the exact class name of the backend. However, if
we decide to move in this direction, there is some foreseeable nontrivial work,
such as changing class loading strategies. To better separate the current
workload, I propose to address this issue in a separate JIRA (we may not
address it immediately).
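
To illustrate the backend.type idea discussed above, here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: the key is the placeholder from the review comment rather than a real YARN property, the class names are made up, the default of "hbase" is arbitrary, and a plain Map stands in for the Hadoop configuration object.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: map a short backend type to a writer class name so that
// users configure "hbase" or "fs" instead of an exact class name.
final class BackendTypeResolver {
  // Placeholder key taken from the review comment; not a real YARN property.
  static final String BACKEND_TYPE_KEY = "blah.blah.backend.type";

  // Returns an illustrative (made-up) writer class name for the configured type.
  // Defaulting to "hbase" is an assumption of this sketch.
  static String writerClassFor(Map<String, String> conf) {
    String type = conf.getOrDefault(BACKEND_TYPE_KEY, "hbase").toLowerCase(Locale.ROOT);
    switch (type) {
      case "hbase":
        return "example.HBaseTimelineWriter";
      case "fs":
        return "example.FileSystemTimelineWriter";
      default:
        throw new IllegalArgumentException("Unknown backend type: " + type);
    }
  }

  // In this sketch only the HBase backend (direct or via Phoenix) allows aggregation.
  static boolean allowsAggregation(Map<String, String> conf) {
    return "hbase".equalsIgnoreCase(conf.getOrDefault(BACKEND_TYPE_KEY, "hbase"));
  }
}
```

With a lookup like this, the class loading change mentioned above would amount to instantiating the returned class name reflectively instead of reading a class name straight from the configuration.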

bq. Makes sense, but can we still make table creation centralized? I think we
can add an option to create raw entity tables and aggregation tables
separately.
bq. As for createTables(), I'm also of the opinion that it might be better if
we moved it to a dedicated creator class.
I agree it is appealing to centralize table creation. After giving this some
thought, I think what we really want is a centralized _workflow_ for storage
schema creation. That is, when setting up a v2 timeline server, users can
simply run the data schema creator once to create the necessary data storage
schemas. With this in mind, I added Phoenix schema creation to the existing
data schema creator, behind a separate option {{-p}}. However, I'm keeping the
SQL statements for table creation inside the writer file so that we also have a
centralized place for the Phoenix storage schema.
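
As a rough sketch of the centralized workflow described above, the following plain Java shows one entry point that plans the schema creation steps and adds the Phoenix step only when {{-p}} is passed. The class name, the step descriptions, and the assumption that {{-p}} adds to (rather than replaces) the default step are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a single schema-creation entry point.
final class SchemaCreatorSketch {
  // Returns the ordered list of schema creation steps selected by the arguments.
  static List<String> plan(String[] args) {
    List<String> steps = new ArrayList<>();
    // Assumed default step: the raw entity tables are always created.
    steps.add("create raw entity tables");
    // Assumed behavior of -p: additionally create the Phoenix aggregation tables.
    if (Arrays.asList(args).contains("-p")) {
      steps.add("create Phoenix offline aggregation tables");
    }
    return steps;
  }
}
```

In such a layout the creator only decides which steps run, while the SQL for the Phoenix tables can stay with the writer, as described above.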

bq. Actually I'd like to ask whether this needs to be a service. Note that it
is possible (or likely) that the writer will be executed in a mapreduce task.
We implement offline writers as {{AbstractService}}s to reuse the existing
logic for service initialization, start, and finish. This pattern matches the
use cases of our offline writers nicely. I admit it sounds a little awkward to
call something inside a mapreduce job a service. However, the hadoop
{{Service}} is just a lightweight package for service lifecycle management. It
is not strongly tied to server-side or non-application use cases. Therefore I
changed the writer to an {{AbstractService}} per [~zjshen]'s suggestion.
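
The lifecycle argument above can be sketched without any Hadoop dependency. The classes below are stand-ins written for illustration only; they imitate the init/start/stop pattern but are not Hadoop's {{AbstractService}}, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an init/start/stop lifecycle; NOT Hadoop's AbstractService.
abstract class LifecycleSketch {
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  final void init() {
    serviceInit();
    state = State.INITED;
  }

  final void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("init() must be called before start()");
    }
    serviceStart();
    state = State.STARTED;
  }

  final void stop() {
    // Only release resources that were actually acquired in start().
    if (state == State.STARTED) {
      serviceStop();
    }
    state = State.STOPPED;
  }

  final State getState() {
    return state;
  }

  protected abstract void serviceInit();
  protected abstract void serviceStart();
  protected abstract void serviceStop();
}

// Hypothetical offline writer that records lifecycle events instead of touching storage.
final class OfflineWriterLifecycleSketch extends LifecycleSketch {
  final List<String> events = new ArrayList<>();

  @Override protected void serviceInit() { events.add("init"); }
  @Override protected void serviceStart() { events.add("start"); }
  @Override protected void serviceStop() { events.add("stop"); }

  void write(String row) {
    if (getState() != State.STARTED) {
      throw new IllegalStateException("writer is not started");
    }
    events.add("write:" + row);
  }
}
```

A caller such as a map task would call init() and start() during setup, write inside its processing loop, and call stop() in a finally block, which is the same sequence a long-running server would follow.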

bq. For the user aggregation tables, I believe the cluster needs to be included
in the row key.
Yes. Fixed.

bq. l.156: My JDBC knowledge is a bit outdated, but do you want to prepare the
statement every time a write is done? Don't you want to prepare it once and
reuse it? Will that optimization follow later?
Nice catch. We can definitely reuse this PreparedStatement (as well as the
connections) after we integrate the aggregation writer with the aggregator. My
plan is to use this (relatively) stable writer to unblock the future patch on
flow and user level offline aggregation. After we have the whole workflow, we
can gradually add optimizations. Thoughts?
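
For reference, the prepare-once optimization raised in the review could look roughly like the sketch below. It uses only standard java.sql interfaces; the table name, the column names, and the class name are hypothetical and do not come from the patch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical sketch: prepare the statement on first use and reuse it afterwards.
final class ReusingAggregationWriter implements AutoCloseable {
  // Made-up table and columns; Phoenix uses UPSERT rather than INSERT.
  static final String UPSERT_SQL =
      "UPSERT INTO example_user_aggregation (cluster, user_id, metric_value) VALUES (?, ?, ?)";

  private final Connection conn;
  private PreparedStatement stmt; // prepared lazily, then reused for every write

  ReusingAggregationWriter(Connection conn) {
    this.conn = conn;
  }

  void write(String cluster, String user, long value) throws SQLException {
    if (stmt == null) {
      stmt = conn.prepareStatement(UPSERT_SQL);
    }
    stmt.setString(1, cluster);
    stmt.setString(2, user);
    stmt.setLong(3, value);
    stmt.executeUpdate();
  }

  @Override
  public void close() throws SQLException {
    // Closes only the statement; the caller owns the connection in this sketch.
    if (stmt != null) {
      stmt.close();
      stmt = null;
    }
  }
}
```

The sketch is not thread-safe, and commit handling is left to the caller; both would need to be settled once the writer is integrated with the aggregator.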

bq. I would enforce the notion that this is a read-only object by making the
members final
Yes. Fixed.
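
A small sketch of what a read-only object with final members can look like. The class and field names are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical read-only holder: every member is final and set once in the constructor.
final class ReadOnlyInfoSketch {
  private final String tableName;
  private final List<String> primaryKeyColumns;

  ReadOnlyInfoSketch(String tableName, List<String> primaryKeyColumns) {
    this.tableName = tableName;
    // Defensive copy: final alone does not stop callers from mutating a shared list.
    this.primaryKeyColumns =
        Collections.unmodifiableList(new ArrayList<>(primaryKeyColumns));
  }

  String getTableName() {
    return tableName;
  }

  List<String> getPrimaryKeyColumns() {
    return primaryKeyColumns;
  }
}
```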

bq. Should the primary key be user and then cluster, or cluster and user? I
think it might be better if it is cluster and user, although that differs from
the entity table. Vrushali C?
I'm OK with either. Any suggestions, [~vrushalic]? In any case, we can decide
this much as we did for TimelineEvents in the HBase storage, so I don't think
it blocks this JIRA?
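
To make the key ordering question concrete, the sketch below sorts row keys lexicographically, which is how a sorted key-value store lays out rows, and shows which rows end up adjacent under each ordering. The separator and the helper names are hypothetical; the real key encoding may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch comparing (cluster, user) and (user, cluster) key orders.
final class RowKeyOrderSketch {
  // '!' is a made-up separator chosen because it sorts before letters and digits.
  static String rowKey(String first, String second) {
    return first + "!" + second;
  }

  // Each pair is {cluster, user}; returns the keys in lexicographic order.
  static List<String> sortedKeys(List<String[]> clusterUserPairs, boolean clusterFirst) {
    List<String> keys = new ArrayList<>();
    for (String[] pair : clusterUserPairs) {
      keys.add(clusterFirst ? rowKey(pair[0], pair[1]) : rowKey(pair[1], pair[0]));
    }
    Collections.sort(keys);
    return keys;
  }
}
```

With cluster first, all rows of one cluster are contiguous and can be read with a single prefix scan; with user first, the rows of one user across clusters are contiguous instead.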


[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-31 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.006.patch

Uploaded the 006 version of the patch. In this patch I addressed most of
[~zjshen]'s review comments. I'd like to discuss the following two points:
bq. moving the table creation stuff into TimelineSchemaCreator.
I'm not 100% sure that's what we want to do. We may want to decouple the
offline aggregation module from our normal entity storage. If so, it may also
be appealing to let users specify whether to create the data schema as part of
the offline aggregation process, for example by setting a flag in the offline
aggregator.

bq. As HBase backend is accessed both directly and via Phoenix, it's good for
us to clean up the configuration to say we're using the HBase backend (compared
to FS backend) instead of specifically HBase or Phoenix writer/reader.
After the changes in this JIRA, we will only have two types of TimelineWriters,
one for FS (test only) and one for HBase. The setting for the offline storage
should be independent of this setting, I assume?


[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-29 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.005.patch

Refreshed my patch according to [~sjlee0]'s comments. Specifically, I set up a
new interface (OfflineAggregationWriter) for aggregation writers. With this new
interface I decoupled PhoenixOfflineAggregationWriter from TimelineWriter.
Having a separate offline writer interface also gives us more freedom to design
the aggregation storage interface. Now in the new writer API the type of the
offline aggregation is specified by the incoming {{OfflineAggregationInfo}}.

I also considered combining the reader and writer interfaces into an
OfflineAggregationStorage interface, but it turned out that we may have some
reader-only implementations (such as reading app level aggregations from
HBase). Separating offline readers and writers will give us more freedom in
this case.
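
The trade-off described above can be sketched as two small interfaces. The names and method signatures below are hypothetical and are not the ones in the patch; the enum is only a stand-in for the aggregation type that {{OfflineAggregationInfo}} carries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the aggregation type carried by OfflineAggregationInfo.
enum AggregationKindSketch { FLOW, USER }

interface AggregationWriterSketch {
  void writeAggregated(AggregationKindSketch kind, String rowKey, long value);
}

interface AggregationReaderSketch {
  // Returns null when no value is stored for the key.
  Long readAggregated(AggregationKindSketch kind, String rowKey);
}

// A store that supports both directions simply implements both interfaces.
final class InMemoryAggregationStore
    implements AggregationWriterSketch, AggregationReaderSketch {
  private final Map<String, Long> rows = new HashMap<>();

  @Override
  public void writeAggregated(AggregationKindSketch kind, String rowKey, long value) {
    // Accumulate values written under the same kind and key.
    rows.merge(kind + ":" + rowKey, value, Long::sum);
  }

  @Override
  public Long readAggregated(AggregationKindSketch kind, String rowKey) {
    return rows.get(kind + ":" + rowKey);
  }
}

// A reader-only implementation has no write method it would need to stub out.
final class SnapshotAggregationReader implements AggregationReaderSketch {
  private final Map<String, Long> snapshot;

  SnapshotAggregationReader(Map<String, Long> snapshot) {
    this.snapshot = new HashMap<>(snapshot);
  }

  @Override
  public Long readAggregated(AggregationKindSketch kind, String rowKey) {
    return snapshot.get(kind + ":" + rowKey);
  }
}
```

Had the two been merged into one storage interface, the reader-only class would have been forced to implement a write method that could only throw.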


[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-27 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Li Lu updated YARN-3904:

Attachment: YARN-3904-YARN-2928.004.patch

Uploaded the 004 version of the patch. This patch addresses the following two
major issues:
# Rebuild the current Phoenix writer into an offline aggregation writer.
Specifically, the writer writes info and metric data into the newly created
Phoenix offline aggregation table.
# Simplify writer interface by using TimelineCollectorContext. In this way both
normal writers and offline aggregation writers can use the same interface to
write data.
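
The second point can be sketched as follows: one write signature takes a context object, and an offline aggregation caller simply leaves the fields it does not know unset. The classes below are hypothetical stand-ins and do not reproduce TimelineCollectorContext or the real writer API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector context; fields other than clusterId may be null.
final class ContextSketch {
  final String clusterId;
  final String userId;
  final String flowName;
  final String appId;

  ContextSketch(String clusterId, String userId, String flowName, String appId) {
    this.clusterId = clusterId;
    this.userId = userId;
    this.flowName = flowName;
    this.appId = appId;
  }
}

// One write signature serves both normal and offline aggregation writers.
interface ContextWriterSketch {
  void write(ContextSketch context, String entity);
}

// Records what would be written instead of touching real storage.
final class RecordingWriterSketch implements ContextWriterSketch {
  final List<String> rows = new ArrayList<>();

  @Override
  public void write(ContextSketch c, String entity) {
    // Offline aggregation may know only cluster and user; missing parts are skipped.
    StringBuilder key = new StringBuilder(c.clusterId);
    for (String part : new String[] {c.userId, c.flowName, c.appId}) {
      if (part != null) {
        key.append('/').append(part);
      }
    }
    rows.add(key + " -> " + entity);
  }
}
```

A normal writer call would pass a fully populated context, while a user level aggregation would pass only the cluster and the user.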

One thing pending discussion is the {{aggregation}} method. I feel this method
is a little outdated. Could anyone remind me of the intended use case for it?
Does it fit real-time aggregations only?


[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-24 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Li Lu updated YARN-3904:

Description: Now that we have finished the design for time-based aggregation,
we can adopt our existing Phoenix storage as the storage for the aggregated
data. In this JIRA, I'm proposing to refactor the writers to add support for
aggregation writers. Offline aggregation writers typically have less contextual
information. We can distinguish these writers by special naming. We can also
use CollectorContexts to model all contextual information and use them in our
writer interfaces. (was: After we finished the design for time-based
aggregation, we can adopt our existing Phoenix storage into the storage of the
aggregated data. This JIRA proposes to move the Phoenix storage implementation
from o.a.h.yarn.server.timelineservice.storage to
o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully
devoted writer for time-based aggregation. )


[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers

2015-07-24 Thread Li Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3904:

Summary: Refactor timelineservice.storage to add support to online and 
offline aggregation writers  (was: Adopt PhoenixTimelineWriter into time-based 
aggregation storage)

 Refactor timelineservice.storage to add support to online and offline 
 aggregation writers
 -

 Key: YARN-3904
 URL: https://issues.apache.org/jira/browse/YARN-3904
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
 Attachments: YARN-3904-YARN-2928.001.patch, 
 YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch


 After we finished the design for time-based aggregation, we can adopt our 
 existing Phoenix storage into the storage of the aggregated data. This JIRA 
 proposes to move the Phoenix storage implementation from 
 o.a.h.yarn.server.timelineservice.storage to 
 o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully 
 devoted writer for time-based aggregation. 


