[
https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648472#comment-14648472
]
Sangjin Lee commented on YARN-3906:
-----------------------------------
bq. I've also noticed that the newly added Application*.java files overlap
significantly with Entity*.java.
Thanks for bringing up that point [~gtCarrera9]. I should have explained why I
wrote it this way. The overlap was the first thing I noticed as I looked into
adding the new table.
\*Table and \*RowKey are not so bad, but \*ColumnFamily, \*Column, and
\*ColumnPrefix definitely have a lot of overlapping code. That is largely an
artifact of the design decision to use enums to implement these classes. Enums
are nice because they let us seal the list of members cleanly, and the code that
uses the API becomes very strongly typed. The downside is that enums cannot be
extended.
If enums could be extended, we could have created a base class common to both
the entity table and the application table, and had the two tables extend it
pretty trivially. Unfortunately, that doesn't work with enums. Nor does Java
have the option of mix-ins like Scala.
To minimize the duplication, we introduced {{ColumnHelper}} and moved many of
the common operations into that helper class. You'll notice that most of the
implementations in the \*Column\* classes are simple pass-throughs to
{{ColumnHelper}}.
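To illustrate the pass-through pattern described above, here is a minimal
sketch. It is not the code in the patch: every name ending in Sketch, the
"i" column family, the column qualifiers, and the in-memory Map that stands in
for an HBase row are assumptions made only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical shared contract for a column; not the patch's actual interface.
interface ColumnSketch {
  void store(Map<String, byte[]> row, String value);
  String read(Map<String, byte[]> row);
}

// Hypothetical stand-in for the helper: holds the logic both tables share.
final class ColumnHelperSketch {
  private final String family;

  ColumnHelperSketch(String family) {
    this.family = family;
  }

  void store(Map<String, byte[]> row, String qualifier, String value) {
    row.put(family + ":" + qualifier, value.getBytes(StandardCharsets.UTF_8));
  }

  String read(Map<String, byte[]> row, String qualifier) {
    byte[] raw = row.get(family + ":" + qualifier);
    return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
  }
}

// Each enum seals its own member list, but the method bodies only delegate.
enum EntityColumnSketch implements ColumnSketch {
  ID("id"), CREATED_TIME("created_time");

  private final ColumnHelperSketch helper = new ColumnHelperSketch("i");
  private final String qualifier;

  EntityColumnSketch(String qualifier) {
    this.qualifier = qualifier;
  }

  @Override
  public void store(Map<String, byte[]> row, String value) {
    helper.store(row, qualifier, value);
  }

  @Override
  public String read(Map<String, byte[]> row) {
    return helper.read(row, qualifier);
  }
}

// The second enum repeats the same boilerplate because enums cannot extend
// a common base class; only the member list differs.
enum ApplicationColumnSketch implements ColumnSketch {
  ID("id"), FLOW_VERSION("flow_version");

  private final ColumnHelperSketch helper = new ColumnHelperSketch("i");
  private final String qualifier;

  ApplicationColumnSketch(String qualifier) {
    this.qualifier = qualifier;
  }

  @Override
  public void store(Map<String, byte[]> row, String value) {
    helper.store(row, qualifier, value);
  }

  @Override
  public String read(Map<String, byte[]> row) {
    return helper.read(row, qualifier);
  }
}
```

The remaining duplication is the constructor and the one-line delegating
methods, which is roughly the kind of overlap discussed above.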
This issue is more pronounced because the entity table and the application
table are so similar. For example, for the app-to-flow table (which Zhijie is
working on), this might not be as big an issue.
We could think of some alternatives, but I think they have their own
challenges. First, we could have only one set of classes for both the entity
table and the application table, and control which table is used via some sort
of argument/flag. But then we would have lots of
{{if application ... else ...}} code scattered around in that single
implementation. I'm not sure that would be an improvement.
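A rough sketch of what that single-implementation alternative might look like
follows. The class name, the TableKind flag, the "entity" table name, and the
row key layout are all hypothetical and are not taken from the patch; the point
is only that every table-specific difference becomes a branch.

```java
// Hypothetical single implementation that serves both tables via a flag.
final class SingleTableSketch {
  enum TableKind { ENTITY, APPLICATION }

  static String tableName(TableKind kind) {
    if (kind == TableKind.APPLICATION) {
      return "application";
    } else {
      return "entity";
    }
  }

  // Illustrative row key layout only; the real schema may differ.
  static String rowKey(TableKind kind, String appId,
      String entityType, String entityId) {
    if (kind == TableKind.APPLICATION) {
      // An application row needs no entity type or id in this sketch.
      return appId;
    } else {
      return appId + "!" + entityType + "!" + entityId;
    }
  }
}
```

Each new difference between the two tables would add another branch of this
kind, and callers lose the compile-time distinction between the two tables.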
Eventually, if this becomes more of a need, we could envision writing some sort
of code generator together with a table/schema description format, so that
these classes can simply be generated from the schema description. However, as
you may know, code generation is not without problems...
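As a very small illustration of the code generation idea, the sketch below
emits enum source text from a list of column qualifiers. The generator class,
its method, and the use of a plain list as the "schema description" are all
assumptions for this example; no such tool exists in the patch.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical generator: turns a list of qualifiers into enum source text.
final class ColumnEnumGeneratorSketch {
  static String generate(String enumName, List<String> qualifiers) {
    StringBuilder sb = new StringBuilder();
    sb.append("enum ").append(enumName).append(" {\n");
    for (int i = 0; i < qualifiers.size(); i++) {
      String q = qualifiers.get(i);
      // Constant name is the upper-cased qualifier, e.g. id -> ID.
      sb.append("  ").append(q.toUpperCase(Locale.ROOT))
          .append("(\"").append(q).append("\")");
      sb.append(i < qualifiers.size() - 1 ? ",\n" : ";\n");
    }
    if (qualifiers.isEmpty()) {
      // An enum with no constants still needs the separator before fields.
      sb.append("  ;\n");
    }
    sb.append("  private final String qualifier;\n");
    sb.append("  ").append(enumName)
        .append("(String qualifier) { this.qualifier = qualifier; }\n");
    sb.append("}\n");
    return sb.toString();
  }
}
```

Even in this toy form, the generated source has to be built, reviewed, and kept
in sync with the description, which hints at the problems mentioned above.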
I hope this clarifies some of the thinking that went into this.
> split the application table from the entity table
> -------------------------------------------------
>
> Key: YARN-3906
> URL: https://issues.apache.org/jira/browse/YARN-3906
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Attachments: YARN-3906-YARN-2928.001.patch,
> YARN-3906-YARN-2928.002.patch
>
>
> Per discussions on YARN-3815, we need to split the application entities from
> the main entity table into their own table (application).