[jira] [Commented] (YARN-3906) split the application table from the entity table

Sangjin Lee (JIRA) Thu, 30 Jul 2015 16:24:25 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648472#comment-14648472
 ]


Sangjin Lee commented on YARN-3906:
-----------------------------------

bq. I've also noticed that the newly added Application*.java files overlap 
significantly with Entity*.java.

Thanks for bringing up that point [~gtCarrera9]. I should have explained why I 
wrote it this way. The overlap is the first thing I noticed as I looked into 
adding the new table.

\*Table and \*RowKey are not so bad, but \*ColumnFamily, \*Column, and 
\*ColumnPrefix definitely have a lot of overlapping code. That is largely an 
artifact of the design decision to use enums to implement these classes. Enums 
are nice because they let us seal the list of members cleanly, and the code 
that uses the API becomes very strongly typed. The downside is that enums 
cannot be extended.

If enums could be extended, we could have created a base class common to both 
the entity table and the application table, and had each table's classes 
extend it pretty trivially. Unfortunately that doesn't work with enums, nor 
does Java offer mix-ins the way Scala does.

To minimize the duplication, we introduced {{ColumnHelper}} and moved many of 
the common operations into it. You'll notice that most of the implementations 
in the \*Column\* classes are simple pass-throughs to {{ColumnHelper}}.
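
Roughly, the pattern looks like the sketch below. This is only an 
illustration, not the code in the patch: the names {{ColumnSketch}}, 
{{ColumnHelperSketch}}, {{EntityColumnSketch}} and {{ApplicationColumnSketch}} 
are hypothetical, and a row is modeled as a plain in-memory map rather than an 
HBase table. Each enum has to repeat the same thin methods because it cannot 
inherit them from a shared base class.

```java
import java.util.Map;

// Hypothetical common contract for a column; the real interface may differ.
interface ColumnSketch {
  void store(Map<String, Object> row, Object value);
  Object read(Map<String, Object> row);
}

// Hypothetical stand-in for ColumnHelper: the logic shared by all column enums.
final class ColumnHelperSketch {
  private final String family;

  ColumnHelperSketch(String family) {
    this.family = family;
  }

  void store(Map<String, Object> row, String qualifier, Object value) {
    row.put(family + ":" + qualifier, value);
  }

  Object read(Map<String, Object> row, String qualifier) {
    return row.get(family + ":" + qualifier);
  }
}

// Enum for the entity table: a sealed member list, each method a pass-through.
enum EntityColumnSketch implements ColumnSketch {
  ID("id"), CREATED_TIME("created_time");

  private static final ColumnHelperSketch HELPER = new ColumnHelperSketch("i");
  private final String qualifier;

  EntityColumnSketch(String qualifier) {
    this.qualifier = qualifier;
  }

  @Override
  public void store(Map<String, Object> row, Object value) {
    HELPER.store(row, qualifier, value);
  }

  @Override
  public Object read(Map<String, Object> row) {
    return HELPER.read(row, qualifier);
  }
}

// Enum for the application table: nearly identical, because an enum cannot
// extend another enum or a common base class.
enum ApplicationColumnSketch implements ColumnSketch {
  ID("id"), CREATED_TIME("created_time");

  private static final ColumnHelperSketch HELPER = new ColumnHelperSketch("i");
  private final String qualifier;

  ApplicationColumnSketch(String qualifier) {
    this.qualifier = qualifier;
  }

  @Override
  public void store(Map<String, Object> row, Object value) {
    HELPER.store(row, qualifier, value);
  }

  @Override
  public Object read(Map<String, Object> row) {
    return HELPER.read(row, qualifier);
  }
}
```

The duplication that remains is the boilerplate around the delegation; the 
actual logic lives in one place.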

This issue is more pronounced because the entity table and the application 
table are so similar. For example, for the app-to-flow table (which Zhijie is 
working on), this might not be as big an issue.

We could think of some alternatives, but I think they have their own 
challenges. One option is to have only one set of classes for both the entity 
table and the application table, and to control which table is used via some 
sort of argument/flag. But then we would have lots of 
{{if application ... else ...}} code scattered around in that single 
implementation. I'm not sure that is an improvement.
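
To illustrate the concern, here is a minimal sketch of what a flag-driven 
single implementation might look like. The class name {{FlaggedTableSketch}}, 
the table names, and the row key layout are all assumptions made for the 
example, not the actual schema.

```java
// Hypothetical single implementation shared by both tables, selected by a flag.
final class FlaggedTableSketch {
  private final boolean application;

  FlaggedTableSketch(boolean application) {
    this.application = application;
  }

  String tableName() {
    if (application) {
      return "application";
    } else {
      return "entity";
    }
  }

  // Assumed layout for illustration only: the application row is keyed by the
  // app id alone, the entity row by app id, entity type, and entity id.
  String rowKey(String appId, String entityType, String entityId) {
    if (application) {
      return appId;
    } else {
      return appId + "!" + entityType + "!" + entityId;
    }
  }
}
```

Every method that differs between the two tables needs its own branch on the 
flag, and callers lose the compile-time distinction between the two tables.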

Eventually, if this becomes more of a need, we could envision writing some sort 
of code generator along with a table/schema description format, so that given 
the schema description these classes can simply be generated. However, as you 
may know, code generation is not without problems...
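
As a very rough sketch of the idea, a generator could take an enum name plus a 
list of columns and emit the enum source. Everything here is hypothetical: 
{{ColumnEnumGeneratorSketch}} and the map-based "schema description" are 
assumptions for illustration, and a real generator would also need to emit the 
pass-through methods and handle column families and prefixes.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical generator: turns a (constant name -> qualifier) description
// into the source text of a column enum.
final class ColumnEnumGeneratorSketch {
  private ColumnEnumGeneratorSketch() {
  }

  static String generate(String enumName, Map<String, String> columns) {
    // Constants are comma-separated and terminated by a semicolon.
    StringJoiner constants = new StringJoiner(",\n", "", ";\n");
    for (Map.Entry<String, String> column : columns.entrySet()) {
      constants.add("  " + column.getKey() + "(\"" + column.getValue() + "\")");
    }
    StringBuilder source = new StringBuilder();
    source.append("enum ").append(enumName).append(" {\n");
    source.append(constants);
    source.append("  private final String qualifier;\n");
    source.append("  ").append(enumName)
        .append("(String qualifier) { this.qualifier = qualifier; }\n");
    source.append("}\n");
    return source.toString();
  }
}
```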

I hope this clarifies some of the thinking that went into this.

> split the application table from the entity table
> -------------------------------------------------
>
>                 Key: YARN-3906
>                 URL: https://issues.apache.org/jira/browse/YARN-3906
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3906-YARN-2928.001.patch, 
> YARN-3906-YARN-2928.002.patch
>
>
> Per discussions on YARN-3815, we need to split the application entities from 
> the main entity table into its own table (application).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
