[ https://issues.apache.org/jira/browse/YARN-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648472#comment-14648472 ]
Sangjin Lee commented on YARN-3906:
-----------------------------------

bq. I've also noticed that the newly added Application*.java files overlap significantly with Entity*.java.

Thanks for bringing up that point, [~gtCarrera9]. I should have added some explanation of why I wrote it this way. That was the first thing I noticed as I looked into adding the new table. \*Table and \*RowKey are not so bad, but \*ColumnFamily, \*Column, and \*ColumnPrefix definitely have a lot of overlapping code.

That is largely an artifact of the design decision to implement these classes as enums. Enums are nice because they let us seal the list of members cleanly, and the code that uses the API becomes very strongly typed. The downside is that enums cannot be extended. If they could, we could have created a base class common to both the entity table and the application table, and had each table's classes extend it pretty trivially. Unfortunately that doesn't work with enums, and Java doesn't have mix-ins the way Scala does either.

As a way to minimize the duplication, we introduced {{ColumnHelper}} and moved many of the common operations into that helper class. You'll notice that most of the implementations in the \*Column\* classes are simple pass-throughs to {{ColumnHelper}}. The issue is more pronounced here because the entity table and the application table are so similar. For other tables, such as the app-to-flow table (which Zhijie is working on), it might not be as big an issue.

We could think of some alternatives, but I think they have their own challenges. First, we could have only one set of classes for both the entity table and the application table, and control which table is used via some sort of argument/flag. But then we would have lots of {{if application ... else ...}} code scattered around that single implementation. I'm not sure that would be an improvement.
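To make the enum trade-off concrete, here is a minimal sketch of the pattern: two enums implementing a shared interface and delegating their work to a common helper. All names in it ({{RowSketch}}, {{ColumnHelperSketch}}, {{ColumnSketch}}, {{EntityColumnSketch}}, {{ApplicationColumnSketch}}) and the column family/qualifier strings are hypothetical, and a plain nested map stands in for an HBase row. It is an illustration under those assumptions, not the code in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table row: column family -> (qualifier -> value).
// No HBase API is used; this only models the shape of the data.
final class RowSketch {
    final Map<String, Map<String, Object>> families = new HashMap<>();
}

// Hypothetical shared helper in the spirit of ColumnHelper: the common
// read/write logic lives here once, keyed by column family.
final class ColumnHelperSketch {
    private final String family;

    ColumnHelperSketch(String family) {
        this.family = family;
    }

    void store(RowSketch row, String qualifier, Object value) {
        row.families.computeIfAbsent(family, f -> new HashMap<>()).put(qualifier, value);
    }

    Object read(RowSketch row, String qualifier) {
        Map<String, Object> cells = row.families.get(family);
        return cells == null ? null : cells.get(qualifier);
    }
}

// Enums can implement a common interface, but they cannot extend a shared
// base class (every enum implicitly extends java.lang.Enum).
interface ColumnSketch {
    void store(RowSketch row, Object value);

    Object read(RowSketch row);
}

// Sealed, strongly typed list of (hypothetical) entity-table columns.
enum EntityColumnSketch implements ColumnSketch {
    ID("info", "id"),
    TYPE("info", "type"),
    CREATED_TIME("info", "created_time");

    private final ColumnHelperSketch helper;
    private final String qualifier;

    EntityColumnSketch(String family, String qualifier) {
        this.helper = new ColumnHelperSketch(family);
        this.qualifier = qualifier;
    }

    // Pure pass-throughs to the helper.
    @Override
    public void store(RowSketch row, Object value) { helper.store(row, qualifier, value); }

    @Override
    public Object read(RowSketch row) { return helper.read(row, qualifier); }
}

// The application-table enum differs only in its constants; the fields,
// constructor, and pass-throughs must be repeated because they cannot be inherited.
enum ApplicationColumnSketch implements ColumnSketch {
    ID("info", "id"),
    CREATED_TIME("info", "created_time");

    private final ColumnHelperSketch helper;
    private final String qualifier;

    ApplicationColumnSketch(String family, String qualifier) {
        this.helper = new ColumnHelperSketch(family);
        this.qualifier = qualifier;
    }

    @Override
    public void store(RowSketch row, Object value) { helper.store(row, qualifier, value); }

    @Override
    public Object read(RowSketch row) { return helper.read(row, qualifier); }
}
```

Callers get compile-time checked columns (e.g. {{EntityColumnSketch.CREATED_TIME.store(row, ts)}}), and the shared logic is written only once in the helper; the remaining cost is the repeated boilerplate in each enum body.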
Eventually, if this becomes more of a need, we could envision writing a code generator along with a table/schema description format, so that these classes could simply be generated from the schema description. However, as you may know, code generation is not without problems...

I hope this clarifies some of the thinking that went into this.

> split the application table from the entity table
> -------------------------------------------------
>
>                 Key: YARN-3906
>                 URL: https://issues.apache.org/jira/browse/YARN-3906
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-3906-YARN-2928.001.patch, YARN-3906-YARN-2928.002.patch
>
>
> Per discussions on YARN-3815, we need to split the application entities from
> the main entity table into its own table (application).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)