A group of us in the Hadoop community are working on YARN's next-generation Timeline Service (https://issues.apache.org/jira/browse/YARN-2928). For every application that runs on a Hadoop cluster, it will store the application stats, workflow metadata, and container metrics in HBase tables (some plain HBase tables and some Phoenix-based ones).

We would like to validate some of the implementation approaches we are taking with HBase, and it would be great to get feedback on the code and design from the HBase dev perspective. Among other things, we use cell tags in coprocessors to perform sum, min, and max operations across the different versions of cells in a given column, both during reads and during flush and compaction.

Some relevant sub-JIRAs that deal with HBase coprocessors:
https://issues.apache.org/jira/browse/YARN-4062
https://issues.apache.org/jira/browse/YARN-3901

The schema is documented, with example records, in the code as well as in a PDF on the JIRA:
https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/flow/FlowRunTable.java#L34
https://github.com/apache/hadoop/blob/feature-YARN-2928/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/entity/EntityTable.java#L40

Schema JIRA (the PDF attachment describes the schema):
https://issues.apache.org/jira/browse/YARN-3411

We would appreciate any feedback or comments, and would be glad to answer questions or clarify anything in more depth.

Thanks,
Vrushali
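
To make the tag-driven aggregation mentioned above a little more concrete, here is a rough, simplified model in plain Python. It is not the coprocessor code and does not use the HBase API; the names CellVersion, op_tag and collapse are hypothetical, and the sketch assumes every version of a column carries a single tag naming the operation ("SUM", "MIN" or "MAX") to apply when the versions are collapsed into one value.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CellVersion:
    """One version of a cell in a column (hypothetical model, not an HBase type)."""
    timestamp: int
    value: int
    op_tag: str  # assumed tag naming the aggregation: "SUM", "MIN" or "MAX"


# Map each assumed tag to the Python built-in that implements it.
_OPS = {"SUM": sum, "MIN": min, "MAX": max}


def collapse(versions: Iterable[CellVersion]) -> Optional[int]:
    """Collapse all versions of one column into a single aggregated value.

    Returns None when there are no versions. Raises ValueError if the
    versions disagree on the tag or the tag is not one of the known ones.
    """
    versions = list(versions)
    if not versions:
        return None
    tags = {v.op_tag for v in versions}
    if len(tags) != 1:
        raise ValueError(f"mixed aggregation tags in one column: {sorted(tags)}")
    tag = tags.pop()
    if tag not in _OPS:
        raise ValueError(f"unknown aggregation tag: {tag}")
    return _OPS[tag](v.value for v in versions)
```

In this model the same collapse step could run on a read (returning the aggregate without rewriting anything) or during flush and compaction (replacing many versions with one), which is the part of the design where HBase-side feedback would be most useful.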
