[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-3411:
-----------------------------
    Attachment: YARN-3411.poc.2.txt

Attaching a patch that includes:
- an HBaseTimelineWriterImpl class
- a test class for it
- an EntityTableDetails class that holds entity-table-specific constants and 
helper functions
- a TimelineWriterUtils class with utility functions for reading from and 
writing to HBase tables

The write function in the HBaseTimelineWriterImpl class writes out the entire 
contents of a TimelineEntity object, including its info, config, metrics 
(timeseries), isRelatedTo and relatesTo fields. 

The metrics timeseries is written such that the hbase cell timestamp is set to 
the metric timestamp, the hbase cell column qualifier is the metric name and 
the value is the metric value. I also propose changing the TimelineMetric 
values to be "long" instead of "Object" (although this patch does not make that 
change). 
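
To make the cell layout described above concrete, here is a minimal sketch 
that models one row of the metrics column family with plain Java collections 
instead of the HBase client. The class and method names (MetricCellSketch, 
putMetric, latest, versionCount) are hypothetical and are not taken from the 
patch. The 8-byte big-endian encoding assumes the proposed "long" metric 
values.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the metrics column family for one row:
// column qualifier (metric name) -> cell timestamp -> cell value bytes.
final class MetricCellSketch {
    private final NavigableMap<String, NavigableMap<Long, byte[]>> cells =
        new TreeMap<>();

    // Encode a long as 8 big-endian bytes (assumes metric values are longs).
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode 8 big-endian bytes back into a long.
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // One metric data point becomes one cell: qualifier = metric name,
    // cell timestamp = metric timestamp, cell value = metric value.
    void putMetric(String metricName, long metricTimestamp, long metricValue) {
        cells.computeIfAbsent(metricName,
                k -> new TreeMap<Long, byte[]>(Collections.reverseOrder()))
            .put(metricTimestamp, toBytes(metricValue));
    }

    // Value of the newest cell; versions are kept newest-first.
    // Throws NullPointerException if the metric was never written.
    long latest(String metricName) {
        return toLong(cells.get(metricName).firstEntry().getValue());
    }

    // Number of stored versions (data points) for a metric.
    int versionCount(String metricName) {
        NavigableMap<Long, byte[]> versions = cells.get(metricName);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        MetricCellSketch row = new MetricCellSketch();
        row.putMetric("cpu", 1000L, 40L);
        row.putMetric("cpu", 2000L, 55L);
        row.putMetric("mem", 1000L, 1024L);
        System.out.println(row.latest("cpu"));       // prints 55
        System.out.println(row.versionCount("cpu")); // prints 2
    }
}
```

Because each data point is a separate version of the same column, reading the 
newest version gives the current value of the metric and reading all versions 
gives the whole series.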

For the metrics column family, we should set a TTL of X days and MIN_VERSIONS = 
1. That way, the timeseries info will be retained for X days by hbase and the 
latest value will always be retained. 
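
The interaction between the TTL and MIN_VERSIONS can be sketched as follows. 
This is a simplified model of the retention rule in plain Java, not HBase 
code: RetentionSketch and retained are hypothetical names, the 30-day TTL is 
only an example value for X, and the exact expiry boundary in HBase may differ 
from the strict comparison used here.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of how TTL and MIN_VERSIONS combine for one column.
final class RetentionSketch {
    // Returns the cell timestamps (in ms) that survive, newest first.
    // A cell survives if it is younger than the TTL, or if it is among
    // the newest minVersions cells of the column.
    static List<Long> retained(List<Long> cellTimestampsMs, long nowMs,
                               Duration ttl, int minVersions) {
        List<Long> newestFirst = new ArrayList<>(cellTimestampsMs);
        newestFirst.sort(Comparator.reverseOrder());
        long cutoffMs = nowMs - ttl.toMillis();
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < newestFirst.size(); i++) {
            long ts = newestFirst.get(i);
            boolean withinTtl = ts > cutoffMs;
            boolean protectedByMinVersions = i < minVersions;
            if (withinTtl || protectedByMinVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Duration ttl = Duration.ofSeconds(2_592_000L); // example TTL
        System.out.println(ttl.toDays()); // prints 30

        long dayMs = Duration.ofDays(1).toMillis();
        long now = 100 * dayMs;

        // Active metric: points that are 40, 10 and 1 day(s) old.
        // The 40-day-old point expires; the other two are within the TTL.
        List<Long> active = retained(
            Arrays.asList(now - 40 * dayMs, now - 10 * dayMs, now - dayMs),
            now, ttl, 1);
        System.out.println(active.size()); // prints 2

        // Idle metric: every point is older than the TTL, but the newest
        // one is still kept because minVersions is 1.
        List<Long> idle = retained(
            Arrays.asList(now - 50 * dayMs, now - 40 * dayMs), now, ttl, 1);
        System.out.println(idle.size()); // prints 1
    }
}
```

The second case is the reason for MIN_VERSIONS = 1: a metric that has not been 
updated within the TTL window still keeps its last reported value.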

The test class spins up a MiniCluster via HBaseTestingUtility's 
startMiniCluster.  It creates one entity object with info, config, metrics 
(timeseries), isRelatedTo and relatesTo entities and writes it to the backend 
by invoking the write api in HBaseTimelineWriterImpl class. The test scans the 
entity table and reads back the entity details and verifies the values of each 
field, including the timeseries. 

Also attaching the Eclipse console log from a run of the unit test. 

The schema creation would be along the lines of this:
{code}
create 'ats.entity',
  {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 'm', VERSIONS => 2147483647, MIN_VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'},
  {NAME => 'c', COMPRESSION => 'LZO', BLOCKCACHE => false, BLOOMFILTER => 'ROWCOL'}
{code}

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
