[jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables

Joep Rottinghuis (JIRA) Sat, 23 May 2015 01:26:36 -0700

     [ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joep Rottinghuis updated YARN-3706:
-----------------------------------
    Attachment: YARN-3706-YARN-2928.001.patch

Initial version of patch (YARN-3706-YARN-2928.001.patch)

This patch is nowhere near a shape to apply, because I have not yet properly 
set up my environment with the proper HBase 1, the branch, etc.
I still wanted to upload the skeleton of the code to communicate intent.
It also does not yet include the needed changes to TimelineSchemaCreator or to 
HBaseTimelineWriterImpl.
The structure of HBaseTimelineWriterImpl stays really close to what it is. Init 
will use EntityTable to create a type-safe BufferedMutator.
Some of the classes that are new in HBase 1 are stubbed out in this code 
(imports from org.apache.hbase.stubbs need to be changed to the real imports).
Apologies for the hackiness.

Ideas in this patch:
- Type parameters will prevent accidentally passing the wrong mutator for a 
different table to a column,
  or the wrong column family to the wrong column. The compiler won't allow it.
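
A minimal sketch of the type-parameter idea, not the code from the patch. All
types here (Table, TypedMutator, Column, ApplicationTable) are hypothetical
stand-ins with no HBase dependency; only the names EntityTable and EntityColumn
come from this comment. The point is that a column bound to one table cannot be
handed a mutator for another table.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeSafetySketch {
    // Hypothetical marker for a table definition.
    interface Table {}
    static final class EntityTable implements Table {}
    // Hypothetical second table, only used to show the compile-time check.
    static final class ApplicationTable implements Table {}

    // Stand-in for a buffered mutator that carries its table in its type.
    static final class TypedMutator<T extends Table> {
        final List<String> writes = new ArrayList<>();

        void mutate(String rowKey, String qualifier, String value) {
            writes.add(rowKey + "/" + qualifier + "=" + value);
        }
    }

    // A column of table T only accepts a mutator for the same T.
    interface Column<T extends Table> {
        void store(String rowKey, TypedMutator<T> mutator, Long timestamp, String value);
    }

    enum EntityColumn implements Column<EntityTable> {
        ID("id"), TYPE("type");

        private final String qualifier;

        EntityColumn(String qualifier) {
            this.qualifier = qualifier;
        }

        @Override
        public void store(String rowKey, TypedMutator<EntityTable> mutator,
                          Long timestamp, String value) {
            // timestamp is null when no override is needed; this sketch ignores it.
            mutator.mutate(rowKey, qualifier, value);
        }
    }

    public static void main(String[] args) {
        TypedMutator<EntityTable> entityMutator = new TypedMutator<>();
        EntityColumn.ID.store("row1", entityMutator, null, "entity_42");
        System.out.println(entityMutator.writes); // [row1/id=entity_42]

        // The following would be rejected by the compiler:
        // TypedMutator<ApplicationTable> appMutator = new TypedMutator<>();
        // EntityColumn.ID.store("row1", appMutator, null, "entity_42");
    }
}
```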

- Tables are fully defined in their own class
- minimize TimelineEntitySchemaConstants
- renamed row key prefix to simply prefix as it is used for more than row keys
- Column and column prefix classes are as short as possible and named after 
table.
- Columns are fully qualified with column name.
- ColumnPrefix is similar to column, except during storage, a column qualifier 
needs to be added. If NONE is chosen, then no prefix is used
(unit test needs to confirm join works properly).
- Keep the API simple: keep as few store methods as needed, with no special 
storing of numbers, Strings, Longs, etc. The caller simply converts lists etc. 
to a string.
- Later more behavior can be added to particular columns if needed.
This means that for all those columns where no override is needed for 
timestamp, null is simply passed in.
- Removed usage of Cell as it doesn't seem to be needed when the Put can do the 
same.
- Minimize TimelineWriterUtils to really simple util methods that can be unit 
tested without an actual HBase (standalone) cluster
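
The column prefix behavior described above could look roughly like this. It is
only a sketch: the "!" separator, the prefix strings "c" and "m", and the
qualify method are assumptions made for illustration, not names taken from the
patch. It shows a prefix being joined to a qualifier at store time, and NONE
leaving the qualifier untouched.

```java
public class ColumnPrefixSketch {
    // Hypothetical separator between prefix and qualifier.
    static final String SEPARATOR = "!";

    enum EntityColumnPrefix {
        CONFIG("c"), METRIC("m"), NONE(null);

        private final String prefix;

        EntityColumnPrefix(String prefix) {
            this.prefix = prefix;
        }

        // Builds the stored column name; NONE adds neither prefix nor separator.
        String qualify(String qualifier) {
            return prefix == null ? qualifier : prefix + SEPARATOR + qualifier;
        }
    }

    public static void main(String[] args) {
        // prints "c!mapreduce.job.name"
        System.out.println(EntityColumnPrefix.CONFIG.qualify("mapreduce.job.name"));
        // prints "created_time"
        System.out.println(EntityColumnPrefix.NONE.qualify("created_time"));
    }
}
```

Because the join is a pure string operation, it can be unit tested without any
HBase cluster, which is the kind of test the NONE case calls for.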

- Additional tables should be really easy to add: simply copy EntityTable, 
modify some names and type template. Copy EntityColumn and EntityColumnPrefix, 
modify the column names, string literals etc.

- If needed it should be easy to wrap extra behavior in the buffering to 
collapse together multiple puts with the same rowkey.
- If needed it should be easy to compress column values over a certain trigger 
value and add an additional prefix (for example x!) in front of the column.
Reader code still needs to be added to ColumnImpl, which would then have to 
unwrap these column names and uncompress.
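
One way the compression idea in the last bullet might work, as a sketch under
stated assumptions: gzip is an arbitrary choice of codec, the 64-byte threshold
is made up, and encode/decode are hypothetical helpers rather than methods of
ColumnImpl. Only the x! marker comes from the text above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedColumnSketch {
    static final String COMPRESSED_PREFIX = "x!"; // example marker from the text
    static final int THRESHOLD_BYTES = 64;        // hypothetical trigger value

    // Returns (stored column name, stored bytes); large values are gzipped and marked.
    static Map.Entry<String, byte[]> encode(String column, String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length <= THRESHOLD_BYTES) {
            return new SimpleEntry<>(column, raw);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SimpleEntry<>(COMPRESSED_PREFIX + column, bos.toByteArray());
    }

    // Reader side: unwraps the marker and uncompresses when it is present.
    static Map.Entry<String, String> decode(String storedColumn, byte[] stored) {
        if (!storedColumn.startsWith(COMPRESSED_PREFIX)) {
            return new SimpleEntry<>(storedColumn, new String(stored, StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String column = storedColumn.substring(COMPRESSED_PREFIX.length());
        return new SimpleEntry<>(column, new String(out.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            big.append('a');
        }
        Map.Entry<String, byte[]> stored = encode("info", big.toString());
        System.out.println(stored.getKey()); // x!info
        Map.Entry<String, String> read = decode(stored.getKey(), stored.getValue());
        System.out.println(read.getKey() + " " + read.getValue().length()); // info 200
    }
}
```

A real implementation would also need to make sure that an ordinary column name
can never start with the marker, otherwise the reader would misinterpret it.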

Initially I'm just looking for feedback on structure and approach with 
separation of table, column family, column, and column prefixes from actual 
storage logic.
 

> Generalize native HBase writer for additional tables
> ----------------------------------------------------
>
>                 Key: YARN-3706
>                 URL: https://issues.apache.org/jira/browse/YARN-3706
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Joep Rottinghuis
>            Assignee: Joep Rottinghuis
>            Priority: Minor
>         Attachments: YARN-3706-YARN-2928.001.patch
>
>
> When reviewing YARN-3411 we noticed that we could change the class hierarchy 
> a little in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in 
> place, as performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
