[jira] [Updated] (KYLIN-3221) Allow externalizing lookup table snapshot

2018-06-20 Thread Shaofeng SHI (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3221:

Issue Type: New Feature  (was: Improvement)

> Allow externalizing lookup table snapshot
> -----------------------------------------
>
> Key: KYLIN-3221
> URL: https://issues.apache.org/jira/browse/KYLIN-3221
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine, Metadata, Query Engine
>Reporter: Ma Gang
>Assignee: Ma Gang
>Priority: Major
> Fix For: v2.4.0
>
> Attachments: KYLIN-3221-web-error.png
>
>
> There are two limitations in the current lookup table design:
>  # The lookup table size is limited, because the table snapshot needs to be cached in the Kylin server; a snapshot table that is too large will break the server.
>  # Lookup table snapshot references are stored in all segments of the cube, so a global snapshot table cannot be supported. A global snapshot table means that when the lookup table is updated, the change takes effect for all segments.
> To resolve the above limitations, we decided to improve the existing lookup table design. Below is the initial design document; any comments and suggestions are welcome.
> h2. Metadata
> Add a new property in CubeDesc to describe how each lookup table is snapshotted; it can be defined during cube design:
> {code:java}
> @JsonProperty("snapshot_table_desc_list")
> private List<SnapshotTableDesc> snapshotTableDescList = Collections.emptyList();
> {code}
> SnapshotTableDesc defines how the table snapshot is stored and whether it is global; currently two store types are supported:
>  # "metaStore": the table snapshot is stored in the metadata store. This is the same as the current design and is the default option.
>  # "hbaseStore": the table snapshot is stored in an additional HBase table.
> {code:java}
> @JsonProperty("table_name")
> private String tableName;
> 
> @JsonProperty("store_type")
> private String snapshotStorageType = "metaStore";
> 
> @JsonProperty("local_cache_enable")
> private boolean enableLocalCache = true;
> 
> @JsonProperty("global")
> private boolean global = false;
> {code}
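> As an illustration only (the table name and values below are made up, not taken from a real cube), the corresponding fragment of a cube descriptor could look like:
> {code}
> "snapshot_table_desc_list": [
>   {
>     "table_name": "DEFAULT.KYLIN_CATEGORY",
>     "store_type": "hbaseStore",
>     "local_cache_enable": true,
>     "global": true
>   }
> ]
> {code}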
>  
> Add a 'snapshots' property in CubeInstance to store the snapshot resource path for each table whose snapshot is set to global in the cube design:
> {code:java}
> @JsonProperty("snapshots")
> private Map<String, String> snapshots; // tableName -> tableResourcePath mapping
> {code}
>  
> Add a new meta model ExtTableSnapshot to describe the extended table snapshot information. The information is stored under a new metastore path, /ext_table_snapshot/\{tableName}/\{uuid}.snapshot, and the metadata includes the following fields:
> {code:java}
> @JsonProperty("tableName")
> private String tableName;
> 
> @JsonProperty("signature")
> private TableSignature signature;
> 
> @JsonProperty("storage_location_identifier")
> private String storageLocationIdentifier;
> 
> @JsonProperty("key_columns")
> private String[] keyColumns; // the key columns of the table
> 
> @JsonProperty("storage_type")
> private String storageType;
> 
> @JsonProperty("size")
> private long size;
> 
> @JsonProperty("row_cnt")
> private long rowCnt;
> {code}
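> As a rough sketch only (the table name and variable names are illustrative, not the actual Kylin code), a global snapshot would be recorded by building its resource path and registering it per table, as the 'snapshots' map in CubeInstance does:
> {code:java}
> // Illustrative only: build the metastore resource path for an extended table
> // snapshot and record the tableName -> tableResourcePath mapping.
> String tableName = "DEFAULT.KYLIN_CATEGORY"; // hypothetical lookup table
> String uuid = java.util.UUID.randomUUID().toString();
> String resourcePath = "/ext_table_snapshot/" + tableName + "/" + uuid + ".snapshot";
> 
> java.util.Map<String, String> snapshots = new java.util.HashMap<>(); // stands in for CubeInstance.snapshots
> snapshots.put(tableName, resourcePath);
> {code}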
>  
> Add a new section in the 'Advanced Setting' tab of cube design, where the user can set the table snapshot properties for each table. By default, a snapshot is segment level and is stored in the metadata store.
> h2. Build
> If the user specifies the 'hbaseStore' storage type for any lookup table, a MapReduce job converts the Hive source table to HFiles, which are then bulk loaded into an HTable. This adds two job steps for the lookup table materialization; a sketch of the bulk-load step follows.
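> The sketch below covers only the second step (bulk load), assuming the MapReduce step has already written HFiles to an output directory; the class name and argument handling are illustrative, not the actual Kylin job step implementation.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
> import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
> 
> public class LookupSnapshotBulkLoad {
>     public static void main(String[] args) throws Exception {
>         Configuration conf = HBaseConfiguration.create();
>         Path hfileDir = new Path(args[0]);             // HFiles written by the MR step
>         TableName htable = TableName.valueOf(args[1]); // target lookup snapshot HTable
>         try (Connection conn = ConnectionFactory.createConnection(conf)) {
>             new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, conn.getAdmin(),
>                     conn.getTable(htable), conn.getRegionLocator(htable));
>         }
>     }
> }
> {code}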
> h2. HBase Lookup Table Schema
> All data are stored as raw values.
> Suppose the lookup table has primary keys key1 and key2; the row key will be:
> ||2 bytes||2 bytes||len1 bytes||2 bytes||len2 bytes||
> |shard|key1 value length (len1)|key1 value|key2 value length (len2)|key2 value|
> The first 2 bytes are the shard number. The HBase table can be pre-split; the shard size is configurable through the Kylin property "kylin.snapshot.ext.shard-mb" (default 500 MB).
> There is one column family, c, with multiple columns; the column names are the indexes of the columns in the table definition:
> |c|
> |1|2|...|
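> A minimal sketch of this row key layout (the shard-assignment function is an assumption for illustration, not the actual Kylin code):
> {code:java}
> import java.io.ByteArrayOutputStream;
> import java.nio.charset.StandardCharsets;
> 
> public class LookupRowKeyEncoder {
> 
>     // Encode a row key as: 2-byte shard, then for each key value a 2-byte length
>     // followed by the raw value bytes, matching the layout in the table above.
>     public static byte[] encode(short shard, String... keyValues) {
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         writeShort(out, shard);
>         for (String key : keyValues) {
>             byte[] value = key.getBytes(StandardCharsets.UTF_8);
>             writeShort(out, (short) value.length);
>             out.write(value, 0, value.length);
>         }
>         return out.toByteArray();
>     }
> 
>     private static void writeShort(ByteArrayOutputStream out, short v) {
>         out.write((v >> 8) & 0xFF);
>         out.write(v & 0xFF);
>     }
> 
>     // Illustrative shard assignment; the real shard count would be derived from
>     // kylin.snapshot.ext.shard-mb and the estimated table size.
>     public static short shardOf(String firstKey, int shardCount) {
>         return (short) Math.floorMod(firstKey.hashCode(), shardCount);
>     }
> }
> // Example: encode(shardOf("2012-01-01", 10), "2012-01-01", "US") for keys (key1, key2)
> {code}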
>  
> h2. Query
> For a key lookup query, directly call the HBase get API to fetch the entire row by key (use the local cache if it is enabled); see the sketch below.
> For queries that need to fetch keys according to derived columns, iterate over all rows to get the related keys (use the local cache if it is enabled).
> For queries that only hit the lookup table, iterate over all rows and let Calcite do the rest of the query processing.
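> A minimal sketch of the key-lookup path, reusing the row key encoder sketched above; the class name, the column-index argument, and the absence of local caching are illustrative assumptions, not the actual Kylin query-engine code:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.Table;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class LookupTableReader {
>     private static final byte[] CF = Bytes.toBytes("c");
> 
>     // Fetch one lookup row by its primary key values and return the value of the
>     // column at position columnIndex in the table definition (column names are indexes).
>     public static byte[] lookup(Table table, short shard, String[] keys, int columnIndex)
>             throws IOException {
>         byte[] rowKey = LookupRowKeyEncoder.encode(shard, keys);
>         Result result = table.get(new Get(rowKey));
>         return result.getValue(CF, Bytes.toBytes(Integer.toString(columnIndex)));
>     }
> }
> {code}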

[jira] [Updated] (KYLIN-3221) Allow externalizing lookup table snapshot

2018-06-11 Thread Shaofeng SHI (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3221:

Summary: Allow externalizing lookup table snapshot  (was: Some improvements 
for lookup table )

> Allow externalizing lookup table snapshot
> -----------------------------------------
>
> Key: KYLIN-3221
> URL: https://issues.apache.org/jira/browse/KYLIN-3221
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine, Metadata, Query Engine
>Reporter: Ma Gang
>Assignee: Ma Gang
>Priority: Major
> Fix For: v2.4.0
>
> Attachments: KYLIN-3221-web-error.png
>
>