[ https://issues.apache.org/jira/browse/CARBONDATA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-1438.
----------------------------------
    Resolution: Fixed
 Fix Version/s: 1.2.0

Unify the sort column and sort scope in create table command
------------------------------------------------------------

                Key: CARBONDATA-1438
                URL: https://issues.apache.org/jira/browse/CARBONDATA-1438
            Project: CarbonData
         Issue Type: Improvement
           Reporter: chenerlu
            Fix For: 1.2.0

         Time Spent: 14h 40m
 Remaining Estimate: 0h

1 Requirement
Currently, users can specify sort columns in the table properties when creating a table, and can specify the sort scope in the load options when loading data. To make this easier to use, all sort-related parameters should be specified in the CREATE TABLE command.
Once the sort scope is specified in the CREATE TABLE command, it will be used when loading data, even if users have also specified a sort scope in the load options.

2 Detailed design
2.1 Task-01
Requirement: CREATE TABLE supports specifying the sort scope.
Implementation: Use the table properties (Map<String, String>). The sort scope is specified in the table properties as a key/value pair, and the existing interface is called to write this key/value pair into the metastore.
Global Sort, Local Sort and No Sort will be supported. The sort scope can be specified in the SQL command:

  CREATE TABLE tableWithGlobalSort (
    shortField SHORT,
    intField INT,
    bigintField LONG,
    doubleField DOUBLE,
    stringField STRING,
    timestampField TIMESTAMP,
    decimalField DECIMAL(18,2),
    dateField DATE,
    charField CHAR(5)
  )
  STORED BY 'carbondata'
  TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT')

Tip: If the sort scope is global sort, users should specify GLOBAL_SORT_PARTITIONS. If they do not, the number of map tasks is used. GLOBAL_SORT_PARTITIONS must be an integer in the range [1, Integer.MAX_VALUE], and it is only used when the sort scope is global sort.

The supported sort scopes are:
  Global Sort: uses the orderBy operator in Spark; data is ordered at the segment level.
  Local Sort: data is ordered per node; a CarbonData file is ordered if it is written by a single task.
  No Sort: data is not sorted.
Tip: keys and values are case-insensitive.

2.2 Task-02
Requirement: Data loading supports local sort, no sort and global sort. The sort scope specified in the load options is ignored, and the one specified in CREATE TABLE is used instead.
Currently, users can specify the sort scope and global sort partitions in the load options. After this change, the sort scope specified in the load options is ignored and the sort scope is read from the table properties.

Current logic: the sort scope comes from the load options.

  No.  Prerequisite                                        Sort scope
  1    isSortTable is true && sort scope is Global Sort    Global Sort (checked first)
  2    isSortTable is false                                No Sort
  3    isSortTable is true                                 Local Sort

Tip: isSortTable is true when the table has sort columns or contains dimensions (excluding complex types), such as a string column.
For example:
  CREATE TABLE xxx1 (col1 string, col2 int) STORED BY 'carbondata'    -- sort table
  CREATE TABLE xx1 (col1 int, col2 int) STORED BY 'carbondata'    -- not a sort table
  CREATE TABLE xx (col1 int, col2 string) STORED BY 'carbondata' TBLPROPERTIES ('sort_columns'='col1')    -- sort table

New logic: the sort scope comes from CREATE TABLE.

  No.  Prerequisite                                        Sort scope
  1    isSortTable is true && sort scope is Global Sort    Global Sort (checked first)
  2    isSortTable is false || sort scope is No Sort       No Sort
  3    isSortTable is true && sort scope is Local Sort     Local Sort
  4    isSortTable is true, sort scope not specified       Local Sort (keeps the current logic)

3 Acceptance standard
  No.  Acceptance standard
  1    Users can specify the sort scope (global sort, local sort, no sort) when creating a carbon table with SQL.
  2    Data loading ignores the sort scope specified in the load options and uses the parameter specified in the CREATE TABLE command.
       If users still specify the sort scope in the load options, a warning is given to inform them that the sort scope specified in CREATE TABLE will be used.

4 Feature restrictions
NA

5 Dependencies
NA

6 Technical risk
NA

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
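The "new logic" table in section 2.2, together with the GLOBAL_SORT_PARTITIONS default described in section 2.1, can be sketched in Python as below. This is not CarbonData's implementation: the function names, the SortScope enum, and the assumption that GLOBAL_SORT_PARTITIONS is read from the same case-insensitive property map as SORT_SCOPE are all hypothetical, chosen only to make the rule order concrete.

```python
import logging
from enum import Enum
from typing import Mapping, Optional

log = logging.getLogger("sort_scope_sketch")

INT_MAX = 2**31 - 1  # Java Integer.MAX_VALUE


class SortScope(Enum):
    GLOBAL_SORT = "GLOBAL_SORT"
    LOCAL_SORT = "LOCAL_SORT"
    NO_SORT = "NO_SORT"


def _upper_keys(props: Mapping[str, str]) -> dict:
    # Keys are case-insensitive, so compare them in upper case.
    return {k.strip().upper(): v for k, v in props.items()}


def resolve_sort_scope(is_sort_table: bool,
                       table_props: Mapping[str, str],
                       load_options: Optional[Mapping[str, str]] = None) -> SortScope:
    """Hypothetical helper applying the 'new logic' rules in order."""
    props = _upper_keys(table_props)
    if "SORT_SCOPE" in _upper_keys(load_options or {}):
        # Acceptance standard 2: the load option is ignored, with a warning.
        log.warning("SORT_SCOPE in load options is ignored; "
                    "the sort scope from CREATE TABLE is used")
    raw = props.get("SORT_SCOPE")
    # Values are case-insensitive too; an unknown value raises ValueError.
    scope = SortScope(raw.strip().upper()) if raw is not None else None

    # Rule 1 (checked first): sort table with global sort.
    if is_sort_table and scope is SortScope.GLOBAL_SORT:
        return SortScope.GLOBAL_SORT
    # Rule 2: not a sort table, or no sort requested explicitly.
    if not is_sort_table or scope is SortScope.NO_SORT:
        return SortScope.NO_SORT
    # Rules 3 and 4: local sort requested, or no scope given (current behavior).
    return SortScope.LOCAL_SORT


def resolve_global_sort_partitions(scope: SortScope,
                                   table_props: Mapping[str, str],
                                   num_map_tasks: int) -> Optional[int]:
    """Hypothetical helper: partition count, used only for global sort."""
    if scope is not SortScope.GLOBAL_SORT:
        return None
    raw = _upper_keys(table_props).get("GLOBAL_SORT_PARTITIONS")
    if raw is None:
        # Not specified: fall back to the number of map tasks.
        return num_map_tasks
    value = int(raw)  # ValueError if not an integer
    if not 1 <= value <= INT_MAX:
        raise ValueError(
            f"GLOBAL_SORT_PARTITIONS must be in [1, {INT_MAX}], got {value}")
    return value
```

The global-sort check runs before the no-sort check, mirroring the "(checked first)" note, so a non-sort table with 'SORT_SCOPE'='GLOBAL_SORT' still resolves to No Sort. Rejecting an unrecognized SORT_SCOPE value with an error, rather than silently falling back to local sort, is a design assumption of this sketch.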