[ 
https://issues.apache.org/jira/browse/CARBONDATA-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4322.
------------------------------------
    Fix Version/s: 2.3.0
       Resolution: Fixed

> Insert into local sort partition table select * from text table launches 
> thousands of tasks
> --------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4322
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4322
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: SHREELEKHYA GAMPA
>            Priority: Major
>             Fix For: 2.3.0
>
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> [Reproduce steps]
>  # CREATE TABLE partitionthree1 (empno int, doj Timestamp, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int, empname String, designation String) PARTITIONED 
> BY (workgroupcategory int) STORED AS carbondata 
> tblproperties('sort_scope'='local_sort', 'sort_columns'='deptname,empname');
>  # CREATE TABLE partitionthree2 (empno int, doj Timestamp, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int, empname String, designation String) PARTITIONED 
> BY (workgroupcategory int);
>  # LOAD DATA local inpath 'hdfs://hacluster/user/data.csv' INTO TABLE 
> partitionthree1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 
> 'TIMESTAMPFORMAT'='dd-MM-yyyy');
>  # set hive.exec.dynamic.partition.mode=nonstrict;
>  # insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
> insert into partitionthree2 select * from partitionthree1;
>  # insert into partitionthree1 select * from partitionthree2;
>  
> [Expected Result]
> Step 6 should launch only as many tasks as there are nodes.
>  
> [Current Behavior]
> The number of tasks launched is far larger than the number of nodes.
>  
> [Impact]
> At several product sites, query performance is significantly impacted.
>  
> [Initial analysis]
> Inserting into a non-partitioned local sort table launches a number of tasks 
> equal to the number of nodes; the partition table should behave the same way.
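
The idea in the initial analysis (one task per node rather than one task per input split) can be illustrated with a small standalone sketch. This is not CarbonData's implementation: the function name group_splits_by_node, the (split_id, preferred_node) tuple shape, and the node names are all hypothetical, chosen only to show how thousands of splits can collapse into a task count equal to the node count.

```python
from collections import defaultdict


def group_splits_by_node(splits, nodes):
    """Assign every input split to exactly one node-level task.

    splits: list of (split_id, preferred_node) pairs (hypothetical shape).
    nodes:  list of node names available for the load.
    Returns a dict mapping node -> list of split ids.
    """
    tasks = defaultdict(list)
    for i, (split_id, preferred) in enumerate(splits):
        # Keep data locality when the preferred node is known,
        # otherwise fall back to round-robin over the nodes.
        node = preferred if preferred in nodes else nodes[i % len(nodes)]
        tasks[node].append(split_id)
    return dict(tasks)


# Hypothetical input: 3000 small splits spread over 3 nodes.
nodes = ["node-0", "node-1", "node-2"]
splits = [(f"split-{i}", f"node-{i % 3}") for i in range(3000)]

tasks = group_splits_by_node(splits, nodes)
print(len(splits), "splits ->", len(tasks), "tasks")
```

With one task per split, the example input would launch 3000 tasks; after grouping, it launches one task per node, which is the behavior the report expects for the partition table as well.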



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
