[jira] [Work logged] (HIVE-24163) Dynamic Partitioning Insert fail for MM table fail during MoveTask

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24163?focusedWorklogId=487658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487658
 ]

ASF GitHub Bot logged work on HIVE-24163:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 03:25
Start Date: 22/Sep/20 03:25
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on pull request #1507:
URL: https://github.com/apache/hive/pull/1507#issuecomment-696209500


   The file listing in the Utilities.getFullDPSpecs method was not correct for 
MM tables and for ACID tables when direct insert was on. The method returned 
all partitions of these tables, not just the ones affected by the current 
query. Because of this, the lineage information for dynamic-partitioning 
inserts into such tables was not correct. Comparing it with the lineage 
information for inserts into external tables showed that for external tables 
only the partitions affected by the query are present. The reason is that for 
external tables the data is first written into the staging directory, and when 
the partitions are listed, only this directory is checked, so it contains only 
the newly inserted data. For MM tables and ACID direct insert there is no 
staging directory, so the table directory is checked and everything in it gets 
listed.
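
   As an illustration of that difference (a minimal sketch only, not the actual 
getFullDPSpecs code; the helper name collectTouchedPartitions and the 
"delta_<writeId>" directory-name check are assumptions made for the example), 
restricting the listing to the partitions written by the current statement 
could look roughly like this:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DpListingSketch {
  /**
   * Illustrative only: list first-level dynamic-partition directories (e.g.
   * "height=12") under basePath, keeping only those that contain a delta
   * directory written by the given write ID. Listing the table directory of an
   * MM / direct-insert table without such a filter returns every existing
   * partition, not just the ones written by the current query.
   */
  static List<String> collectTouchedPartitions(FileSystem fs, Path basePath, long writeId)
      throws IOException {
    List<String> touched = new ArrayList<>();
    for (FileStatus partDir : fs.listStatus(basePath)) {
      // Skip stray files such as tmpstats-*; only "col=value" directories are partitions.
      if (!partDir.isDirectory() || !partDir.getPath().getName().contains("=")) {
        continue;
      }
      for (FileStatus child : fs.listStatus(partDir.getPath())) {
        // Assumed naming: a direct-insert delta directory embeds the statement's write ID.
        if (child.isDirectory() && child.getPath().getName().startsWith("delta_" + writeId + "_")) {
          touched.add(partDir.getPath().getName());
          break;
        }
      }
    }
    return touched;
  }
}
{code}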



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 487658)
Time Spent: 0.5h  (was: 20m)

> Dynamic Partitioning Insert fail for MM table fail during MoveTask
> --
>
> Key: HIVE-24163
> URL: https://issues.apache.org/jira/browse/HIVE-24163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajkumar Singh
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> -- DDLs and Query
> {code:java}
> create table `class` (name varchar(8), sex varchar(1), age double precision, 
> height double precision, weight double precision);
> insert into table class values ('RAJ','MALE',28,12,12);
> CREATE TABLE `PART1` (`id` DOUBLE,`N` DOUBLE,`Name` VARCHAR(8),`Sex` 
> VARCHAR(1)) PARTITIONED BY(Weight string, Age
> string, Height string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' 
> LINES TERMINATED BY '\012' STORED AS TEXTFILE;
> INSERT INTO TABLE `part1` PARTITION (`Weight`,`Age`,`Height`)  SELECT 0, 0, 
> `Name`,`Sex`,`Weight`,`Age`,`Height` FROM `class`;
> {code}
> It fails during the MoveTask execution:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: partition 
> hdfs://hostname:8020/warehouse/tablespace/managed/hive/part1/.hive-staging_hive_2020-09-02_13-29-58_765_4475282758764123921-1/-ext-1/tmpstats-0_FS_3
>  is not a directory!
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2769)
>  ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2837) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:562) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:440) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 

[jira] [Work logged] (HIVE-24163) Dynamic Partitioning Insert fail for MM table fail during MoveTask

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24163?focusedWorklogId=487020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487020
 ]

ASF GitHub Bot logged work on HIVE-24163:
-

Author: ASF GitHub Bot
Created on: 21/Sep/20 15:58
Start Date: 21/Sep/20 15:58
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on pull request #1507:
URL: https://github.com/apache/hive/pull/1507#issuecomment-696209500


   The file listing in the Utilities.getFullDPSpecs method was not correct for 
MM tables and for ACID tables when direct insert was on. The method returned 
all partitions of these tables, not just the ones affected by the current 
query. Because of this, the lineage information for dynamic-partitioning 
inserts into such tables was not correct. Comparing it with the lineage 
information for inserts into external tables showed that for external tables 
only the partitions affected by the query are present. The reason is that for 
external tables the data is first written into the staging directory, and when 
the partitions are listed, only this directory is checked, so it contains only 
the newly inserted data. For MM tables and ACID direct insert there is no 
staging directory, so the table directory is checked and everything in it gets 
listed.
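
   The "is not a directory!" error quoted in the issue description below comes 
from the same family of listings: a validation step that expects every entry it 
visits to be a partition directory fails as soon as it meets a stray stats file 
such as tmpstats-0_FS_3. A minimal sketch of that kind of check (illustrative 
only; validatePartitionDirs and its message are assumptions, not the actual 
Hive.getValidPartitionsInPath implementation):

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PartitionValidationSketch {
  /**
   * Illustrative only: walk the candidate partition entries under basePath and
   * fail on anything that is not a directory. A temporary stats file left by a
   * FileSinkOperator (e.g. tmpstats-0_FS_3) trips exactly this kind of check.
   */
  static void validatePartitionDirs(FileSystem fs, Path basePath) throws IOException {
    for (FileStatus status : fs.listStatus(basePath)) {
      if (!status.isDirectory()) {
        throw new IOException("partition " + status.getPath() + " is not a directory!");
      }
    }
  }
}
{code}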



Issue Time Tracking
---

Worklog Id: (was: 487020)
Time Spent: 20m  (was: 10m)

> Dynamic Partitioning Insert fail for MM table fail during MoveTask
> --
>
> Key: HIVE-24163
> URL: https://issues.apache.org/jira/browse/HIVE-24163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajkumar Singh
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> -- DDLs and Query
> {code:java}
> create table `class` (name varchar(8), sex varchar(1), age double precision, 
> height double precision, weight double precision);
> insert into table class values ('RAJ','MALE',28,12,12);
> CREATE TABLE `PART1` (`id` DOUBLE,`N` DOUBLE,`Name` VARCHAR(8),`Sex` 
> VARCHAR(1)) PARTITIONED BY(Weight string, Age
> string, Height string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' 
> LINES TERMINATED BY '\012' STORED AS TEXTFILE;
> INSERT INTO TABLE `part1` PARTITION (`Weight`,`Age`,`Height`)  SELECT 0, 0, 
> `Name`,`Sex`,`Weight`,`Age`,`Height` FROM `class`;
> {code}
> It fails during the MoveTask execution:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: partition 
> hdfs://hostname:8020/warehouse/tablespace/managed/hive/part1/.hive-staging_hive_2020-09-02_13-29-58_765_4475282758764123921-1/-ext-1/tmpstats-0_FS_3
>  is not a directory!
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2769)
>  ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2837) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:562) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:440) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 

[jira] [Work logged] (HIVE-24163) Dynamic Partitioning Insert fail for MM table fail during MoveTask

2020-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24163?focusedWorklogId=486282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-486282
 ]

ASF GitHub Bot logged work on HIVE-24163:
-

Author: ASF GitHub Bot
Created on: 18/Sep/20 16:59
Start Date: 18/Sep/20 16:59
Worklog Time Spent: 10m 
  Work Description: kuczoram opened a new pull request #1507:
URL: https://github.com/apache/hive/pull/1507


   … MoveTask
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



Issue Time Tracking
---

Worklog Id: (was: 486282)
Remaining Estimate: 0h
Time Spent: 10m

> Dynamic Partitioning Insert fail for MM table fail during MoveTask
> --
>
> Key: HIVE-24163
> URL: https://issues.apache.org/jira/browse/HIVE-24163
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Rajkumar Singh
>Assignee: Marta Kuczora
>Priority: Major
> Fix For: 3.1.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> -- DDLs and Query
> {code:java}
> create table `class` (name varchar(8), sex varchar(1), age double precision, 
> height double precision, weight double precision);
> insert into table class values ('RAJ','MALE',28,12,12);
> CREATE TABLE `PART1` (`id` DOUBLE,`N` DOUBLE,`Name` VARCHAR(8),`Sex` 
> VARCHAR(1)) PARTITIONED BY(Weight string, Age
> string, Height string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' 
> LINES TERMINATED BY '\012' STORED AS TEXTFILE;
> INSERT INTO TABLE `part1` PARTITION (`Weight`,`Age`,`Height`)  SELECT 0, 0, 
> `Name`,`Sex`,`Weight`,`Age`,`Height` FROM `class`;
> {code}
> It fails during the MoveTask execution:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: partition 
> hdfs://hostname:8020/warehouse/tablespace/managed/hive/part1/.hive-staging_hive_2020-09-02_13-29-58_765_4475282758764123921-1/-ext-1/tmpstats-0_FS_3
>  is not a directory!
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2769)
>  ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2837) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.exec.MoveTask.handleDynParts(MoveTask.java:562) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:440) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) 
> ~[hive-exec-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  ~[hive-service-3.1.3000.7.2.0.0-237.jar:3.1.3000.7.2.0.0-237]
> {code}
> The reason is that the task writes the fsstat file while the FileSinkOperator is 
> closing; HS2 then runs the MoveTask to move data into the destination partition directory, while