[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-08-03 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112279#comment-16112279
 ] 

Lefty Leverenz commented on HIVE-17113:
---

Doc note:  This adds *hive.exec.move.files.from.source.dir* to HiveConf.java, 
so it needs to be documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Added a TODOC3.0 label.

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, 
> HIVE-17113.3.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-31 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107703#comment-16107703
 ] 

Ashutosh Chauhan commented on HIVE-17113:
-

+1



[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-31 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107690#comment-16107690
 ] 

Jason Dere commented on HIVE-17113:
---

[~ashutoshc] can you review this one?



[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102718#comment-16102718
 ] 

Hive QA commented on HIVE-17113:
---

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879072/HIVE-17113.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11012 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge (batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6142/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6142/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6142/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879072 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-25 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100951#comment-16100951
 ] 

Jason Dere commented on HIVE-17113:
---

Spoke offline to [~ashutoshc], who recommended the following approach:
- During Utilities.removeTempOrDuplicateFiles(), maintain a list of files 
found/deduped. This list of files will be used to determine which files are 
moved to the destination directory.
- A configurable setting will be added here to control whether this file list 
will be used to control which files will be moved, or if the existing behavior 
will be used.
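
A minimal sketch of the file-list idea above, using plain java.nio on a local 
directory rather than Hive's own code. Everything in it is an assumption made 
for illustration: the class and method names, the "<bucket>_<attempt>" file 
naming, the "keep the first file per bucket" rule, and the useFileList flag 
that stands in for the configurable setting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FileListMoveSketch {

    // Assumed naming scheme "<bucket>_<attempt>", e.g. "000000_1".
    static String bucketOf(Path file) {
        String name = file.getFileName().toString();
        return name.substring(0, name.indexOf('_'));
    }

    // Sorted file names in a directory.
    static List<String> names(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    // Dedup pass: deletes extra files per bucket and returns the survivors.
    static List<Path> dedupAndList(Path dir) throws IOException {
        Map<String, Path> kept = new TreeMap<>();
        for (String name : names(dir)) {
            Path f = dir.resolve(name);
            if (kept.putIfAbsent(bucketOf(f), f) != null) {
                Files.delete(f); // a file for this bucket was already kept
            }
        }
        return new ArrayList<>(kept.values());
    }

    // useFileList=true moves only the listed files; false renames the whole directory.
    static void moveToFinal(Path tmp, Path dest, List<Path> kept, boolean useFileList)
            throws IOException {
        if (!useFileList) {
            Files.move(tmp, dest);
            return;
        }
        Files.createDirectories(dest);
        for (Path f : kept) {
            Files.move(f, dest.resolve(f.getFileName()));
        }
    }

    // Replays the runaway-task timeline and returns the file names in the destination.
    static List<String> replay(boolean useFileList) throws IOException {
        Path root = Files.createTempDirectory("hive17113");
        Path tmp = Files.createDirectory(root.resolve("_tmp"));
        Path dest = root.resolve("final");
        Files.write(tmp.resolve("000000_1"), new byte[] {1}); // attempt A_1 finishes
        List<Path> kept = dedupAndList(tmp);                   // dedup records what it saw
        Files.write(tmp.resolve("000000_0"), new byte[] {1}); // runaway A_0 finishes late
        moveToFinal(tmp, dest, kept, useFileList);
        return names(dest);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("file list:  " + replay(true));
        System.out.println("dir rename: " + replay(false));
    }
}
```

With the file list, the late file from A_0 stays behind in the temp directory 
because it was never recorded by the dedup pass. With the directory rename, it 
is carried into the destination together with everything else.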



[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-25 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100564#comment-16100564
 ] 

Jason Dere commented on HIVE-17113:
---

Looks like in the case of skewjoin in Spark, there can be multiple jobs which 
copy files into the same temp directory. When this happens, there can be name 
collisions - in the test there are collisions on files 00_0 and 01_0, 
which get renamed to 00_0_1 and 01_0_1. Since the 
removeTempOrDuplicateFiles() is now being called on the destination directory, 
it's not able to correctly disambiguate the 00_0_1, 01_0_1 files.

Since it looks like the destination directory can potentially hold results from 
more than one job, it does not seem to be correct to simply run 
removeTempOrDuplicateFiles() on the destination directory. Maybe we have to 
change the logic to the following:
1) Move the temp directory to a new directory name, to prevent additional files 
from being added by any runaway processes.
2) Run removeTempOrDuplicateFiles() on this renamed temp directory
3) Run renameOrMoveFiles() to move the renamed temp directory to the final 
location.

Step 1 might be expensive for cloud storage, though (it basically means 
performing twice the file moves, right?). [~ashutoshc], should doing step 1 be 
a configurable setting?
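
The three steps above can be sketched roughly as follows, again with plain 
java.nio on a local directory and not with Hive's Utilities class. The 
".sealed" suffix, the class and method names, the "keep the first file per 
bucket" rule and the "_<n>" collision suffix are all assumptions made for 
illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SealThenCommitSketch {

    // Sorted file names in a directory.
    static List<String> names(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    // Keeps the first file per bucket (the text before the first '_'), deletes the rest.
    static void removeDuplicates(Path dir) throws IOException {
        Set<String> seen = new HashSet<>();
        for (String name : names(dir)) {
            if (!seen.add(name.substring(0, name.indexOf('_')))) {
                Files.delete(dir.resolve(name));
            }
        }
    }

    static void commit(Path tmp, Path dest) throws IOException {
        // 1) Rename the temp directory so late writers no longer find it.
        Path sealed = tmp.resolveSibling(tmp.getFileName() + ".sealed");
        Files.move(tmp, sealed);

        // 2) Dedup only this job's files, never the shared destination.
        removeDuplicates(sealed);

        // 3) Move file by file; on a name collision append "_<n>".
        Files.createDirectories(dest);
        for (String name : names(sealed)) {
            Path target = dest.resolve(name);
            for (int i = 1; Files.exists(target); i++) {
                target = dest.resolve(name + "_" + i);
            }
            Files.move(sealed.resolve(name), target);
        }
        Files.delete(sealed); // empty after the loop above
    }

    // Destination already holds a file from another job; this job has a duplicate attempt.
    static List<String> replay() throws IOException {
        Path root = Files.createTempDirectory("hive17113");
        Path tmp = Files.createDirectory(root.resolve("_tmp"));
        Path dest = Files.createDirectory(root.resolve("final"));
        Files.write(dest.resolve("000000_0"), new byte[] {1}); // left by an earlier job
        Files.write(tmp.resolve("000000_0"), new byte[] {2});  // this job, attempt 0
        Files.write(tmp.resolve("000000_1"), new byte[] {2});  // this job, duplicate attempt
        commit(tmp, dest);
        return names(dest);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(replay());
    }
}
```

Because the dedup pass only ever sees the sealed directory, the file that the 
earlier job left in the destination is not mistaken for a duplicate, and the 
collision is resolved by the suffix instead.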



[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092149#comment-16092149
 ] 

Jason Dere commented on HIVE-17113:
---

Seems to be causing a failure in TestSparkCliDriver skewjoin.q



[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091048#comment-16091048
 ] 

Hive QA commented on HIVE-17113:
---

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877705/HIVE-17113.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11065 tests 
executed
*Failed tests:*
{noformat}
TestSSL - did not produce a TEST-*.xml file (likely timed out) (batchId=224)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=238)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_2] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_ts_stats_for_mapjoin] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=167)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=233)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=233)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[skewjoin] (batchId=110)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=178)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6070/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6070/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6070/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877705 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-17 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090666#comment-16090666
 ] 

Jason Dere commented on HIVE-17113:
---

Talked to [~ashutoshc] and [~sseth] about this. According to Sid this is 
normally handled in MR using the OutputCommitter. However, Ashutosh mentioned 
that Hive does not use the Hadoop OutputCommitter functionality and instead 
tries to handle duplicate task attempts by itself - thus the call to 
Utilities.removeTempOrDuplicateFiles().

A couple of solutions to this on the Hive side:
1) Changing Hive to properly use the OutputCommitter
2) Utilities.mvFileToFinalPath() should call 
Utilities.removeTempOrDuplicateFiles() after renaming the temp directory rather 
than before renaming. This is basically swapping the order of steps 6 and 8 in 
the Jira description, within Utilities.mvFileToFinalPath().

Gonna try to do option 2 as it looks like a simpler fix.
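
To make the ordering in option 2 concrete, here is a small sketch that replays 
the same sequence of events with the dedup pass placed either before or after 
the rename. It uses plain java.nio on a local directory, not Hive's Utilities 
class; the class and method names, the "<bucket>_<attempt>" file naming and 
the "keep the first file per bucket" rule are assumptions made for 
illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DedupOrderSketch {

    // Sorted file names in a directory.
    static List<String> names(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    // Keeps the first file per bucket (the text before the first '_'), deletes the rest.
    static void removeDuplicates(Path dir) throws IOException {
        Set<String> seen = new HashSet<>();
        for (String name : names(dir)) {
            if (!seen.add(name.substring(0, name.indexOf('_')))) {
                Files.delete(dir.resolve(name));
            }
        }
    }

    // Same events each time; only the position of the dedup pass changes.
    static List<String> replay(boolean dedupAfterRename) throws IOException {
        Path root = Files.createTempDirectory("hive17113");
        Path tmp = Files.createDirectory(root.resolve("_tmp"));
        Path dest = root.resolve("final");

        Files.write(tmp.resolve("000000_1"), new byte[] {1}); // A_1 finishes
        if (!dedupAfterRename) {
            removeDuplicates(tmp);                             // original order: dedup first
        }
        Files.write(tmp.resolve("000000_0"), new byte[] {1}); // runaway A_0 finishes late
        Files.move(tmp, dest);                                 // temp directory is renamed
        if (dedupAfterRename) {
            removeDuplicates(dest);                            // option 2: dedup after rename
        }
        return names(dest);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("dedup before rename: " + replay(false));
        System.out.println("dedup after rename:  " + replay(true));
    }
}
```

In the original order the late file slips in after the dedup pass and reaches 
the destination. With the dedup pass after the rename, anything that landed in 
the temp directory before the rename is still seen and deduplicated.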
