[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task
[ https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-17113:
--
Labels: (was: TODOC3.0)

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Jason Dere
> Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, HIVE-17113.3.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the following happened:
> 1. Task attempt A_0 starts, but then stops making progress.
> 2. The job was running with speculative execution on, and task attempt A_1 is started.
> 3. Task attempt A_1 finishes execution and saves its output to the temp directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0.
> 6. The job for the query finishes, and Utilities.mvFileToFinalPath() calls Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files.
> 7. A_0 (still running) finally finishes and saves its file to the temp directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the final location, where it is later moved to the partition directory.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
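The race in the issue description comes down to ordering: the duplicate check runs on the temp directory before the late file from A_0 arrives, and the directory is then moved without a second check. The Java sketch below replays that sequence in memory. It is a toy model, not Hive code: the names dedupeByBucket and simulateOldOrder, and file names of the assumed form <bucket>_<attempt> such as 000000_1, are hypothetical, and keeping the first file seen per bucket is a simplification of whatever rule Utilities.removeTempOrDuplicateFiles() actually applies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy replay of the race in HIVE-17113 (hypothetical names, not Hive code).
public class OldOrderRace {

    // Assumed "<bucket>_<attempt>" naming: "000000_1" belongs to bucket "000000".
    // Keeps the first file seen per bucket; Hive's real selection rule may differ.
    static List<String> dedupeByBucket(List<String> files) {
        Map<String, String> byBucket = new LinkedHashMap<>();
        for (String f : files) {
            byBucket.putIfAbsent(f.substring(0, f.lastIndexOf('_')), f);
        }
        return new ArrayList<>(byBucket.values());
    }

    // Old order: check for duplicates in the temp dir, then move it unchanged.
    static List<String> simulateOldOrder() {
        List<String> tempDir = new ArrayList<>();
        tempDir.add("000000_1");           // step 3: A_1 saves its output
        tempDir = dedupeByBucket(tempDir); // step 6: only one file, nothing removed
        tempDir.add("000000_0");           // step 7: runaway A_0 saves its file late
        return new ArrayList<>(tempDir);   // step 8: temp dir moved without a recheck
    }

    public static void main(String[] args) {
        System.out.println(simulateOldOrder());
    }
}
```

Running main prints [000000_1, 000000_0]: files from both attempts for bucket 000000 reach the final location, which is the duplicate described in step 7.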
Lefty Leverenz updated HIVE-17113:
--
Labels: TODOC3.0 (was: )
Jason Dere updated HIVE-17113:
--
Resolution: Fixed
Fix Version/s: 3.0.0
Status: Resolved (was: Patch Available)

Committed to master
Jason Dere updated HIVE-17113:
--
Attachment: HIVE-17113.3.patch
Jason Dere updated HIVE-17113:
--
Status: Patch Available (was: Open)
Jason Dere updated HIVE-17113:
--
Attachment: HIVE-17113.2.patch
Jason Dere updated HIVE-17113:
--
Status: Open (was: Patch Available)
Jason Dere updated HIVE-17113:
--
Status: Patch Available (was: Open)
Jason Dere updated HIVE-17113:
--
Attachment: HIVE-17113.1.patch

Patch to switch the order of file operations in Utilities.mvFileToFinalPath(): move the temp directory to the final location first, then remove duplicate bucket files.
[~ashutoshc] can you take a look?
[~rajesh.balamohan] FYI, this may undo some of the file operation optimization you did in HIVE-14323.
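A minimal sketch of why moving first and deduplicating second closes the window, as a toy in-memory model rather than Hive code. It assumes hypothetical helper names (write, rename, dedupeByBucket, simulateNewOrder), file names of the form <bucket>_<attempt> such as 000000_1, a keep-first rule for choosing among duplicates, and a directory move that is a single atomic rename. In the model, a late file that reaches the temp directory before the move is caught by the dedupe that now runs on the final location, and a file written after the move lands in a temp directory that is never moved. The issue does not say what happens to such a stray file in a real Hive job, so that branch is illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the reordered mvFileToFinalPath() steps (hypothetical names, not Hive code).
public class NewOrderFix {

    // In-memory "file system": directory path -> file names in that directory.
    static final Map<String, List<String>> fs = new HashMap<>();

    static void write(String dir, String file) {
        fs.computeIfAbsent(dir, d -> new ArrayList<>()).add(file);
    }

    // Assumed atomic directory rename: the whole listing moves in one step.
    static void rename(String from, String to) {
        fs.put(to, fs.remove(from));
    }

    // Keeps the first file per bucket ("<bucket>_<attempt>" naming assumed).
    static void dedupeByBucket(String dir) {
        Map<String, String> byBucket = new LinkedHashMap<>();
        for (String f : fs.get(dir)) {
            byBucket.putIfAbsent(f.substring(0, f.lastIndexOf('_')), f);
        }
        fs.put(dir, new ArrayList<>(byBucket.values()));
    }

    // lateBeforeMove: whether runaway A_0 writes before or after the move.
    static List<String> simulateNewOrder(boolean lateBeforeMove) {
        fs.clear();
        write("tmp", "000000_1");                      // A_1 saves its output
        if (lateBeforeMove) write("tmp", "000000_0");  // late file, still caught below
        rename("tmp", "final");                        // 1. move the temp dir first
        dedupeByBucket("final");                       // 2. then remove duplicates in place
        if (!lateBeforeMove) write("tmp", "000000_0"); // stray file never reaches "final"
        return fs.get("final");
    }

    public static void main(String[] args) {
        System.out.println(simulateNewOrder(true));  // [000000_1]
        System.out.println(simulateNewOrder(false)); // [000000_1]
    }
}
```

In both runs the final location holds a single file for bucket 000000.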