[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239362#comment-16239362 ]

Lefty Leverenz commented on HIVE-17458:
---------------------------------------

No doc needed: this changes the description of *hive.txn.operational.properties*, but it doesn't need to be documented because it's for internal use only. (See the HIVE-14035 comments of 21-22 Aug. 2016.)

> VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
> ----------------------------------------------------------------
>
>                 Key: HIVE-17458
>                 URL: https://issues.apache.org/jira/browse/HIVE-17458
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17458.01.patch, HIVE-17458.02.patch, HIVE-17458.03.patch, HIVE-17458.04.patch, HIVE-17458.05.patch, HIVE-17458.06.patch, HIVE-17458.07.patch, HIVE-17458.07.patch, HIVE-17458.08.patch, HIVE-17458.09.patch, HIVE-17458.10.patch, HIVE-17458.11.patch, HIVE-17458.12.patch, HIVE-17458.12.patch, HIVE-17458.13.patch, HIVE-17458.14.patch, HIVE-17458.15.patch, HIVE-17458.16.patch
>
> VectorizedOrcAcidRowBatchReader will not be used for original files. This will likely look like a perf regression when converting a table from non-acid to acid until it runs through a major compaction.
> With Load Data support, if large files are added via Load Data, the read ops will not vectorize until major compaction.
> There is no reason why this should be the case. Just like OrcRawRecordMerger, VectorizedOrcAcidRowBatchReader can look at the other files in the logical tranche/bucket and calculate the offset for the RowBatch of the split. (Presumably getRecordReader().getRowNumber() works the same in vector mode.)
> In this case we don't even need OrcSplit.isOriginal() - the reader can infer it from the file path... which in particular simplifies OrcInputFormat.determineSplitStrategies().
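The description above proposes computing, for each split over an "original" file, the first synthetic ROW__ID by summing the row counts of the files that precede it in the same logical bucket. A minimal sketch of that calculation follows (illustration only, not Hive's actual implementation; RowOffsetSketch and the RowCounter/rowCountOf helper are hypothetical, standing in for reading each file's ORC footer row count):

{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

/** Hypothetical sketch of the offset calculation described in the issue. */
final class RowOffsetSketch {

  /** Assumed helper, e.g. backed by the ORC footer's numberOfRows. */
  interface RowCounter {
    long rowCountOf(FileStatus file) throws IOException;
  }

  /**
   * Returns the first synthetic rowId for splitFile: the total number of rows
   * in all files that precede it in the logical bucket's sort order.
   */
  static long rowIdOffset(List<FileStatus> bucketFilesInOrder,
                          FileStatus splitFile,
                          RowCounter counter) throws IOException {
    long offset = 0;
    for (FileStatus f : bucketFilesInOrder) {
      if (f.getPath().equals(splitFile.getPath())) {
        return offset; // rows before this file = first rowId within it
      }
      offset += counter.rowCountOf(f);
    }
    throw new IllegalArgumentException("split file not in bucket: " + splitFile.getPath());
  }
}
{code}

Assigning offsets this way gives every original file a stable, disjoint ROW__ID range, which is what allows the reader and the compactor to agree on the same ids (see the test notes in the comments below).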
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239103#comment-16239103 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

llap_acid_fast is an unstable test; all the other failures have age > 1 (they predate this patch).
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238798#comment-16238798 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12895889/HIVE-17458.16.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11355 tests executed

*Failed tests:*
{noformat}
TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file (likely timed out) (batchId=231)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=102)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAmPoolInteractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanUserMapping (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAsyncSessionInitFailures (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testClusterFractions (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testDestroyAndReturn (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testQueueing (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReopen (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuse (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithDifferentPool (batchId=281)
org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithQueueing (batchId=281)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7628/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7628/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7628/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12895889 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238269#comment-16238269 ]

Sergey Shelukhin commented on HIVE-17458:
------------------------------------------

+1 pending tests... formatting nits can be fixed on commit.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238233#comment-16238233 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

Correct. It tests whether ROW__ID needs to be projected (implicitly or explicitly) or whether there are delete events to apply. If neither holds, it doesn't decorate the rows with ROW__ID.
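As a rough illustration of the check described above, here is a sketch (hypothetical names throughout; this is only the shape of the decision, not Hive's actual code):

{code:java}
import java.util.List;
import org.apache.hadoop.fs.Path;

/** Hypothetical sketch of the "does this read need ROW__ID?" decision. */
final class RowIdNeedSketch {
  static boolean needsRowId(boolean rowIdProjected, List<Path> deleteDeltaDirs) {
    if (rowIdProjected) {
      return true; // ROW__ID is referenced, implicitly or explicitly
    }
    // Delete events are matched on (originalTransaction, bucket, rowId), so the
    // reader must synthesize ROW__ID to apply them even for a plain SELECT.
    return deleteDeltaDirs != null && !deleteDeltaDirs.isEmpty();
  }
}
{code}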
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238213#comment-16238213 ]

Sergey Shelukhin commented on HIVE-17458:
------------------------------------------

For #2 - but it isn't necessary if you are running a select, right? Mostly looks good.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236716#comment-16236716 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

1. Acid 2.0 is the only acid supported by Hive 3.0. The old data is still readable after the upgrade process from 2.x to 3.0.
2. It is necessary: say you are running an update on a table that was converted to acid but has not gone through major compaction.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236704#comment-16236704 ]

Sergey Shelukhin commented on HIVE-17458:
------------------------------------------

Left some comments. My main two questions are:
1) A patch comment mentions that non-split-update ACID cannot be read in Hive 3. Wouldn't that mean all the legacy ACID data cannot be read? Reader compat should still be possible.
2) If there are only originals with no deltas, does it still activate the row-id machinery? It looks like that should be unnecessary.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234292#comment-16234292 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

[~sershe] there is one minor related failure, but it's ready for review.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234254#comment-16234254 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12895204/HIVE-17458.14.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11339 tests executed

*Failed tests:*
{noformat}
TestOperationLoggingAPIWithMr - did not produce a TEST-*.xml file (likely timed out) (batchId=227)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.ql.io.orc.TestVectorizedOrcAcidRowBatchReader.testVectorizedOrcAcidRowBatchReader (batchId=266)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7585/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7585/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7585/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12895204 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234135#comment-16234135 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

Patch 14 is for the build (same as patch 13).
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227700#comment-16227700 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

Patch 13 restores OrcSplit.canUseLlapIo() to its previous behavior, which means reading "original" files will vectorize but not use LLAP IO. There are various issues with that, which are reflected in the subtasks above; these can be handled at a later point.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227227#comment-16227227 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

I'll post an RB once I fix some of the issues here.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226095#comment-16226095 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894858/HIVE-17458.12.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 11324 tests executed

*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=173) [infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,index_bitmap_auto.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,orc_merge3.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization_original] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[bucket5] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge10] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge1] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[reduce_deduplicate] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[tez_union_dynamic_partition_2] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket4] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucket5] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[disable_merge_for_bucketing] (batchId=176)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge2] (batchId=176)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge4] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge5] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge6] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge7] (batchId=176)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[reduce_deduplicate] (batchId=176)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[acid_vectorization_original_tez] (batchId=102)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=102)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_dyn_part] (batchId=89)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_map_operators] (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206)
org.apache.hadoop.hive.metastore.security.TestHadoopAuthBridge23.testSaslWithHiveMetaStore (batchId=236)
org.apache.hadoop.hive.ql.io.orc.TestVectorizedOrcAcidRowBatchReader.testVectorizedOrcAcidRowBatchReader (batchId=266)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=230)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7562/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7562/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7562/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 33 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12894858 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225804#comment-16225804 ]

Sergey Shelukhin commented on HIVE-17458:
------------------------------------------

[~ekoifman] can you post an RB?
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223419#comment-16223419 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12894449/HIVE-17458.11.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 11343 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=78)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_globallimit] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_vectorization_original] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_non_partitioned] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_partitioned] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_tmp_table] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_no_match] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_non_partitioned] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_where_partitioned] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_whole_partition] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization_acid] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_orig_table] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_update_delete] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_dynamic_partitioned] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_non_partitioned] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_partitioned] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_tmp_table] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_after_multiple_inserts] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_non_partitioned] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_partitioned] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_types] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_tmp_table] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_two_cols] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_where_no_match] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_where_non_partitioned] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_where_partitioned] (batchId=160)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=93)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testCancelRenewTokenFlow (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testConnection (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValid (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testIsValidNeg (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeProxyAuth (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testNegativeTokenAuth (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testProxyAuth (batchId=243)
org.apache.hive.minikdc.TestJdbcWithDBTokenStoreNoDoAs.testTokenAuth
{noformat}
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222604#comment-16222604 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

[~sershe] could you review, please?
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221467#comment-16221467 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

HIVE-12631 is a prerequisite.
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219959#comment-16219959 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893964/HIVE-17458.09.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11328 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_vectorization_original] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[acid_vectorization_original] (batchId=101)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7487/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7487/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7487/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893964 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217965#comment-16217965 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893811/HIVE-17458.07.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11320 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_vectorization_original] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=156)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=110)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=205)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=222)
org.apache.hadoop.hive.ql.parse.authorization.plugin.sqlstd.TestOperation2Privilege.checkHiveOperationTypeMatch (batchId=270)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7457/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7457/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7457/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893811 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215870#comment-16215870 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

On disabling the LLAP cache:
{noformat}
[2:07 PM] Sergey Shelukhin: OrcSplit.canUseLlapIo()
[2:07 PM] Sergey Shelukhin: in general, LlapAwareSplit
[2:07 PM] Sergey Shelukhin: is the cleanest way
[2:09 PM] Sergey Shelukhin: LlapRecordReader.create() is another place where one could check, on lower level
[2:09 PM] Sergey Shelukhin: and return null
{noformat}
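A minimal sketch of the opt-out the chat describes (only the canUseLlapIo() hook name comes from the chat; the class and fields here are hypothetical, not Hive's code):

{code:java}
/** Hypothetical stand-in for a split that can decline LLAP IO. */
final class LlapOptOutSketch {
  private final boolean isOriginal;           // pre-ACID ("original") file
  private final boolean needsSyntheticRowIds; // reader must fabricate ROW__IDs

  LlapOptOutSketch(boolean isOriginal, boolean needsSyntheticRowIds) {
    this.isOriginal = isOriginal;
    this.needsSyntheticRowIds = needsSyntheticRowIds;
  }

  /** Analogue of OrcSplit.canUseLlapIo(): returning false routes the split around LLAP IO. */
  boolean canUseLlapIo() {
    return !(isOriginal && needsSyntheticRowIds);
  }
}
{code}

Checking in LlapRecordReader.create() and returning null, as mentioned in the chat, would achieve the same effect one level lower.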
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214087#comment-16214087 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12893377/HIVE-17458.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11317 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=158)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=204)
org.apache.hadoop.hive.common.metrics.metrics2.TestCodahaleMetrics.testFileReporting (batchId=251)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02 (batchId=282)
org.apache.hadoop.hive.ql.io.orc.TestVectorizedOrcAcidRowBatchReader.testCanCreateVectorizedAcidRowBatchReaderOnSplit (batchId=264)
org.apache.hadoop.hive.ql.io.orc.TestVectorizedOrcAcidRowBatchReader.testVectorizedOrcAcidRowBatchReader (batchId=264)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=221)
org.apache.hadoop.hive.ql.parse.authorization.plugin.sqlstd.TestOperation2Privilege.checkHiveOperationTypeMatch (batchId=269)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=228)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7432/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7432/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7432/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12893377 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213578#comment-16213578 ]

Eugene Koifman commented on HIVE-17458:
---------------------------------------

Patch 4 adds support for delete events. Still to do:
- need a test with multiple stripes, ppd, etc. - make sure ids are assigned correctly
- test to make sure the compactor assigns them the same way
- disable the LLAP cache for original reads that need ROW__IDs
[jira] [Commented] (HIVE-17458) VectorizedOrcAcidRowBatchReader doesn't handle 'original' files
[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208740#comment-16208740 ]

Hive QA commented on HIVE-17458:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12892718/HIVE-17458.01.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 11278 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_notin] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_scalar] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_select] (batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_views] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query16] (batchId=243)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query94] (batchId=243)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=241)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query16] (batchId=241)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=241)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query94] (batchId=241)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=204)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testNonAcidToAcidVectorzied (batchId=272)
org.apache.hive.jdbc.TestTriggersTezSessionPoolManager.testTriggerHighShuffleBytes (batchId=229)
org.apache.hive.jdbc.TestTriggersWorkloadManager.testTriggerHighShuffleBytes (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7359/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7359/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7359/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12892718 - PreCommit-HIVE-Build