[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888524#comment-15888524 ] Deepak Jaiswal commented on HIVE-16022:
---
As Jason mentioned, the check was added because semijoin reduction does not work properly with a partition column. A partition column is not a real column but a logical one, and the logic expects a real column to compute the min and max. Also, by design there can never be parallel DPP and semijoin branches, because they are created in an if ... else if ... block.

> BloomFilter check not showing up in MERGE statement queries
> ---
>
> Key: HIVE-16022
> URL: https://issues.apache.org/jira/browse/HIVE-16022
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Jason Dere
> Assignee: Jason Dere
> Fix For: 2.2.0
>
> Attachments: HIVE-16022.1.patch, HIVE-16022.2.patch, HIVE-16022.3.patch, HIVE-16022.4.patch
>
>
> Running explain on a MERGE statement with runtime filtering enabled, I see the min/max being applied on the large table, but not the bloom filter check:
> {noformat}
> explain merge into acidTbl as t using nonAcidOrcTbl s ON t.a = s.a
> WHEN MATCHED AND s.a > 8 THEN DELETE
> WHEN MATCHED THEN UPDATE SET b = 7
> WHEN NOT MATCHED THEN INSERT VALUES(s.a, s.b)
> ...
> Map 1
>   Map Operator Tree:
>       TableScan
>         alias: t
>         Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>         Filter Operator
>           predicate: a BETWEEN DynamicValue(RS_3_s_a_min) AND DynamicValue(RS_3_s_a_max) (type: boolean)
>           Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
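The if ... else if ... structure Deepak describes can be pictured with a small model. This is a hypothetical simplification, not Hive's code: `JoinKey`, `RuntimeFilterPlan`, `plan_runtime_filter`, and the two enable flags are invented for illustration. For any join key the model yields a DPP plan, a semijoin plan, or nothing, and never both. Partition columns never reach the semijoin path because semijoin reduction needs real column values to compute min and max over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JoinKey:
    """Hypothetical stand-in for the join key on the big-table side."""
    name: str
    # Partition columns are logical: their values are not stored in the
    # data files, so there is no real column to compute min/max over.
    is_partition_col: bool


@dataclass(frozen=True)
class RuntimeFilterPlan:
    kind: str  # "dpp" or "semijoin"
    key: str


def plan_runtime_filter(key: JoinKey, dpp_enabled: bool,
                        semijoin_enabled: bool) -> Optional[RuntimeFilterPlan]:
    """Pick at most one runtime-filtering strategy for a join key.

    The strategies sit in an if / elif chain, so a single key can never
    produce parallel DPP and semijoin branches.
    """
    if key.is_partition_col and dpp_enabled:
        return RuntimeFilterPlan("dpp", key.name)
    elif semijoin_enabled and not key.is_partition_col:
        # Semijoin reduction summarizes real column values (min/max),
        # so partition columns are excluded from this branch.
        return RuntimeFilterPlan("semijoin", key.name)
    return None
```

For example, `plan_runtime_filter(JoinKey("a", False), True, True)` returns a semijoin plan for `a`. A partition column with DPP disabled gets no runtime filter at all rather than falling through to semijoin.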
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887403#comment-15887403 ] Hive QA commented on HIVE-16022:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12855035/HIVE-16022.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10270 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3830/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3830/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3830/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12855035 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887019#comment-15887019 ] Gunther Hagleitner commented on HIVE-16022:
---
+1
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886867#comment-15886867 ] Gunther Hagleitner commented on HIVE-16022:
---
Looks good overall. Can you explain why you need to recursively find another column desc? I think you're looking for whether any partition column contributed to the expr you're looking at. That doesn't seem to be the right check either, does it? Wouldn't a better check be to make sure there are no parallel dynamic partition pruning and semijoin branches?
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886320#comment-15886320 ] Eugene Koifman commented on HIVE-16022:
---
The new plans look good, but I don't know enough about the optimizer part to review this.
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886314#comment-15886314 ] Jason Dere commented on HIVE-16022:
---
Precommit tests looked good. [~hagleitn] [~ekoifman] can you review?
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884144#comment-15884144 ] Hive QA commented on HIVE-16022:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12854609/HIVE-16022.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10246 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=117)
org.apache.hadoop.hive.metastore.TestMetaStoreInitListener.testMetaStoreInitListener (batchId=203)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3775/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3775/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3775/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12854609 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881719#comment-15881719 ] Hive QA commented on HIVE-16022:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12854159/HIVE-16022.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10258 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_partition_pruning] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning] (batchId=144)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=211)
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite (batchId=186)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3736/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3736/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3736/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12854159 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880126#comment-15880126 ] Jason Dere commented on HIVE-16022:
---
[~hagleitn] [~ashutoshc] [~ekoifman] can you review?
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880079#comment-15880079 ] Jason Dere commented on HIVE-16022:
---
Looking at the partition column checking logic in generateSemiJoinOperatorPlan(), I still don't think it is right. The loop that tries to get the TableScanOperator blindly follows the first parent link, even if this means crossing an RS (ReduceSink) boundary. I think a better check here is to verify that the key is a column expr node, and then check getIsPartitionColOrVirtualCol() on that column to see whether it is a partition column.
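Jason's two points can be illustrated with a toy model. Everything in this sketch is hypothetical: `Operator`, `ColumnExpr`, `FuncExpr`, `find_table_scan_naive`, and `semijoin_allowed` are illustrative names, and the `is_partition_col_or_virtual_col` field only stands in for what getIsPartitionColOrVirtualCol() reports. The naive walk shows how always following the first parent can cross an RS boundary and land on the other table's TableScan. The replacement check inspects the key expression directly. Rejecting keys that are not bare column references is an assumption made to keep the sketch conservative.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Operator:
    """Hypothetical operator node; kind is e.g. "TS", "SEL", "RS", "JOIN"."""
    kind: str
    name: str = ""
    parents: List["Operator"] = field(default_factory=list)


def find_table_scan_naive(op: Operator) -> Operator:
    # The criticised approach: always follow the first parent, even across
    # a ReduceSink (RS) boundary into a different table's pipeline.
    while op.kind != "TS":
        op = op.parents[0]
    return op


@dataclass
class ColumnExpr:
    column: str
    # Stand-in for what getIsPartitionColOrVirtualCol() reports on a column.
    is_partition_col_or_virtual_col: bool


@dataclass
class FuncExpr:
    func: str
    children: List[Union["ColumnExpr", "FuncExpr"]]


KeyExpr = Union[ColumnExpr, FuncExpr]


def semijoin_allowed(key: KeyExpr) -> bool:
    # Proposed check: look at the key expression itself instead of walking
    # operators. Rejecting non-column keys is an assumption of this sketch.
    if not isinstance(key, ColumnExpr):
        return False
    return not key.is_partition_col_or_virtual_col
```

Consider a plan shaped like SEL, then JOIN, whose first parent is an RS over TableScan `s` and whose second parent is TableScan `t`. Walking naively from the SEL returns the TableScan for `s`, which sits on the other side of the RS boundary. `semijoin_allowed` never consults the operator tree at all.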
[jira] [Commented] (HIVE-16022) BloomFilter check not showing up in MERGE statement queries
[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1587#comment-1587 ] Jason Dere commented on HIVE-16022:
---
I noticed a couple of problems when I ran the semijoin optimization on a MERGE statement:
- DynamicPartitionPruningOptimization.generateSemiJoinOperator(): parentOfRS does not necessarily have to be a SelectOperator; in this case it is a TS. As a result, we are missing some important checks on whether this table is appropriate for the semijoin optimization.
- grandParent.getChildren().add(bloomFilterNode): this wrongly assumes grandParent is an AND. In this case there was no previous filterExpr, so grandParent is the BETWEEN. Adding the child here incorrectly adds a new parameter to BETWEEN, which is probably getting ignored. This is why in_bloom_filter() does not appear in the EXPLAIN output.
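The second problem can be reproduced with a toy expression tree. The names below (`Expr`, `render`, `add_bloom_filter_buggy`, `add_bloom_filter`, `sample_filter`, `sample_bloom`) are hypothetical. BETWEEN is modeled with three operands, although Hive's own BETWEEN UDF may arrange its arguments differently, and `DynamicValue(bf)` is a placeholder name. Appending the bloom filter directly onto a BETWEEN node leaves it unrendered. Wrapping the existing predicate in a new AND keeps both checks. That is one plausible fix, not necessarily the one in the attached patches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Toy expression node; op is "col", "dyn", "BETWEEN", "AND", or a function name."""
    op: str
    children: List["Expr"] = field(default_factory=list)
    text: str = ""


def render(e: Expr) -> str:
    """Render an expression roughly the way a plan printer might."""
    if e.op in ("col", "dyn"):
        return e.text
    if e.op == "BETWEEN":
        # Only the first three operands are used; anything appended after
        # them is silently dropped, which hides the bloom filter.
        col, lo, hi = e.children[:3]
        return f"{render(col)} BETWEEN {render(lo)} AND {render(hi)}"
    if e.op == "AND":
        return " and ".join(f"({render(c)})" for c in e.children)
    return f"{e.op}({', '.join(render(c) for c in e.children)})"


def add_bloom_filter_buggy(pred: Expr, bloom: Expr) -> Expr:
    # Assumes pred is already an AND, like grandParent.getChildren().add(...).
    pred.children.append(bloom)
    return pred


def add_bloom_filter(pred: Expr, bloom: Expr) -> Expr:
    # Append to an existing AND; otherwise wrap both predicates in a new AND.
    if pred.op == "AND":
        pred.children.append(bloom)
        return pred
    return Expr("AND", [pred, bloom])


def sample_filter() -> Expr:
    """The min/max predicate from the EXPLAIN output in the issue description."""
    return Expr("BETWEEN", [Expr("col", text="a"),
                            Expr("dyn", text="DynamicValue(RS_3_s_a_min)"),
                            Expr("dyn", text="DynamicValue(RS_3_s_a_max)")])


def sample_bloom() -> Expr:
    # "DynamicValue(bf)" is a placeholder, not taken from a real plan.
    return Expr("in_bloom_filter", [Expr("col", text="a"), Expr("dyn", text="DynamicValue(bf)")])
```

`render(add_bloom_filter_buggy(sample_filter(), sample_bloom()))` prints only the BETWEEN predicate, matching the EXPLAIN output in the description. The wrapped version joins the range check and the `in_bloom_filter` call with `and`.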