Krzysztof Zarzycki created HIVE-21170: -----------------------------------------
Summary: Wrong (no) results of cross-product query executed on LLAP Key: HIVE-21170 URL: https://issues.apache.org/jira/browse/HIVE-21170 Project: Hive Issue Type: Bug Environment: Hive distribution: HDP 3.1.0. LLAP execution engine in version: {code:java} $ beeline --version SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Hive 3.1.0.3.1.0.0-78 Git git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive -r 56673b027117d8cb3400675b1680a4d992360808 Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018 >From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc{code} Reporter: Krzysztof Zarzycki *In the environment:* \{code} $ beeline --version SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory] Hive 3.1.0.3.1.0.0-78 Git git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive -r 56673b027117d8cb3400675b1680a4d992360808 Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018 >From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc {code} *On LLAP execution engine*, the following query gives *wrong results*: {code} -- prepare test data set hive.query.results.cache.enabled=false; create table test1 (id int); insert into test1 values (1),(2),(3); -- query select * from test1 t1 cross join test1 t2; {code} *Query result:* {code} 0: jdbc:hive2://hostname:> select t1.* from test1 t1 cross join test1 t2; INFO : Compiling command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6): select t1.* from test1 t1 cross join test1 t2 INFO : Warning: Shuffle Join MERGEJOIN[9][tables = [$hdt$_0, $hdt$_1]] in Stage 'Reducer 2' is a cross product INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t1.id, type:int, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6); Time taken: 0.229 seconds INFO : Executing command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6): select t1.* from test1 t1 cross join test1 t2 INFO : Query ID = hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6 INFO : Total jobs = 1 INFO : Launching Job 1 out of 1 INFO : Starting task [Stage-1:MAPRED] in parallel ---------------------------------------------------------------------------------------------- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED ---------------------------------------------------------------------------------------------- Map 1 .......... llap SUCCEEDED 1 1 0 0 0 0 Map 3 .......... llap SUCCEEDED 1 1 0 0 0 0 Reducer 2 llap SUCCEEDED 0 0 0 0 0 0 ---------------------------------------------------------------------------------------------- VERTICES: 02/03 [==========================>>] 100% ELAPSED TIME: 1.92 s ---------------------------------------------------------------------------------------------- INFO : Completed executing command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6); Time taken: 2.006 seconds INFO : OK +--------+ | t1.id | +--------+ +--------+ No rows selected (2.284 seconds) {code} *Expected result:* {code} +--------+ | t1.id | +--------+ | 3 | | 3 | | 3 | | 2 | | 2 | | 2 | | 1 | | 1 | | 1 | +--------+ 9 rows selected {code} *What worked as a workaround:* {\{set hive.tez.cartesian-product.enabled=false;}} (default true) Then query gave a correct result. *Difference in execution plans :* 1. With \{{set hive.tez.cartesian-product.enabled=true;}}: {code} +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | Plan optimized by CBO. | | | | Vertex dependency in root stage | | Reducer 2 <- Map 1 (XPROD_EDGE), Map 3 (XPROD_EDGE) | | | | Stage-0 | | Fetch Operator | | limit:-1 | | Stage-1 | | Reducer 2 llap | | File Output Operator [FS_8] | | Merge Join Operator [MERGEJOIN_9] (rows=9 width=8) | | Conds:(Inner),Output:["_col0","_col1"] | | <-Map 1 [XPROD_EDGE] vectorized, llap | | XPROD_EDGE [RS_11] | | Select Operator [SEL_10] (rows=3 width=4) | | Output:["_col0"] | | TableScan [TS_0] (rows=3 width=4) | | zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] | | <-Map 3 [XPROD_EDGE] vectorized, llap | | XPROD_EDGE [RS_13] | | Select Operator [SEL_12] (rows=3 width=4) | | Output:["_col0"] | | TableScan [TS_2] (rows=3 width=4) | | zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] | | | +----------------------------------------------------+ {code} With \{{set hive.tez.cartesian-product.enabled=false;}}: {code} +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | Plan optimized by CBO. | | | | Vertex dependency in root stage | | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 3 (CUSTOM_SIMPLE_EDGE) | | | | Stage-0 | | Fetch Operator | | limit:-1 | | Stage-1 | | Reducer 2 llap | | File Output Operator [FS_8] | | Merge Join Operator [MERGEJOIN_9] (rows=9 width=8) | | Conds:(Inner),Output:["_col0","_col1"] | | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap | | PARTITION_ONLY_SHUFFLE [RS_11] | | Select Operator [SEL_10] (rows=3 width=4) | | Output:["_col0"] | | TableScan [TS_0] (rows=3 width=4) | | zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] | | <-Map 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap | | PARTITION_ONLY_SHUFFLE [RS_13] | | Select Operator [SEL_12] (rows=3 width=4) | | Output:["_col0"] | | TableScan [TS_2] (rows=3 width=4) | | zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] | | | +----------------------------------------------------+ {code} So I suspect some bug in XPROD_EDGE component executed on LLAP. *Note 1: Same query on Hive on Tez works as expected.* *Note 2: What I also tried but didn't help:*\{code} set hive.vectorized.execution.enabled=false; (default true) {code} That didn't change the result. -- This message was sent by Atlassian JIRA (v7.6.3#76005)