Krzysztof Zarzycki created HIVE-21170:
-----------------------------------------

             Summary: Wrong (no) results of cross-product query executed on 
LLAP 
                 Key: HIVE-21170
                 URL: https://issues.apache.org/jira/browse/HIVE-21170
             Project: Hive
          Issue Type: Bug
         Environment: Hive distribution: HDP 3.1.0. 

LLAP execution engine in version:
{code:java}
$ beeline --version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive 3.1.0.3.1.0.0-78
Git 
git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
 -r 56673b027117d8cb3400675b1680a4d992360808
Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
>From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc{code}
 
            Reporter: Krzysztof Zarzycki


*In the environment:* \{code}
$ beeline --version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive 3.1.0.3.1.0.0-78
Git 
git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
 -r 56673b027117d8cb3400675b1680a4d992360808
Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
>From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc
{code}

*On LLAP execution engine*, the following query gives *wrong results*:
{code}
-- prepare test data
set hive.query.results.cache.enabled=false;
create table test1 (id int);
insert into test1 values (1),(2),(3);

-- query
select * from test1 t1 cross join test1 t2;
{code}

*Query result:*
{code}
0: jdbc:hive2://hostname:> select t1.* from test1 t1 cross join test1 t2;
INFO : Compiling 
command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6): 
select t1.* from test1 t1 cross join test1 t2
INFO : Warning: Shuffle Join MERGEJOIN[9][tables = [$hdt$_0, $hdt$_1]] in Stage 
'Reducer 2' is a cross product
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t1.id, 
type:int, comment:null)], properties:null)
INFO : Completed compiling 
command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6); Time 
taken: 0.229 seconds
INFO : Executing 
command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6): 
select t1.* from test1 t1 cross join test1 t2
INFO : Query ID = hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in parallel
----------------------------------------------------------------------------------------------
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... llap SUCCEEDED 1 1 0 0 0 0
Map 3 .......... llap SUCCEEDED 1 1 0 0 0 0
Reducer 2 llap SUCCEEDED 0 0 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/03 [==========================>>] 100% ELAPSED TIME: 1.92 s
----------------------------------------------------------------------------------------------
INFO : Completed executing 
command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6); Time 
taken: 2.006 seconds
INFO : OK
+--------+
| t1.id |
+--------+
+--------+
No rows selected (2.284 seconds)
{code}

*Expected result:*
{code}
+--------+
| t1.id |
+--------+
| 3 |
| 3 |
| 3 |
| 2 |
| 2 |
| 2 |
| 1 |
| 1 |
| 1 |
+--------+
9 rows selected
{code}

*What worked as a workaround:*
{\{set hive.tez.cartesian-product.enabled=false;}} (default true)
Then query gave a correct result.

*Difference in execution plans :*
1. With \{{set hive.tez.cartesian-product.enabled=true;}}: 
{code}
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (XPROD_EDGE), Map 3 (XPROD_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 llap |
| File Output Operator [FS_8] |
| Merge Join Operator [MERGEJOIN_9] (rows=9 width=8) |
| Conds:(Inner),Output:["_col0","_col1"] |
| <-Map 1 [XPROD_EDGE] vectorized, llap |
| XPROD_EDGE [RS_11] |
| Select Operator [SEL_10] (rows=3 width=4) |
| Output:["_col0"] |
| TableScan [TS_0] (rows=3 width=4) |
| zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
| <-Map 3 [XPROD_EDGE] vectorized, llap |
| XPROD_EDGE [RS_13] |
| Select Operator [SEL_12] (rows=3 width=4) |
| Output:["_col0"] |
| TableScan [TS_2] (rows=3 width=4) |
| zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
| |
+----------------------------------------------------+
{code}

With \{{set hive.tez.cartesian-product.enabled=false;}}: 
{code}
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 3 (CUSTOM_SIMPLE_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 llap |
| File Output Operator [FS_8] |
| Merge Join Operator [MERGEJOIN_9] (rows=9 width=8) |
| Conds:(Inner),Output:["_col0","_col1"] |
| <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap |
| PARTITION_ONLY_SHUFFLE [RS_11] |
| Select Operator [SEL_10] (rows=3 width=4) |
| Output:["_col0"] |
| TableScan [TS_0] (rows=3 width=4) |
| zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
| <-Map 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap |
| PARTITION_ONLY_SHUFFLE [RS_13] |
| Select Operator [SEL_12] (rows=3 width=4) |
| Output:["_col0"] |
| TableScan [TS_2] (rows=3 width=4) |
| zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
| |
+----------------------------------------------------+
{code}

So I suspect some bug in XPROD_EDGE component executed on LLAP.

*Note 1: Same query on Hive on Tez works as expected.*

*Note 2: What I also tried but didn't help:*\{code}
set hive.vectorized.execution.enabled=false; (default true)
{code}
That didn't change the result.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to