Khurram Faraaz created DRILL-6141: ------------------------------------- Summary: JOIN query that uses USING clause returns incorrect results Key: DRILL-6141 URL: https://issues.apache.org/jira/browse/DRILL-6141 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.12.0 Reporter: Khurram Faraaz
Join query that uses USING clause returns incorrect results. Postgres 9.2.23 returns only one occurrence of the "id" column {noformat} postgres=# create table t1(id int, name varchar(30)); CREATE TABLE postgres=# create table t2(id int, name varchar(30)); CREATE TABLE postgres=# select * from t1; id | name ----+------- 10 | John 13 | Kevin 15 | Susan (3 rows) postgres=# select * from t2; id | name ----+------- 19 | Kyle 13 | Kevin 1 | Bob 17 | Kumar (4 rows) postgres=# select * from t1 JOIN t2 USING(id); id | name | name ----+-------+------- 13 | Kevin | Kevin (1 row) {noformat} results from Drill 1.12.0-mapr commit : 2de42491be795721bcb4059bd46e27fc33272309 {noformat} 0: jdbc:drill:schema=dfs.tmp> create table t1 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t1.csv`; +-----------+----------------------------+ | Fragment | Number of records written | +-----------+----------------------------+ | 0_0 | 3 | +-----------+----------------------------+ 1 row selected (0.213 seconds) 0: jdbc:drill:schema=dfs.tmp> create table t2 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t2.csv`; +-----------+----------------------------+ | Fragment | Number of records written | +-----------+----------------------------+ | 0_0 | 4 | +-----------+----------------------------+ 1 row selected (0.168 seconds) 0: jdbc:drill:schema=dfs.tmp> select * from t1; +-----+--------+ | c1 | c2 | +-----+--------+ | 10 | John | | 13 | Kevin | | 15 | Susan | +-----+--------+ 3 rows selected (0.15 seconds) 0: jdbc:drill:schema=dfs.tmp> select * from t2; +-----+--------+ | c1 | c2 | +-----+--------+ | 19 | Kyle | | 13 | Kevin | | 1 | Bob | | 17 | Kumar | +-----+--------+ 4 rows selected (0.171 seconds) ## Note that Drill returns an extra column, unlike Postgres, for the same query over same data 0: jdbc:drill:schema=dfs.tmp> select * from t1 JOIN t2 USING(c1); +-----+--------+------+--------+ | c1 | c2 | c10 | c20 | +-----+--------+------+--------+ | 13 | Kevin | 13 | Kevin | +-----+--------+------+--------+ 1 row selected (0.256 seconds) ## explain plan for above query 0: jdbc:drill:schema=dfs.tmp> explain plan for select * from t1 JOIN t2 USING(c1); +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 ProjectAllowDup(*=[$0], *0=[$1]) 00-02 Project(T49¦¦*=[$0], T48¦¦*=[$2]) 00-03 Project(T49¦¦*=[$2], c10=[$3], T48¦¦*=[$0], c1=[$1]) 00-04 HashJoin(condition=[=($3, $1)], joinType=[inner]) 00-06 Project(T48¦¦*=[$0], c1=[$1]) 00-08 Scan(table=[[dfs, tmp, t2]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t2]], selectionRoot=maprfs:/tmp/t2, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]]) 00-05 Project(T49¦¦*=[$0], c10=[$1]) 00-07 Project(T49¦¦*=[$0], c1=[$1]) 00-09 Scan(table=[[dfs, tmp, t1]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t1]], selectionRoot=maprfs:/tmp/t1, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]]) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)