[jira] [Updated] (HIVE-7346) Wrong results caused by hive ppd under specific join condition
[ https://issues.apache.org/jira/browse/HIVE-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7346: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Wrong results caused by hive ppd under specific join condition -- Key: HIVE-7346 URL: https://issues.apache.org/jira/browse/HIVE-7346 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: dima machlin Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7346.1.patch.txt Assuming two tables : {code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code} t1 contains 1 row : 'a','a' t2 contains 1 row : 'a',2 The following query : {code:sql} select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d =1 and c.d=1 {code} Returns 0 rows as expected because t2.d = 2 Wrapping this query, like so : {code:sql} select * from ( select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d =1 and c.d=1 ) z where d11 or d21 {code} Where another filter was add on the columns causes the plan to lack the filter of the =1 and return a single row - *Wrong Results*. The plan is : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (= (. (TOK_TABLE_OR_COL b) d) 1) (= (. (TOK_TABLE_OR_COL c) d) 1) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or ( (TOK_TABLE_OR_COL d1) 1) ( (TOK_TABLE_OR_COL d2) 1) STAGE DEPENDENCIES: Stage-7 is a root stage Stage-5 depends on stages: Stage-7 Stage-0 is a root stage STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias - Map Local Tables: z:b Fetch Operator limit: -1 z:c Fetch Operator limit: -1 Alias - Map Local Operator Tree: z:b TableScan alias: b HashTable Sink Operator condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] Position of Big Table: 0 z:c TableScan alias: c HashTable Sink Operator condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 0 Stage: Stage-5 Map Reduce Alias - Map Operator Tree: z:a TableScan alias: a Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] outputColumnNames: _col0, _col1, _col4, _col5 Position of Big Table: 0 Filter Operator predicate: expr: (_col1 = _col4) type: boolean Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col1, _col4, _col5, _col9 Position of Big Table: 0 Filter Operator predicate: expr: ((_col1 1) or (_col9 1)) type: boolean Select Operator expressions: expr: _col4 type: string expr: _col5 type: string expr: _col1
[jira] [Updated] (HIVE-7346) Wrong results caused by hive ppd under specific join condition
[ https://issues.apache.org/jira/browse/HIVE-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7346: Attachment: HIVE-7346.1.patch.txt Wrong results caused by hive ppd under specific join condition -- Key: HIVE-7346 URL: https://issues.apache.org/jira/browse/HIVE-7346 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: dima machlin Assignee: Navis Attachments: HIVE-7346.1.patch.txt Assuming two tables : {code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code} t1 contains 1 row : 'a','a' t2 contains 1 row : 'a',2 The following query : {code:sql} select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d =1 and c.d=1 {code} Returns 0 rows as expected because t2.d = 2 Wrapping this query, like so : {code:sql} select * from ( select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d =1 and c.d=1 ) z where d11 or d21 {code} Where another filter was add on the columns causes the plan to lack the filter of the =1 and return a single row - *Wrong Results*. The plan is : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (= (. (TOK_TABLE_OR_COL b) d) 1) (= (. (TOK_TABLE_OR_COL c) d) 1) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or ( (TOK_TABLE_OR_COL d1) 1) ( (TOK_TABLE_OR_COL d2) 1) STAGE DEPENDENCIES: Stage-7 is a root stage Stage-5 depends on stages: Stage-7 Stage-0 is a root stage STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias - Map Local Tables: z:b Fetch Operator limit: -1 z:c Fetch Operator limit: -1 Alias - Map Local Operator Tree: z:b TableScan alias: b HashTable Sink Operator condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] Position of Big Table: 0 z:c TableScan alias: c HashTable Sink Operator condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 0 Stage: Stage-5 Map Reduce Alias - Map Operator Tree: z:a TableScan alias: a Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] outputColumnNames: _col0, _col1, _col4, _col5 Position of Big Table: 0 Filter Operator predicate: expr: (_col1 = _col4) type: boolean Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col1, _col4, _col5, _col9 Position of Big Table: 0 Filter Operator predicate: expr: ((_col1 1) or (_col9 1)) type: boolean Select Operator expressions: expr: _col4 type: string expr: _col5 type: string expr: _col1 type: int expr: _col9 type: int outputColumnNames: _col0, _col1, _col2, _col3
[jira] [Updated] (HIVE-7346) Wrong results caused by hive ppd under specific join condition
[ https://issues.apache.org/jira/browse/HIVE-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7346: Affects Version/s: 0.13.0 0.13.1 Status: Patch Available (was: Open) Wrong results caused by hive ppd under specific join condition -- Key: HIVE-7346 URL: https://issues.apache.org/jira/browse/HIVE-7346 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: dima machlin Assignee: Navis Attachments: HIVE-7346.1.patch.txt Assuming two tables : {code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code} t1 contains 1 row : 'a','a' t2 contains 1 row : 'a',2 The following query : {code:sql} select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d =1 and c.d=1 {code} Returns 0 rows as expected because t2.d = 2 Wrapping this query, like so : {code:sql} select * from ( select a.*,b.d d1,c.d d2 from t1 a join t2 b on (a.id1=b.id) join t2 c on (a.id2=b.id) where b.d =1 and c.d=1 ) z where d11 or d21 {code} Where another filter was add on the columns causes the plan to lack the filter of the =1 and return a single row - *Wrong Results*. The plan is : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (= (. (TOK_TABLE_OR_COL b) d) 1) (= (. (TOK_TABLE_OR_COL c) d) 1) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or ( (TOK_TABLE_OR_COL d1) 1) ( (TOK_TABLE_OR_COL d2) 1) STAGE DEPENDENCIES: Stage-7 is a root stage Stage-5 depends on stages: Stage-7 Stage-0 is a root stage STAGE PLANS: Stage: Stage-7 Map Reduce Local Work Alias - Map Local Tables: z:b Fetch Operator limit: -1 z:c Fetch Operator limit: -1 Alias - Map Local Operator Tree: z:b TableScan alias: b HashTable Sink Operator condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] Position of Big Table: 0 z:c TableScan alias: c HashTable Sink Operator condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] Position of Big Table: 0 Stage: Stage-5 Map Reduce Alias - Map Operator Tree: z:a TableScan alias: a Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {id1} {id2} 1 {id} {d} handleSkewJoin: false keys: 0 [Column[id1]] 1 [Column[id]] outputColumnNames: _col0, _col1, _col4, _col5 Position of Big Table: 0 Filter Operator predicate: expr: (_col1 = _col4) type: boolean Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col5} {_col0} {_col1} 1 {d} handleSkewJoin: false keys: 0 [] 1 [] outputColumnNames: _col1, _col4, _col5, _col9 Position of Big Table: 0 Filter Operator predicate: expr: ((_col1 1) or (_col9 1)) type: boolean Select Operator expressions: expr: _col4 type: string expr: _col5 type: string expr: _col1 type: int expr: _col9