[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-08-03 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652125#comment-14652125
 ] 

Matt McCline commented on HIVE-11410:
-

No problem -- thank you for your response.

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Assignee: Matt McCline
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT 
   t1.c1
 FROM 
   t t1
 JOIN
 (SELECT 
t2.c1,
MAX(t2.c2) AS c2
  FROM 
t t2 
  GROUP BY 
t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-08-03 Thread Nicholas Brenwald (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652089#comment-14652089
 ] 

Nicholas Brenwald commented on HIVE-11410:
--

[~mmccline] I have done some further testing today compiling from source 
various branches. 

The issue only seems to be present in release-1.1.0 (which is part of the 
Cloudera distribution we use). The issue cannot be reproduced in branch-1.1 or 
branch-1.2 (even when using our environment variables/hive-site.xml etc). As 
such I think this can be marked as resolved. 
Thanks for looking into this and sorry for the false alarm.

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Assignee: Matt McCline
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT 
   t1.c1
 FROM 
   t t1
 JOIN
 (SELECT 
t2.c1,
MAX(t2.c2) AS c2
  FROM 
t t2 
  GROUP BY 
t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-31 Thread Nicholas Brenwald (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649184#comment-14649184
 ] 

Nicholas Brenwald commented on HIVE-11410:
--

Hi, 
Thanks for taking a look at this so quickly.
I confirm we are using branch-1.1 (distributed as part of CDH 5.4.4). For 
example, hive cli jar is named hive-cli-1.1.0-cdh5.4.4.jar. When we run 'hive' 
on the command line, we see the following printed message showing the 
hive-common-1.1.0 is being used.
{code}
Logging initialized using configuration in 
jar:file:/cloudera/parcel-repo/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/hive-common-1.1.0-cdh5.4.4.jar!/hive-log4j.properties
{code}

And the explain plan we see is as follows:
{code}
hive EXPLAIN
 SELECT 
   t1.c1
 FROM 
   t t1
 JOIN
 (SELECT 
t2.c1,
MAX(t2.c2) AS c2
  FROM 
t t2 
  GROUP BY 
t2.c1
 ) t3
 ON t1.c2=t3.c2;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-5 depends on stages: Stage-1
  Stage-4 depends on stages: Stage-5
  Stage-0 depends on stages: Stage-4

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: t2
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: c1 (type: string), c2 (type: int)
  outputColumnNames: c1, c2
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
  Group By Operator
aggregations: max(c2)
keys: c1 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: string)
  sort order: +
  Map-reduce partition columns: _col0 (type: string)
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: _col1 (type: int)
  Reduce Operator Tree:
Group By Operator
  aggregations: max(VALUE._col0)
  keys: KEY._col0 (type: string)
  mode: mergepartial
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: 
NONE
  Filter Operator
predicate: _col1 is not null (type: boolean)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
stats: NONE
File Output Operator
  compressed: false
  table:
  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-5
Map Reduce Local Work
  Alias - Map Local Tables:
t1 
  Fetch Operator
limit: -1
  Alias - Map Local Operator Tree:
t1 
  TableScan
alias: t1
filterExpr: c2 is not null (type: boolean)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: c2 is not null (type: boolean)
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
  HashTable Sink Operator
keys:
  0 c2 (type: int)
  1 _col1 (type: int)

  Stage: Stage-4
Map Reduce
  Map Operator Tree:
  TableScan
Map Join Operator
  condition map:
   Inner Join 0 to 1
  keys:
0 c2 (type: int)
1 _col1 (type: int)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: true
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Local Work:
Map Reduce Local Work

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink
{code}

 Join with subquery containing a group by incorrectly returns no results
 

[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-31 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648855#comment-14648855
 ] 

Matt McCline commented on HIVE-11410:
-

[~nbrenwald] please attach your EXPLAIN plan for the query.  And, confirm you 
are using branch-1.1

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Assignee: Matt McCline
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT 
   t1.c1
 FROM 
   t t1
 JOIN
 (SELECT 
t2.c1,
MAX(t2.c2) AS c2
  FROM 
t t2 
  GROUP BY 
t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-31 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648853#comment-14648853
 ] 

Matt McCline commented on HIVE-11410:
-


Right, postgres produces:
{code}
mmccline=# SELECT
  t1.c1
FROM 
  t t1
JOIN
(SELECT 
   t2.c1,
   MAX(t2.c2) AS c2
 FROM 
   t t2 
 GROUP BY 
   t2.c1
) t3
ON t1.c2=t3.c2;
 c1 
-
 abc
(1 row)
{code}

And, Hive branch-1.1 produces the right result:
{code}
SELECT 
  t1.c1
FROM 
  t t1
JOIN
(SELECT 
   t2.c1,
   MAX(t2.c2) AS c2
 FROM 
   t t2 
 GROUP BY 
   t2.c1
) t3
ON t1.c2=t3.c2;
abc
{code}

Here is the EXPLAIN plan:
{code}
EXPLAIN
SELECT 
  t1.c1
FROM 
  t t1
JOIN
(SELECT 
   t2.c1,
   MAX(t2.c2) AS c2
 FROM 
   t t2 
 GROUP BY 
   t2.c1
) t3
ON t1.c2=t3.c2;
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: t2
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: c1 (type: string), c2 (type: int)
  outputColumnNames: c1, c2
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
  Group By Operator
aggregations: max(c2)
keys: c1 (type: string)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: string)
  sort order: +
  Map-reduce partition columns: _col0 (type: string)
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: _col1 (type: int)
  Reduce Operator Tree:
Group By Operator
  aggregations: max(VALUE._col0)
  keys: KEY._col0 (type: string)
  mode: mergepartial
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: 
NONE
  Select Operator
expressions: _col1 (type: int)
outputColumnNames: _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
stats: NONE
Filter Operator
  predicate: _col1 is not null (type: boolean)
  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
stats: NONE
  Select Operator
expressions: _col1 (type: int)
outputColumnNames: _col1
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
stats: NONE
File Output Operator
  compressed: false
  table:
  input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
Map Reduce
  Map Operator Tree:
  TableScan
Reduce Output Operator
  key expressions: _col1 (type: int)
  sort order: +
  Map-reduce partition columns: _col1 (type: int)
  Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
stats: NONE
  TableScan
alias: t1
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
Filter Operator
  predicate: c2 is not null (type: boolean)
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
  Reduce Output Operator
key expressions: c2 (type: int)
sort order: +
Map-reduce partition columns: c2 (type: int)
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE 
Column stats: NONE
value expressions: c1 (type: string)
  Reduce Operator Tree:
Join Operator
  condition map:
   Inner Join 0 to 1
  keys:
0 c2 (type: int)
1 _col1 (type: int)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 5 Basic stats: COMPLETE Column 
stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor 

[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-31 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648854#comment-14648854
 ] 

Matt McCline commented on HIVE-11410:
-

By the way, thank you for the create repro description.

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Assignee: Matt McCline
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT 
   t1.c1
 FROM 
   t t1
 JOIN
 (SELECT 
t2.c1,
MAX(t2.c2) AS c2
  FROM 
t t2 
  GROUP BY 
t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11410) Join with subquery containing a group by incorrectly returns no results

2015-07-30 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648523#comment-14648523
 ] 

Mostafa Mokhtar commented on HIVE-11410:


[~mmccline]

 Join with subquery containing a group by incorrectly returns no results
 ---

 Key: HIVE-11410
 URL: https://issues.apache.org/jira/browse/HIVE-11410
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.1.0
Reporter: Nicholas Brenwald
Priority: Minor
 Attachments: hive-site.xml


 Start by creating a table *t* with columns *c1* and *c2* and populate with 1 
 row of data. For example create table *t* from an existing table which 
 contains at least 1 row of data by running:
 {code}
 create table t as select 'abc' as c1, 0 as c2 from Y limit 1; 
 {code}
 Table *t* looks like the following:
 ||c1||c2||
 |abc|0|
 Running the following query then returns zero results.
 {code}
 SELECT 
   t1.c1
 FROM 
   t t1
 JOIN
 (SELECT 
t2.c1,
MAX(t2.c2) AS c2
  FROM 
t t2 
  GROUP BY 
t2.c1
 ) t3
 ON t1.c2=t3.c2
 {code}
 However, we expected to see the following:
 ||c1||
 |abc|
 The problem seems to relate to the fact that in the subquery, we group by 
 column *c1*, but this is not subsequently used in the join condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)