[jira] [Comment Edited] (HIVE-14027) NULL values produced by left outer join do not behave as NULL

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371107#comment-15371107
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14027 at 7/11/16 4:41 PM:
-

Consider the original query:

{noformat}
select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 1) 
a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
{noformat}

The field {{n}} from table {{a}} can become a constant {{1}} when we fold the 
OI. For every record that the left outer join produces, {{n}} will always be 
{{1}}. The real problem is that the _MapJoin_ reuses the OI of the key from the 
outer side for the key of the inner side. This is fine for inner joins, but it 
is incorrect for outer joins.

_Solution 1_ would be to rewrite the _MapJoin_ operator logic. But that change 
is non-trivial (at least for me that I am not familiar with the operator 
logic). After I studied it initially, I did not proceed with it, as I would 
have to change the initialization logic, process logic, UnwrapRowContainer, etc.

Thus, I went on with the _solution 2_ that is implemented in this patch. 
Consider an outer join. With this patch, OI for the outputs are created from 
outer and inner side independently from their inputs. But when there is a match 
in the join, the data for the key is always taken from the outer side (as it is 
done currently). Thus, we apply the _converter_ logic on the data to get the 
correct type.

In fact, depending on the cost of deserializing vs cost of conversion, it might 
be that the current solution is more efficient than _solution 1_.


was (Author: jcamachorodriguez):
Consider the original query:

{noformat}
select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 1) 
a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
{noformat}

The field {{n}} from table {{a}} can become a constant {{1}} when we fold the 
OI. For every record that the left outer join produces, {{n}} will always be 
{{1}}. The real problem is that the _MapJoin_ reuses the OI of the key from the 
outer side for the key of the inner side. This is fine for inner joins, but it 
is incorrect for outer joins.

_Solution 1_ would be to rewrite the _MapJoin_ operator logic. But that change 
is non-trivial (at least for me that I am not familiar with the operator 
logic). After I studied it initially, I did not proceed with it, as I would 
have to change the initialization logic, process logic, UnwrapRowContainer, etc.

Thus, I went on with the _solution 2_ that is implemented in this patch. 
Consider an outer join. In that case, OI for the outputs are created from outer 
and inner side independently from their inputs. But when there is a match in 
the join, the data for the key is always taken from the outer side (as it is 
done currently). Thus, we apply the _converter_ logic on the data to get the 
correct type.

In fact, depending on the cost of deserializing vs cost of conversion, it might 
be that the current solution is more efficient than _solution 1_.

> NULL values produced by left outer join do not behave as NULL
> -
>
> Key: HIVE-14027
> URL: https://issues.apache.org/jira/browse/HIVE-14027
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.2.0
>Reporter: Vaibhav Gumashta
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14027.01.patch, HIVE-14027.02.patch, 
> HIVE-14027.03.patch, HIVE-14027.patch
>
>
> Consider the following setup:
> {code}
> create table tbl (n bigint, t string); 
> insert into tbl values (1, 'one'); 
> insert into tbl values(2, 'two');
> select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 
> 1) a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
> 1onefalsetrue
> {code}
> The query should return true for isnull(b.n).
> I've tested by inserting a row with null value for the bigint column into 
> tbl, and isnull returns true in that case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14027) NULL values produced by left outer join do not behave as NULL

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371107#comment-15371107
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14027 at 7/11/16 4:40 PM:
-

Consider the original query:

{noformat}
select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 1) 
a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
{noformat}

The field {{n}} from table {{a}} can become a constant {{1}} when we fold the 
OI. For every record that the left outer join produces, {{n}} will always be 
{{1}}. The real problem is that the _MapJoin_ reuses the OI of the key from the 
outer side for the key of the inner side. This is fine for inner joins, but it 
is incorrect for outer joins.

_Solution 1_ would be to rewrite the _MapJoin_ operator logic. But that change 
is non-trivial (at least for me that I am not familiar with the operator 
logic). After I studied it initially, I did not proceed with it, as I would 
have to change the initialization logic, process logic, UnwrapRowContainer, etc.

Thus, I went on with the _solution 2_ that is implemented in this patch. 
Consider an outer join. In that case, OI for the outputs are created from outer 
and inner side independently from their inputs. But when there is a match in 
the join, the data for the key is always taken from the outer side (as it is 
done currently). Thus, we apply the _converter_ logic on the data to get the 
correct type.

In fact, depending on the cost of deserializing vs cost of conversion, it might 
be that the current solution is more efficient than _solution 1_.


was (Author: jcamachorodriguez):
Consider the original query:

{noformat}
select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 1) 
a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
{noformat}

The field {{n}} from table {{a}} can become a constant {{1}} when we fold the 
OI. For every record that the left outer join produces, {{n}} will always be 
{{1}}. The real problem is that the _MapJoin_ reuses the OI of the key from the 
outer side for the key of the inner side. This is fine for inner joins, but it 
is incorrect for outer joins.

_Solution 1_ would be to rewrite the _MapJoin_ operator logic. But that change 
is non-trivial (at least for me that I am not familiar with the operator 
logic). After I studied it initially, I did not proceed with it, as I would 
have to change the initialization logic, process logic, UnwrapRowContainer, etc.

Thus, I went on with the _solution 2_ that is implemented in this patch. 
Consider an outer join. If OI of outer/inner sides are different, we take the 
data from the outer side (as it is done currently), but we apply the 
_converter_ logic on the data.

In fact, depending on the cost of deserializing vs cost of conversion, it might 
be that the current solution is more efficient than _solution 1_.

> NULL values produced by left outer join do not behave as NULL
> -
>
> Key: HIVE-14027
> URL: https://issues.apache.org/jira/browse/HIVE-14027
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.2.0
>Reporter: Vaibhav Gumashta
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14027.01.patch, HIVE-14027.02.patch, 
> HIVE-14027.03.patch, HIVE-14027.patch
>
>
> Consider the following setup:
> {code}
> create table tbl (n bigint, t string); 
> insert into tbl values (1, 'one'); 
> insert into tbl values(2, 'two');
> select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 
> 1) a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
> 1onefalsetrue
> {code}
> The query should return true for isnull(b.n).
> I've tested by inserting a row with null value for the bigint column into 
> tbl, and isnull returns true in that case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14027) NULL values produced by left outer join do not behave as NULL

2016-07-11 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370984#comment-15370984
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14027 at 7/11/16 3:35 PM:
-

That is the source of the original problem and what the code was originally 
doing: the problem is that OI of the outer side becomes constant OI, and thus, 
the one of the inner side would return the constant too.


was (Author: jcamachorodriguez):
That is the source of the original problem and what the code was originally 
doing: the problem is that OI of the outer side becomes constant OI, and thus, 
the one of the right would return the constant too.

> NULL values produced by left outer join do not behave as NULL
> -
>
> Key: HIVE-14027
> URL: https://issues.apache.org/jira/browse/HIVE-14027
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.2.0
>Reporter: Vaibhav Gumashta
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14027.01.patch, HIVE-14027.02.patch, 
> HIVE-14027.03.patch, HIVE-14027.patch
>
>
> Consider the following setup:
> {code}
> create table tbl (n bigint, t string); 
> insert into tbl values (1, 'one'); 
> insert into tbl values(2, 'two');
> select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 
> 1) a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
> 1onefalsetrue
> {code}
> The query should return true for isnull(b.n).
> I've tested by inserting a row with null value for the bigint column into 
> tbl, and isnull returns true in that case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14027) NULL values produced by left outer join do not behave as NULL

2016-06-28 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353599#comment-15353599
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14027 at 6/28/16 7:42 PM:
-

Problem was in the initialization of MapJoin operator. We use the object 
inspector from the key of the BigTable in the MapJoin for the key of the 
broadcast table, but this is not correct for outer joins (if you have not match 
for a left record, the value for the key column in the right side will be 
null). I have uploaded a fix and the test case to validate the fix.


was (Author: jcamachorodriguez):
Problem was in the initialization of operator. We use the object inspector from 
the key of the BigTable in the MapJoin for the key of the broadcast table, but 
this is not correct for outer joins (if you have not match for a left record, 
the value for the key column in the right side will be null). I have uploaded a 
fix and the test case to validate the fix.

> NULL values produced by left outer join do not behave as NULL
> -
>
> Key: HIVE-14027
> URL: https://issues.apache.org/jira/browse/HIVE-14027
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1, 2.2.0
>Reporter: Vaibhav Gumashta
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14027.patch
>
>
> Consider the following setup:
> {code}
> create table tbl (n bigint, t string); 
> insert into tbl values (1, 'one'); 
> insert into tbl values(2, 'two');
> select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 
> 1) a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
> 1onefalsetrue
> {code}
> The query should return true for isnull(b.n).
> I've tested by inserting a row with null value for the bigint column into 
> tbl, and isnull returns true in that case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14027) NULL values produced by left outer join do not behave as NULL

2016-06-26 Thread balaswamy vaddeman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347646#comment-15347646
 ] 

balaswamy vaddeman edited comment on HIVE-14027 at 6/27/16 6:43 AM:


similar issue.https://issues.apache.org/jira/browse/HIVE-13977


was (Author: vaddebalu):
similar issue.https://issues.apache.org/jira/browse/HIVE-14027

> NULL values produced by left outer join do not behave as NULL
> -
>
> Key: HIVE-14027
> URL: https://issues.apache.org/jira/browse/HIVE-14027
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> Consider the following setup:
> {code}
> create table tbl (n bigint, t string); 
> insert into tbl values (1, 'one'); 
> insert into tbl values(2, 'two');
> select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 
> 1) a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
> 1onefalsetrue
> {code}
> The query should return true for isnull(b.n).
> I've tested by inserting a row with null value for the bigint column into 
> tbl, and isnull returns true in that case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14027) NULL values produced by left outer join do not behave as NULL

2016-06-24 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348967#comment-15348967
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-14027 at 6/25/16 12:56 AM:
--

It seems to be a problem in the execution side, not in the planner.

If we set hive.auto.convert.join to false, we execute a MergeJoin and we get 
correct results.



was (Author: jcamachorodriguez):
It seems to be a problem in the execution side, not in the planner.

If we set hive.auto.convert.join to false, we get correct results.


> NULL values produced by left outer join do not behave as NULL
> -
>
> Key: HIVE-14027
> URL: https://issues.apache.org/jira/browse/HIVE-14027
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> Consider the following setup:
> {code}
> create table tbl (n bigint, t string); 
> insert into tbl values (1, 'one'); 
> insert into tbl values(2, 'two');
> select a.n, a.t, isnull(b.n), isnull(b.t) from (select * from tbl where n = 
> 1) a  left outer join  (select * from tbl where 1 = 2) b on a.n = b.n;
> 1onefalsetrue
> {code}
> The query should return true for isnull(b.n).
> I've tested by inserting a row with null value for the bigint column into 
> tbl, and isnull returns true in that case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)