[jira] [Commented] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions

2019-02-02 Thread Igor Guzenko (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758951#comment-16758951
 ] 

Igor Guzenko commented on DRILL-6856:

Fixed by the Calcite update in [pull 
request|https://github.com/apache/drill/pull/1631] (a test for this case was 
added).

> Wrong result returned if the query filters a boolean column with both "is 
> true" and "is null" conditions
> 
>
> Key: DRILL-6856
> URL: https://issues.apache.org/jira/browse/DRILL-6856
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.15.0
> Reporter: Anton Gozhiy
> Assignee: Igor Guzenko
> Priority: Major
> Fix For: 1.16.0
>
> Attachments: 0_0_0.parquet
>
>
> *Data:*
> A parquet file with a boolean column that contains null values.
> An example is attached.
> *Query:*
> {code:sql}
> select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col is null
> {code}
> *Result:*
> {noformat}
> null
> null
> {noformat}
> *Plan:*
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1980
> 00-01      Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1979
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1978
> 00-03          Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1977
> 00-04            Scan(table=[[dfs, tmp, Test_data]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976
> {noformat}
> *Notes:* 
> - Rows with "true" values were not included in the result, though they 
> should have been.
> - The result is correct if "bool_col = true" is used instead of "bool_col is 
> true".
> - The plan shows that the "is true" condition is missing from the Filter 
> operator.
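
For illustration only, here is a minimal Python sketch of the semantics the query is expected to have, next to what the reported plan actually evaluates. It is not Drill or Calcite code. The helper names and the sample rows are hypothetical, since the contents of the attached 0_0_0.parquet are not shown in the issue. SQL NULL is modelled as Python None.

```python
from typing import List, Optional


def is_true(value: Optional[bool]) -> bool:
    # SQL "value IS TRUE": always two-valued, NULL yields False.
    return value is True


def is_null(value: Optional[bool]) -> bool:
    # SQL "value IS NULL".
    return value is None


def expected_filter(rows: List[Optional[bool]]) -> List[Optional[bool]]:
    # Intended predicate: bool_col IS TRUE OR bool_col IS NULL.
    return [v for v in rows if is_true(v) or is_null(v)]


def reported_filter(rows: List[Optional[bool]]) -> List[Optional[bool]]:
    # Predicate left in the reported plan: Filter(condition=[IS NULL($0)]).
    return [v for v in rows if is_null(v)]


if __name__ == "__main__":
    # Hypothetical sample data, not the attached parquet file.
    sample = [True, False, None, True, None]
    print("expected:", expected_filter(sample))
    print("reported:", reported_filter(sample))
```

With rows like these, the intended predicate keeps both the true and the null rows, while the reported plan keeps only the nulls, which matches the result shown in the issue.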



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions

2018-11-15 Thread Volodymyr Vysotskyi (JIRA)



[ 
https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688296#comment-16688296
 ] 

Volodymyr Vysotskyi commented on DRILL-6856:


It looks like the problem is in Calcite and was fixed after the 1.17 release.
[~IhorHuzenko], please check it after the Calcite upgrade.

