[jira] [Commented] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions
[ https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758951#comment-16758951 ]

Igor Guzenko commented on DRILL-6856:

Fixed by the Calcite update in [pull request|https://github.com/apache/drill/pull/1631]; a test for this case was added.

> Wrong result returned if the query filters a boolean column with both "is
> true" and "is null" conditions
>
>
>                 Key: DRILL-6856
>                 URL: https://issues.apache.org/jira/browse/DRILL-6856
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Anton Gozhiy
>            Assignee: Igor Guzenko
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: 0_0_0.parquet
>
>
> *Data:*
> A Parquet file with a boolean column that contains null values.
> An example is attached.
> *Query:*
> {code:sql}
> select bool_col from dfs.tmp.`Test_data` where bool_col is true or bool_col is null
> {code}
> *Result:*
> {noformat}
> null
> null
> {noformat}
> *Plan:*
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {37.875 rows, 97.875 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1980
> 00-01      Project(bool_col=[$0]) : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {37.5 rows, 97.5 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1979
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {33.75 rows, 93.75 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1978
> 00-03          Filter(condition=[IS NULL($0)]) : rowType = RecordType(ANY bool_col): rowcount = 3.75, cumulative cost = {30.0 rows, 90.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1977
> 00-04            Scan(table=[[dfs, tmp, Test_data]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/Test_data]], selectionRoot=maprfs:/tmp/Test_data, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`bool_col`]]]) : rowType = RecordType(ANY bool_col): rowcount = 15.0, cumulative cost = {15.0 rows, 15.0 cpu, 15.0 io, 0.0 network, 0.0 memory}, id = 1976
> {noformat}
> *Notes:*
> - "true" values were not included in the result, though they should have been.
> - The result is correct if "bool_col = true" is used instead of "is true".
> - The plan shows that the "is true" condition is missing from the Filter
> operator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
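To make the expected behaviour concrete, here is a minimal Python sketch of SQL three-valued logic, with `None` standing in for SQL NULL. The helper names (`is_true`, `is_null`, `sql_eq`, `sql_or`, `where`) are hypothetical and only model standard SQL semantics; they are not Drill or Calcite APIs. The sample column values are illustrative and are not the contents of the attached `0_0_0.parquet`.

```python
from typing import Callable, List, Optional

# SQL three-valued logic modeled in Python: True, False, or None (SQL NULL).
SqlBool = Optional[bool]


def is_true(x: SqlBool) -> bool:
    # "x IS TRUE" never yields NULL: it is TRUE only when x is TRUE.
    return x is True


def is_null(x: SqlBool) -> bool:
    # "x IS NULL" is TRUE only when x is NULL.
    return x is None


def sql_eq(x: SqlBool, y: SqlBool) -> SqlBool:
    # "x = y" yields NULL if either operand is NULL.
    if x is None or y is None:
        return None
    return x == y


def sql_or(a: SqlBool, b: SqlBool) -> SqlBool:
    # Kleene OR: TRUE wins, otherwise NULL wins, otherwise FALSE.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def where(rows: List[SqlBool], predicate: Callable[[SqlBool], SqlBool]) -> List[SqlBool]:
    # A WHERE clause keeps a row only when the predicate is TRUE (FALSE and NULL are dropped).
    return [r for r in rows if predicate(r) is True]


# Hypothetical sample column (not the attached file's actual data).
bool_col: List[SqlBool] = [True, False, None, True, None]

# bool_col is true or bool_col is null
expected = where(bool_col, lambda x: sql_or(is_true(x), is_null(x)))
# bool_col = true or bool_col is null (the workaround from the notes)
with_eq = where(bool_col, lambda x: sql_or(sql_eq(x, True), is_null(x)))
# What the reported plan evaluates: Filter(condition=[IS NULL($0)])
buggy_plan = where(bool_col, is_null)

print(expected)    # [True, None, True, None]
print(with_eq)     # [True, None, True, None]
print(buggy_plan)  # [None, None]
```

Under these assumptions both spellings of the query keep the TRUE and NULL rows, while evaluating only `IS NULL` drops every TRUE row, which matches the two `null` rows in the reported result.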
[jira] [Commented] (DRILL-6856) Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions
[ https://issues.apache.org/jira/browse/DRILL-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16688296#comment-16688296 ]

Volodymyr Vysotskyi commented on DRILL-6856:

It looks like the problem is in Calcite, and it was fixed after the 1.17 release. [~IhorHuzenko], please check it after the Calcite upgrade.
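Because the bug is attributed to Calcite's handling of the filter expression, it may help to see why `IS NULL($0)` is not a valid simplification of the original condition. A rewrite of a filter condition is sound only if it agrees with the original on every input, and a nullable boolean has just three possible values (TRUE, FALSE, NULL), so an exhaustive check is cheap. The sketch below is a hypothetical illustration, not Calcite's actual simplification code; `IS NOT FALSE` is offered as one equivalent form, and the issue does not say which form the fixed Calcite version produces.

```python
from typing import Callable, List, Optional

SqlBool = Optional[bool]  # None models SQL NULL
Predicate = Callable[[SqlBool], bool]

# The three possible values of a nullable BOOLEAN column.
DOMAIN = (True, False, None)


def original(x: SqlBool) -> bool:
    # bool_col IS TRUE OR bool_col IS NULL; neither operand can be NULL, so plain `or` is exact.
    return (x is True) or (x is None)


def observed_rewrite(x: SqlBool) -> bool:
    # What the reported plan evaluates: Filter(condition=[IS NULL($0)])
    return x is None


def is_not_false(x: SqlBool) -> bool:
    # bool_col IS NOT FALSE: TRUE for both TRUE and NULL inputs.
    return x is not False


def counterexamples(f: Predicate, g: Predicate) -> List[SqlBool]:
    # Inputs on which two predicates disagree; an empty list means the rewrite is sound.
    return [x for x in DOMAIN if f(x) != g(x)]


print(counterexamples(original, observed_rewrite))  # [True]
print(counterexamples(original, is_not_false))      # []
```

The single counterexample, TRUE, corresponds exactly to the rows missing from the reported result.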