[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-2488:
--
Attachment: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch

It turns out to be an issue with supported encoding for MergeJoin.  Merge Join 
execution operator currently does not process incoming batches with SV2 or SV4, 
so if there was a Limit or Sort below, we need to insert a 
SelectionVectorRemover below the MJ.  
Uploaded a patch with a simple fix.  [~vkorukanti] could you please review ?  I 
haven't run all tests yet..still in process.

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Aman Sinha
Priority: Critical
 Fix For: 0.9.0

 Attachments: 
 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect, one row is missing
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  

[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-2488:

Attachment: t1.parquet

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical
 Fix For: 0.9.0

 Attachments: t1.parquet


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect, one row is missing
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-16Sort(sort0=[$0], dir0=[ASC])
 00-17  StreamAgg(group=[{0, 1}])
 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
 dir1=[ASC])
 00-19  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 {code}
 If you turn off 

[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Chris Westin (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Westin updated DRILL-2488:

Fix Version/s: 0.9.0

 Wrong result on join between two subqueries with aggregation
 

 Key: DRILL-2488
 URL: https://issues.apache.org/jira/browse/DRILL-2488
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical
 Fix For: 0.9.0


 {code}
 0: jdbc:drill:schema=dfs select * from t1;
 ++++
 | a1 | b1 | c1 |
 ++++
 | 1  | a  | 2015-01-01 |
 | 2  | b  | 2015-01-02 |
 | 3  | c  | 2015-01-03 |
 | 4  | null   | 2015-01-04 |
 | 5  | e  | 2015-01-05 |
 | 6  | f  | 2015-01-06 |
 | 7  | g  | 2015-01-07 |
 | null   | h  | 2015-01-08 |
 | 9  | i  | null   |
 | 10 | j  | 2015-01-10 |
 ++++
 10 rows selected (0.15 seconds)
 {code}
 This result is incorrect, one row is missing
 {code}
 0: jdbc:drill:schema=dfs select * from
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq1(x1, y1)
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  inner join
 . . . . . . . . . . . . 
 . . . . . . . . . . . .  (
 . . . . . . . . . . . .  select
 . . . . . . . . . . . .  b1,
 . . . . . . . . . . . .  count(distinct a1)
 . . . . . . . . . . . .  from
 . . . . . . . . . . . .  t1
 . . . . . . . . . . . .  group by
 . . . . . . . . . . . .  b1
 . . . . . . . . . . . .  order by
 . . . . . . . . . . . .  b1 limit 5 offset 1
 . . . . . . . . . . . .  ) as sq2(x1, y1)
 . . . . . . . . . . . .  on
 . . . . . . . . . . . .  sq1.x1 = sq2.x1 and
 . . . . . . . . . . . .  sq2.y1 = sq2.y1
 . . . . . . . . . . . .  ;
 +++++
 | x1 | y1 |x10 |y10 |
 +++++
 | b  | 1  | b  | 1  |
 | c  | 1  | c  | 1  |
 | e  | 1  | e  | 1  |
 | f  | 1  | f  | 1  |
 +++++
 4 rows selected (0.28 seconds)
 {code}
 Explain plan for the wrong result:
 {code}
 00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
 00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
 00-05Limit(offset=[1], fetch=[5])
 00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-09Sort(sort0=[$0], dir0=[ASC])
 00-11  StreamAgg(group=[{0, 1}])
 00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
 00-15  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 00-04Project(b10=[$0], EXPR$10=[$1])
 00-06  SelectionVectorRemover
 00-08Sort(sort0=[$0], dir0=[ASC])
 00-10  Filter(condition=[=($1, $1)])
 00-12Limit(offset=[1], fetch=[5])
 00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
 00-16Sort(sort0=[$0], dir0=[ASC])
 00-17  StreamAgg(group=[{0, 1}])
 00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
 dir1=[ASC])
 00-19  Scan(groupscan=[ParquetGroupScan 
 [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
 selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
 `a1`]]])
 {code}
 If you turn off merge join, query returns correct result:
 

[jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation

2015-03-17 Thread Victoria Markman (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-2488:

Description: 
{code}
0: jdbc:drill:schema=dfs select * from t1;
++++
| a1 | b1 | c1 |
++++
| 1  | a  | 2015-01-01 |
| 2  | b  | 2015-01-02 |
| 3  | c  | 2015-01-03 |
| 4  | null   | 2015-01-04 |
| 5  | e  | 2015-01-05 |
| 6  | f  | 2015-01-06 |
| 7  | g  | 2015-01-07 |
| null   | h  | 2015-01-08 |
| 9  | i  | null   |
| 10 | j  | 2015-01-10 |
++++
10 rows selected (0.15 seconds)
{code}

This result is incorrect, one row is missing
{code}
0: jdbc:drill:schema=dfs select * from
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .  ) as sq1(x1, y1)
. . . . . . . . . . . . 
. . . . . . . . . . . .  inner join
. . . . . . . . . . . . 
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .  ) as sq2(x1, y1)
. . . . . . . . . . . .  on
. . . . . . . . . . . .  sq1.x1 = sq2.x1 and
. . . . . . . . . . . .  sq2.y1 = sq2.y1
. . . . . . . . . . . .  ;
+++++
| x1 | y1 |x10 |y10 |
+++++
| b  | 1  | b  | 1  |
| c  | 1  | c  | 1  |
| e  | 1  | e  | 1  |
| f  | 1  | f  | 1  |
+++++
4 rows selected (0.28 seconds)
{code}

Explain plan for the wrong result:
{code}
00-01  Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-02Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-03  MergeJoin(condition=[=($0, $2)], joinType=[inner])
00-05Limit(offset=[1], fetch=[5])
00-07  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-09Sort(sort0=[$0], dir0=[ASC])
00-11  StreamAgg(group=[{0, 1}])
00-13Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
`a1`]]])
00-04Project(b10=[$0], EXPR$10=[$1])
00-06  SelectionVectorRemover
00-08Sort(sort0=[$0], dir0=[ASC])
00-10  Filter(condition=[=($1, $1)])
00-12Limit(offset=[1], fetch=[5])
00-14  StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-16Sort(sort0=[$0], dir0=[ASC])
00-17  StreamAgg(group=[{0, 1}])
00-18Sort(sort0=[$0], sort1=[$1], dir0=[ASC], 
dir1=[ASC])
00-19  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], 
selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, 
`a1`]]])
{code}

If you turn off merge join, query returns correct result:
{code}
0: jdbc:drill:schema=dfs select * from
. . . . . . . . . . . .  (
. . . . . . . . . . . .  select
. . . . . . . . . . . .  b1,
. . . . . . . . . . . .  count(distinct a1)
. . . . . . . . . . . .  from
. . . . . . . . . . . .  t1
. . . . . . . . . . . .  group by
. . . . . . . . . . . .  b1
. . . . . . . . . . . .  order by
. . . . . . . . . . . .  b1 limit 5 offset 1
. . . . . . . . . . . .   ) as sq1(x1, y1)
. . . . . . . . . . . . 
. . . . . . . . . . . .  inner join
. . . . . . . . . . . .  (
. . . . . . . . . . . .