[jira] [Comment Edited] (DRILL-5822) The query with "SELECT *" with "ORDER BY" clause and `planner.slice_target`=1 doesn't preserve column order

2017-10-31 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227074#comment-16227074
 ] 

Paul Rogers edited comment on DRILL-5822 at 10/31/17 4:41 PM:
--

The general rule for the SQL project clause is the following:

* If the list is explicit, {{SELECT b, c, a}}, then columns are returned in that 
order, even if the table defines them in the order (a, b, c).
* If the list is implicit, using a wildcard, {{SELECT *}}, then the column order 
is that defined by the table. In our example above, the order would be {{a, b, 
c}}.
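
As a rough illustration of the two projection rules above, here is a minimal 
Python sketch. The function name {{output_columns}} and the list-of-names 
representation of a table are assumptions made for this example only; they are 
not Drill code.

```python
from typing import List, Sequence


def output_columns(table_columns: Sequence[str], select_list: Sequence[str]) -> List[str]:
    """Return the output column order for a simple SELECT list.

    Hypothetical helper for illustration; not part of Drill.
    """
    # Wildcard: the order is whatever the table defines.
    if list(select_list) == ["*"]:
        return list(table_columns)
    # Explicit list: the order is exactly what the query asked for.
    unknown = [name for name in select_list if name not in table_columns]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return list(select_list)


table = ["a", "b", "c"]  # table defines columns in the order (a, b, c)
explicit_order = output_columns(table, ["b", "c", "a"])
wildcard_order = output_columns(table, ["*"])
```

Here {{explicit_order}} follows the query and {{wildcard_order}} follows the 
table definition.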

Since Drill is distributed and schema-on-read, we run into the issue that two 
tables might have the same columns, but defined in different orders. For 
example:

{noformat}
Table 1: {"a": 10, "b": 20, "c": 30}
Table 2: {"c": 40, "b": 50, "a": 60}
{noformat}

In this case, there is no "correct" order. Instead, Drill must:

1. Recognize that the above scenario can occur.
2. Define each merging operator to follow some reconciliation rule.

Here a "merging" operator is anything that can see batches from two distinct 
scans. That covers almost all operators, and at a minimum the receivers.

A good reconciliation rule is that the first schema wins, and all other batches 
are projected into that first schema. In our example, {{a, b, c}} and {{c, b, 
a}} are both projected into {{a, b, c}}.
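
The "first schema wins" rule can be sketched roughly as follows. This is a 
simplified Python model, not Drill's implementation: batches are modeled as 
lists of dicts (which keep insertion order), and the name 
{{reconcile_first_schema_wins}} is hypothetical. It also assumes every batch 
has the same set of columns, which is the only case discussed here.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]  # dicts preserve insertion order in Python 3.7+


def reconcile_first_schema_wins(batches: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """Project every batch into the column order of the first non-empty batch.

    Hypothetical sketch of the reconciliation rule; not Drill code.
    """
    first_schema: Optional[List[str]] = None
    for batch in batches:
        if not batch:
            yield batch
            continue
        if first_schema is None:
            # The first schema seen defines the order for everything after it.
            first_schema = list(batch[0].keys())
        projected = []
        for row in batch:
            # Assumption: same column set everywhere; only the order may differ.
            if set(row) != set(first_schema):
                raise ValueError("column sets differ; this sketch handles reordering only")
            projected.append({name: row[name] for name in first_schema})
        yield projected


table_1 = [{"a": 10, "b": 20, "c": 30}]
table_2 = [{"c": 40, "b": 50, "a": 60}]
merged = list(reconcile_first_schema_wins([table_1, table_2]))
column_orders = [list(row.keys()) for batch in merged for row in batch]
```

With the two example tables, both batches come out in the order a, b, c, and 
the values from Table 2 stay attached to their original column names.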

The PMC has asked that we not discuss design issues in PR reviews on GitHub. 
So, could you please explain here the approach that this PR takes to solve the 
problem? Do we agree on the description above? Or did this PR take a different 
approach?


was (Author: paul.rogers):
The general rule for the SQL project clause is the following:

* If the list is explicit, `SELECT b, c, a` then columns are returned in that 
order, even if the table defines them in the order (a, b, c).
* If the list is implicit using a wildcard, `SELECT *`, then the column order is 
that defined by the table. In our example above, the order would be `a, b, c`.

Since Drill is distributed and schema-on-read, we run into the issue that two 
tables might have the same columns, but defined in different orders. For 
example, `{"a": 10, "b": 20, "c": 30}` and `{"c": 40, "b": 50, "c": 60}`. In 
this case, there is no "correct" order. Instead, Drill must:

1. Recognize that the above scenario can occur.
2. Define each merging operator to follow some reconciliation rule.

Here a "merging" operator is anything that can see batches from two distinct 
scans. That is, almost all operators, but at least the receivers.

A good reconciliation rule is that the first schema wins, and all other batches 
are projected into that first schema. In our example, `a, b, c` and `c, b, a` 
are both projected into `a, b, c`.

The PMC has asked that we not discuss design issues in PR reviews. So, can you 
perhaps please explain here the approach that this PR takes to solve the 
problem? Do we agree on the description above? Or, did this PR take a different 
approach?

> The query with "SELECT *" with "ORDER BY" clause and `planner.slice_target`=1 
> doesn't preserve column order
> ---
>
> Key: DRILL-5822
> URL: https://issues.apache.org/jira/browse/DRILL-5822
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Prasad Nagaraj Subramanya
> Assignee: Vitalii Diravka
> Fix For: 1.12.0
>
>
> Column ordering is not preserved for a star query with sorting when the query 
> is planned into multiple fragments.
> Repro steps:
> 1) {code}alter session set `planner.slice_target`=1;{code}
> 2) ORDER BY clause in the query.
> Scenarios:
> {code}
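> 0: jdbc:drill:zk=local> alter session reset `planner.slice_target`;
> +-------+--------------------------------+
> |  ok   |            summary             |
> +-------+--------------------------------+
> | true  | planner.slice_target updated.  |
> +-------+--------------------------------+
> 1 row selected (0.082 seconds)
> 0: jdbc:drill:zk=local> select * from cp.`tpch/nation.parquet` order by n_name limit 1;
> +--------------+----------+--------------+------------------------------------------------------+
> | n_nationkey  |  n_name  | n_regionkey  |                      n_comment                       |
> +--------------+----------+--------------+------------------------------------------------------+
> | 0            | ALGERIA  | 0            |  haggle. carefully final deposits detect slyly agai  |
> +--------------+----------+--------------+------------------------------------------------------+
> 1 row selected (0.141 seconds)
> 0: jdbc:drill:zk=local> alter session set `planner.slice_target`=1;
> +-------+--------------------------------+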
> |  ok   |summary   

[jira] [Comment Edited] (DRILL-5822) The query with "SELECT *" with "ORDER BY" clause and `planner.slice_target`=1 doesn't preserve column order

2017-10-31 Thread Vitalii Diravka (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226679#comment-16226679
 ] 

Vitalii Diravka edited comment on DRILL-5822 at 10/31/17 1:37 PM:
--

This is an old topic which was discussed in DRILL-1499 and DRILL-3101. 
For now there is no need to canonicalize the batch or container, since 
RecordBatchLoader swallows the "schema change" if two batches have different 
column ordering. That's why DRILL-847 is outdated.
PR for this ticket: https://github.com/apache/drill/pull/1017
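
To make the idea of "swallowing" a reordering concrete, here is a minimal 
Python sketch of an order-insensitive schema comparison. It is only an 
illustration of the concept: the function {{is_schema_change}} and the 
(name, type) tuples are assumptions for this example and do not reflect how 
RecordBatchLoader is actually written.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (column name, type name); hypothetical representation


def is_schema_change(previous: List[Field], incoming: List[Field]) -> bool:
    """Report a schema change only when the fields themselves differ.

    Two schemas with the same fields in a different order compare as equal,
    so a pure reordering is not reported. Illustrative only; not Drill code.
    """
    return sorted(previous) != sorted(incoming)


batch_1 = [("a", "INT"), ("b", "INT"), ("c", "INT")]
batch_2 = [("c", "INT"), ("b", "INT"), ("a", "INT")]      # same fields, reordered
batch_3 = [("a", "INT"), ("b", "VARCHAR"), ("c", "INT")]  # type of b differs

reorder_is_change = is_schema_change(batch_1, batch_2)
retype_is_change = is_schema_change(batch_1, batch_3)
```

In this sketch a reordering is not treated as a change, while a changed column 
type is.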


was (Author: vitalii):
This is an old topic which was discussed in DRILL-1499 and DRILL-3101. 
For now there is no need to canonicalize the batch or container, since 
RecordBatchLoader swallows the "schema change" if two batches have different 
column ordering. That's why DRILL-847 is outdated.

> The query with "SELECT *" with "ORDER BY" clause and `planner.slice_target`=1 
> doesn't preserve column order
> ---
>
> Key: DRILL-5822
> URL: https://issues.apache.org/jira/browse/DRILL-5822
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.11.0
> Reporter: Prasad Nagaraj Subramanya
> Assignee: Vitalii Diravka
> Fix For: 1.12.0
>
>
> Column ordering is not preserved for a star query with sorting when the query 
> is planned into multiple fragments.
> Repro steps:
> 1) {code}alter session set `planner.slice_target`=1;{code}
> 2) ORDER BY clause in the query.
> Scenarios:
> {code}
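> 0: jdbc:drill:zk=local> alter session reset `planner.slice_target`;
> +-------+--------------------------------+
> |  ok   |            summary             |
> +-------+--------------------------------+
> | true  | planner.slice_target updated.  |
> +-------+--------------------------------+
> 1 row selected (0.082 seconds)
> 0: jdbc:drill:zk=local> select * from cp.`tpch/nation.parquet` order by n_name limit 1;
> +--------------+----------+--------------+------------------------------------------------------+
> | n_nationkey  |  n_name  | n_regionkey  |                      n_comment                       |
> +--------------+----------+--------------+------------------------------------------------------+
> | 0            | ALGERIA  | 0            |  haggle. carefully final deposits detect slyly agai  |
> +--------------+----------+--------------+------------------------------------------------------+
> 1 row selected (0.141 seconds)
> 0: jdbc:drill:zk=local> alter session set `planner.slice_target`=1;
> +-------+--------------------------------+
> |  ok   |            summary             |
> +-------+--------------------------------+
> | true  | planner.slice_target updated.  |
> +-------+--------------------------------+
> 1 row selected (0.091 seconds)
> 0: jdbc:drill:zk=local> select * from cp.`tpch/nation.parquet` order by n_name limit 1;
> +------------------------------------------------------+----------+--------------+--------------+
> |                      n_comment                       |  n_name  | n_nationkey  | n_regionkey  |
> +------------------------------------------------------+----------+--------------+--------------+
> |  haggle. carefully final deposits detect slyly agai  | ALGERIA  | 0            | 0            |
> +------------------------------------------------------+----------+--------------+--------------+
> 1 row selected (0.201 seconds)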
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)