Re: [QUESTION] Does the SQL standard allow projecting nested fields for aggregates?

2020-10-20 Thread Jinfeng Ni
SELECT f0.a, max(f1) FROM t1 GROUP BY f0

I think the problem with the 1st query is that the parser is confused
about whether 'f0' is a table name or a column name.

The PostgreSQL doc [1] suggests using parentheses to denote the column, and the part
after the parentheses to denote the subfield. In your case, the query should be:

SELECT (f0).a, max(f1) FROM t1 GROUP BY f0;

1. https://www.postgresql.org/docs/current/rowtypes.html
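A minimal illustration of that syntax, reusing the type and table from Danny's example below (an untested sketch; PostgreSQL-specific syntax):

```sql
-- Composite type and table from Danny's example.
CREATE TYPE my_type AS (a int, b varchar(20));
CREATE TABLE t1 (f0 my_type, f1 int, f2 varchar(20));
INSERT INTO t1 VALUES ((1, 'abc'), 2, 'def');

-- Without parentheses, PostgreSQL parses f0 as a table alias and fails;
-- (f0).a makes it unambiguous that f0 is a column of row type.
SELECT (f0).a, max(f1) FROM t1 GROUP BY f0;
```

The parenthesized form works because f0 appears in GROUP BY, so any expression over it is a valid select item.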

On Tue, Oct 20, 2020 at 12:17 PM Julian Hyde  wrote:
>
> I think that your first query,
>
>   SELECT f0.a, max(f1) FROM t1 GROUP BY f0
>
> should be valid. I don't know whether the SQL standard thinks it
> should be valid, or whether Calcite can handle it. And I don't know
> why PostgreSQL has a problem with it.
>
> Julian
>
> On Mon, Oct 19, 2020 at 9:41 PM Danny Chan  wrote:
> >
> > CREATE TYPE my_type AS ( a int , b VARCHAR(20));
> >
> > create table t1(
> >   f0 my_type,
> >   f1 int,
> >   f2 varchar(20)
> > );
> >
> > insert into t1 values((1, 'abc'), 2, 'def');
> >
> > SELECT f0.a, max(f1) FROM t1 GROUP BY f0; -- this is invalid in PostgreSQL
> >
> > SELECT f0, max(f1) FROM t1 GROUP BY f0; -- this is a valid query
> >
> >
> > My question is: does the SQL standard allow projecting nested fields for
> > aggregates? Current Calcite throws and complains that the nested
> > field cannot be seen in the scope (much like PostgreSQL).
> >
> > Best,
> > Danny Chan


Re: [ANNOUNCE] Haisheng Yuan joins Calcite PMC

2019-11-11 Thread Jinfeng Ni
Congratulations!


On Tue, Nov 12, 2019 at 1:23 AM Rui Wang  wrote:
>
> Congrats HaiSheng!
>
>
> -Rui
>
> On Mon, Nov 11, 2019 at 8:05 AM Stamatis Zampetakis 
> wrote:
>
> > Congrats Haisheng!
> >
> > Reviews, code contributions, design discussions, helping users, and many
> > more things for improving the project.
> >
> > Personally, I also learn a lot from our interactions.
> >
> > All these are much appreciated; keep it up!!
> >
> > Best,
> > Stamatis
> >
> > On Mon, Nov 11, 2019, 4:17 PM Michael Mior  wrote:
> >
> > > Welcome and congratulations HaiSheng!
> > > --
> > > Michael Mior
> > > mm...@apache.org
> > >
> > > On Sun, Nov 10, 2019 at 22:45, Francis Chuang
> > >  wrote:
> > > >
> > > > I'm pleased to announce that Haisheng has accepted an invitation to
> > > > join the Calcite PMC. Haisheng has been a consistent and helpful
> > > > figure in the Calcite community for which we are very grateful. We
> > > > look forward to the continued contributions and support.
> > > >
> > > > Please join me in congratulating Haisheng!
> > > >
> > > > - Francis (on behalf of the Calcite PMC)
> > >
> >


Re: Re: [DISCUSS] On-demand traitset request

2019-11-05 Thread Jinfeng Ni
@Haisheng, @Xiening,

Thanks for pointing that previous email out.  Overall, I agree that
physical trait enforcement should be done in the engine, not in
the rule. The rule should only specify the request and the
corresponding transformation, and let the engine explore the search
space. It would be great if we could revamp the Volcano optimizer
framework to work that way.

In terms of search space, it's always a tradeoff between the space
searched and the optimality of the plan found. I think it's fine for
the engine to explore a potentially big search space, as long as it has
an effective "bound-and-prune" strategy. In the original Volcano paper,
there is a way to prune the search space based on the best plan found
so far, using the parameter "limit".  When an implementable plan is
found, a "real" cost is obtained, which can be used to prune
unnecessary search space.  That's actually the advantage of Volcano's
"top-down" approach. However, it seems to me that Calcite's Volcano does
not apply that approach effectively, because of the existence of
AbstractConverter.
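The "limit"-based pruning described above can be sketched as a toy top-down search. This is my own illustration, not Calcite or Volcano code; the node/alternative dictionaries are made-up structures:

```python
# Toy sketch of Volcano-style "limit" pruning: once a complete physical
# plan yields a real cost, any partial plan whose accumulated cost
# already exceeds the remaining budget is abandoned.

def optimize(node, limit=float("inf")):
    """Return (best_cost, best_plan) for node, or (inf, None) if pruned."""
    best_cost, best_plan = float("inf"), None
    for alt in node["alternatives"]:          # candidate physical implementations
        cost = alt["self_cost"]
        if cost >= limit:                     # prune: already too expensive
            continue
        plan = [alt["name"]]
        for child in alt.get("children", []):
            # Pass the remaining budget down as the child's limit.
            child_cost, child_plan = optimize(child, limit - cost)
            cost += child_cost
            if child_cost == float("inf") or cost >= limit:
                cost = float("inf")           # child pruned or over budget
                break
            plan += child_plan
        if cost < best_cost:
            best_cost, best_plan = cost, plan
            limit = min(limit, best_cost)     # tighten the bound for later alternatives
    return best_cost, best_plan
```

Each real cost found tightens the limit, so later (and deeper) alternatives get cut off earlier, which is the "bound-and-prune" advantage of the top-down approach.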


On Sun, Nov 3, 2019 at 10:12 PM Haisheng Yuan  wrote:
>
> Hi Jinfeng,
>
> I think you might have missed the email about proposed API for physical 
> operators I sent out previously in [1].
>
> We don't need to request all the permutations, which is also impossible in
> practice; the search space would explode.
>
> In the example in email [1], I already addressed your concern that passing
> down the parent request into children may lead to a less optimal plan. Basically
> the join operator can send 2 collation optimization requests: one passes the
> request down, the other ignores the parent's request.
>
> Using AbstractConverter to enforce properties is inappropriate; it hands
> all the optimization work to physical operator providers, meaning there is
> almost no physical-level optimization mechanism in Calcite. SQL Server and
> Greenplum's optimizers, which are Cascades-framework based, implement
> property enforcement in the core optimizer engine, not through
> AbstractConverter and rules; physical operators just need to implement those
> methods (or similar ones) I mentioned in email [1]. My goal is to completely
> abolish AbstractConverter.
>
> [1] 
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e
>
> - Haisheng
>
> --
> From: Jinfeng Ni
> Date: 2019-11-01 14:10:30
> To:
> Subject: Re: [DISCUSS] On-demand traitset request
>
> Hi Xiening,
>
> "Let's say R and S don't have sorting properties at all. In your
> case, we would end up adding enforcers for LHS and RHS to get
> collation (a, b, c). Then we would need another enforcer to get
> collation (b, c). This is a suboptimal plan, as we could have used (b,
> c, a) for the join."
>
> In such a case, for step 2 when MergeJoin requests a permutation match of
> (a, b, c) on both its inputs, it is not necessary to end up with
> collation (a, b, c) only. Since it requests a "permutation", MJ could ask for
> all possible satisfying collations, which include (b, c, a). In other
> words, the steps I described did not exclude such a plan.
>
> You may argue it would increase the search space. However, by
> limiting the search space without exploring all possible choices, we may
> lose the chance of getting the 'optimal' plan we want. For instance, in the
> above example, the idea of passing the "on demand" trait request (b, c)
> from Agg to MJ is to avoid an unnecessary sort on (b, c). In cases where the
> join condition has good filtering, such a sort of the join output could
> be quite cheap. Yet in the plan enumeration, since we use the "on demand"
> trait request from the parent to guide the actions of MJ, I'm not sure whether
> we may restrict the choices we consider in the legs of the join, whose
> cardinality could be larger and play a bigger role in the overall
> cost.
>
> In other words, by using the "on demand" trait request, we may restrict
> the choices of plan, possibly in some operators with larger data
> size.
>
> In the current implementation of VolcanoPlanner, I feel the root cause
> of long planning time is not exploring all possible satisfying traits.
> It is actually the unnecessary AbstractConverters added to the
> equivalence class.
>
>
> On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai  wrote:
> >
> > Thanks for sharing. I like the way you model this problem, Jinfeng.
> >
> > There’s one minor issue with your example. Let's say R and S don't have
> > sorting properties at all. In your case, we would end up adding enforcers
> > for L

Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-11-01 Thread Jinfeng Ni
I think "pull-up traits" are necessary, since physical traits are
propagated upwards. However, I'm not fully convinced that "On Demand
Trait" is the best solution, as I feel it may restrict the choices the
planner may consider.  Maybe once the proposed design doc is ready,
we can look into the details and reevaluate.

One question that I have had ever since I started using Calcite a
couple of years ago is the concept of Convention being a trait, and the
introduction of AbstractConverter in Calcite's VolcanoPlanner.
Quickly looking through the original Volcano paper [1], or the newer
one [2], I did not see any mention of such concepts.  The original
paper has a classification of "transformation rules", which operate on
logical relational expressions, and "implementation rules", which provide
the mapping to physical operators.


1. https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
2. 
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/volcano-prasan.pdf

On Thu, Oct 31, 2019 at 2:40 PM Xiening Dai  wrote:
>
> Actually we solved this problem in our setup using a mechanism called
> “Pull-Up Traits”, which explores the possible trait sets of children's inputs
> to decide the parent's physical properties. In order to determine a child's input
> trait, you would have to look at the child's children, and all the way down to the
> leaf nodes or a barrier. A barrier is a rel node which cannot derive any
> traits regardless of the input. A good example would be a user-defined function
> which would throw off any distribution or collation. Then we realized just
> pulling up is not enough; sometimes we would need to look at the parent's
> requirement as well. So we try to solve this in a unified framework, which we
> call “On Demand Trait”, and implement it as part of the framework so anyone
> can benefit. I hope Haisheng can share a design doc once we have more
> concrete ideas.
>
>
> > On Oct 31, 2019, at 11:37 AM, Jinfeng Ni  wrote:
> >
> > Hi Vladimir,
> >
> > The SubsetTransformer interface and the iteration over the RelNodes
> > within a RelSubset in Drill are implemented exactly to do the trait
> > propagation. We also had to rely on AbstractConverter to fire the
> > necessary rules to avoid the CanNotPlan issue. At some point, the Calcite
> > community chose to remove AbstractConverter and Drill had to add it
> > back, which is probably one of the main reasons for us to continue
> > using a Calcite fork.  I still remember we constantly had to deal with
> > the dilemma between "CanNotPlan" and long planning time due to the explored
> > search space.
> >
> > Glad to see more people are joining the effort to solve this long
> > overdue issue, something missing in Calcite's core optimizer framework
> > "since before Calcite was Calcite" (Jacques's words).
> >
> > Jinfeng
> >
> >
> > On Thu, Oct 31, 2019 at 3:38 AM Vladimir Ozerov  wrote:
> >>
> >> Hi Danny,
> >>
> >> Thank you very much for the links. What is described here is pretty much
> >> similar to the problem I describe. Especially the discussion about trait
> >> propagation, as this is basically what I need - to explore potential traits
> >> of children before optimizing parents. And this is basically what Drill
> >> already does with its SubsetTransformer:
> >> 1) There is a SubsetTransformer interface, which iterates over physical
> >> relations of the given subset [1]
> >> 2) If you want to make a physical project, you iterate over physical
> >> relations of the input subset and create possible physical projects [2]
> >> 3) But if you cannot find any physical input, then you trigger creation of
> >> a "bad" physical project, which is very likely to have poor cost because it
> >> cannot take advantage of input's distribution information [3]
> >> So, step 2 is trait set propagation, which is needed by many
> >> distributed engines. Step 3 is an attempt to work around current
> >> VolcanoPlanner behavior, where a parent rule is fired only if the produced
> >> child node has a compatible trait set.
> >>
> >> I do not know Calcite's architecture that well, but at first glance, the
> >> proposed ability to re-fire rules of a specific Rel seems good enough. It
> >> doesn't expand search space, because no new nodes are created, and it seems
> >> to be relatively simple to implement.
> >>
> >> [1]
> >> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> >> [2]
> >> https://gith

Re: [DISCUSS] On-demand traitset request

2019-11-01 Thread Jinfeng Ni
Hi Xiening,

"Let's say R and S don't have sorting properties at all. In your
case, we would end up adding enforcers for LHS and RHS to get
collation (a, b, c). Then we would need another enforcer to get
collation (b, c). This is a suboptimal plan, as we could have used (b,
c, a) for the join."

In such a case, for step 2 when MergeJoin requests a permutation match of
(a, b, c) on both its inputs, it is not necessary to end up with
collation (a, b, c) only. Since it requests a "permutation", MJ could ask for
all possible satisfying collations, which include (b, c, a). In other
words, the steps I described did not exclude such a plan.

You may argue it would increase the search space. However, by
limiting the search space without exploring all possible choices, we may
lose the chance of getting the 'optimal' plan we want.  For instance, in the
above example, the idea of passing the "on demand" trait request (b, c)
from Agg to MJ is to avoid an unnecessary sort on (b, c).  In cases where the
join condition has good filtering, such a sort of the join output could
be quite cheap.  Yet in the plan enumeration, since we use the "on demand"
trait request from the parent to guide the actions of MJ, I'm not sure whether
we may restrict the choices we consider in the legs of the join, whose
cardinality could be larger and play a bigger role in the overall
cost.

In other words, by using the "on demand" trait request, we may restrict
the choices of plan, possibly in some operators with larger data
size.

In the current implementation of VolcanoPlanner, I feel the root cause
of long planning time is not exploring all possible satisfying traits.
It is actually the unnecessary AbstractConverters added to the
equivalence class.


On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai  wrote:
>
> Thanks for sharing. I like the way you model this problem, Jinfeng.
>
> There’s one minor issue with your example. Let's say R and S don't have
> sorting properties at all. In your case, we would end up adding enforcers for
> LHS and RHS to get collation (a, b, c). Then we would need another enforcer
> to get collation (b, c). This is a suboptimal plan, as we could have used (b,
> c, a) for the join.
>
> I think in step #2, the join operator would need to take the agg trait 
> requirement into account. Then it would have two options -
>
> 1) require an *exact/super* match of (b, c, a) or (c, b, a); this is to
> guarantee the join output would deliver the collation the agg needs.
> 2) require a permutation match of (a, b, c); in such a case, an enforcer might
> be needed for the aggregation.
>
> Eventually the cost model decides who is the winner.
>
> There’s a fundamental difference between your model and Haisheng’s proposal. 
> In Haisheng’s case, a rel node not only looks at its parent’s requirement, 
> but also tries to get the potential traits its input could deliver. It would 
> try to align them to eliminate unnecessary alternatives.
>
> In above example, assuming R is (b, c, a) and S is (a, b, c), to implement 
> option 1), we would generate two alternatives -
>
> MergeJoin (b, c, a)
>   TableScan R
>   Sort (b, c, a)
>     TableScan S
>
> MergeJoin (c, b, a)
>   Sort (c, b, a)
>     TableScan R
>   Sort (c, b, a)
>     TableScan S
>
> But if we look at the input traits and have the insight that R already
> delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> generating the 2nd plan, which is definitely worse, and thus reduce the search
> space.
>
>
> > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni  wrote:
> >
> > A little bit of history.  In Drill, when we first implemented the
> > Distribution trait's definition, we allowed both exact match and
> > partial match in the satisfy() method. This works fine for single-input
> > operators such as aggregation; however, it leads to incorrect plans for join
> > queries, i.e. LHS shuffled by (a, b), RHS shuffled by (a).  At that
> > time, we removed partial match and used exact match only. Yet this
> > change leads to unnecessary additional exchanges.  To mitigate this
> > problem, in the join physical operator, for a join key (a, b, c), we
> > enumerate different distribution requests, yet this leads to more space
> > to explore and significantly increases planning time (which is probably
> > what Haisheng also experienced).  When I look back, I feel what we
> > probably missed is the "coordination" step in the join operator, because
> > if we relax the requirement of satisfy(), for multi-input operators,
> > we have to enforce some "coordination" to make sure multiple inputs'
> > traits can work together properly.
> >
> >
> >
> > On Fri, Oct 18, 2019 at 4:38 PM Jinfe

Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Jinfeng Ni
Hi Vladimir,

The SubsetTransformer interface and the iteration over the RelNodes
within a RelSubset in Drill are implemented exactly to do the trait
propagation. We also had to rely on AbstractConverter to fire the
necessary rules to avoid the CanNotPlan issue. At some point, the Calcite
community chose to remove AbstractConverter and Drill had to add it
back, which is probably one of the main reasons for us to continue
using a Calcite fork.  I still remember we constantly had to deal with
the dilemma between "CanNotPlan" and long planning time due to the explored
search space.

Glad to see more people are joining the effort to solve this long
overdue issue, something missing in Calcite's core optimizer framework
"since before Calcite was Calcite" (Jacques's words).

Jinfeng


On Thu, Oct 31, 2019 at 3:38 AM Vladimir Ozerov  wrote:
>
> Hi Danny,
>
> Thank you very much for the links. What is described here is pretty much
> similar to the problem I describe. Especially the discussion about trait
> propagation, as this is basically what I need - to explore potential traits
> of children before optimizing parents. And this is basically what Drill
> already does with its SubsetTransformer:
> 1) There is a SubsetTransformer interface, which iterates over physical
> relations of the given subset [1]
> 2) If you want to make a physical project, you iterate over physical
> relations of the input subset and create possible physical projects [2]
> 3) But if you cannot find any physical input, then you trigger creation of
> a "bad" physical project, which is very likely to have poor cost because it
> cannot take advantage of input's distribution information [3]
> So, step 2 is trait set propagation, which is needed by many
> distributed engines. Step 3 is an attempt to work around current
> VolcanoPlanner behavior, where a parent rule is fired only if the produced
> child node has a compatible trait set.
>
> I do not know Calcite's architecture that well, but at first glance, the
> proposed ability to re-fire rules of a specific Rel seems good enough. It
> doesn't expand search space, because no new nodes are created, and it seems
> to be relatively simple to implement.
>
> [1]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> [2]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
> [3]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
>
> On Thu, Oct 31, 2019 at 12:21, Danny Chan  wrote:
>
> > Thanks Vladimir for bringing up this discussion !
> >
> > Did you notice that there is a JIRA issue about this problem? [1] There is also a
> > discussion about how to propagate the traits [2].
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > [2]
> > https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
> >
> > Best,
> > Danny Chan
> > On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov wrote:
> > > Hi colleagues,
> > >
> > > I would like to discuss with the community the possibility of adding a
> > > new public method to VolcanoPlanner which will forcefully re-trigger the
> > > rules for the specific rel. This is a follow up of a discussion started in
> > > the other thread [1].
> > >
> > > **Problem statement**
> > > When converting between conventions during optimization, VolcanoPlanner
> > > prefers the top-down approach, so that the nodes are converted from the
> > > root. But in some cases, the intermediate node must be converted after
> > > its children. This is especially true for distributed SQL engines, which rely
> > > on distribution traits during the optimization process: it is not
> > > possible to efficiently choose a proper physical implementation of a parent
> > > node unless the physical representation of a child node is known.
> > >
> > > It seems that presently VolcanoPlanner cannot address such cases by
> > > default. Consider that we have two nodes and associated rules which
> > > convert them to physical counterparts:
> > > [LogicalParent <- LogicalChild]
> > > The parent should be converted after the child. When the child is
> > > converted, the physical node is created:
> > > [LogicalParent <- {LogicalChild, PhysicalChild}]
> > > In order to finish the optimization process, we need to convert the
> > > parent. But parent rules are not fired, because PhysicalChild has traits
> > > incompatible with LogicalParent.
> > >
> > > Presently the problem could be solved in two ways:
> > > 1) Always produce conversions when going top-down. In this case, I first
> > > create a physical parent, then a physical child. The problem is that the
> > > created parent is not optimal because it didn't know the child distribution
> > > at the time of creation. So the full flow would be: create 

Re: Re: [DISCUSS] On-demand traitset request

2019-10-18 Thread Jinfeng Ni
A little bit of history.  In Drill, when we first implemented the
Distribution trait's definition, we allowed both exact match and
partial match in the satisfy() method. This works fine for single-input
operators such as aggregation; however, it leads to incorrect plans for join
queries, i.e. LHS shuffled by (a, b), RHS shuffled by (a).  At that
time, we removed partial match and used exact match only. Yet this
change leads to unnecessary additional exchanges.  To mitigate this
problem, in the join physical operator, for a join key (a, b, c), we
enumerate different distribution requests, yet this leads to more space
to explore and significantly increases planning time (which is probably
what Haisheng also experienced).  When I look back, I feel what we
probably missed is the "coordination" step in the join operator, because
if we relax the requirement of satisfy(), for multi-input operators,
we have to enforce some "coordination" to make sure multiple inputs'
traits can work together properly.
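The missing "coordination" step can be sketched as a small helper (a hypothetical illustration, not Drill or Calcite API): each join input reports the collation it can deliver, and the join keeps only those key orderings that at least one side satisfies for free, inserting a sort enforcer on the other side.

```python
from itertools import permutations

def coordinate(join_keys, lhs_collation, rhs_collation):
    """For each usable join-key permutation, report which side needs a Sort.

    A side satisfies a permutation if its collation starts with exactly
    that column sequence (a prefix match, i.e. a "super match").
    Returns a list of (permutation, lhs_needs_sort, rhs_needs_sort).
    """
    def satisfies(collation, perm):
        return tuple(collation[:len(perm)]) == perm

    plans = []
    for perm in permutations(join_keys):
        lhs_ok = satisfies(lhs_collation, perm)
        rhs_ok = satisfies(rhs_collation, perm)
        if lhs_ok or rhs_ok:          # at least one side comes for free
            plans.append((perm, not lhs_ok, not rhs_ok))
    return plans
```

For keys (a, b, c) with R delivering (c, b, a) and S delivering (b, c, a), this yields exactly two candidates, sorting one side each; the cost model then picks the winner.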



On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni  wrote:
>
> This is an interesting topic. Thanks for bringing up this issue.
>
> My understanding of the Volcano planner is that it works in a top-down search
> mode (the parent asks for a certain trait of its child), while the trait
> propagates in a bottom-up way, as Stamatis explained.
>
> IMHO, the issue comes down to the definition of RelTrait: how to
> determine if a trait A could satisfy a request asking for trait B,
> that is, how the RelTrait.satisfies() method is implemented.
>
> Let's first clarify different situations, using collation as an example.
> 1) The collation is requested by the query's outermost ORDER BY clause.
>    - The generated plan has to have an "exact match", i.e. the same collation
> (same column sequence), or a "super match".
> exact match:   (a, b)  satisfies  (a, b)
> super match:   (a, b, c)  satisfies (a, b)
>
> 2) The collation is requested by an operand with a single input, such as
> sort-based Aggregation.
>    - In such a case, a "permutation match" is sufficient.
> For instance, for Aggregation (b, c), an input with collation (c, b)
> could satisfy the requirement.
> permutation match:  (b, c) satisfies (c, b). (c, b) satisfies (c, b)
> permutation match:  (b, c, a) satisfies (c, b). (c, b, a) satisfies (c, b)
>
> 3) The collation is requested by an operand with >= 2 inputs, such as
> sort-based MergeJoin.
>   - A permutation match is sufficient for each input.
>   - MergeJoin has to do coordination, after the inputs' traits propagate
> upwards. In other words, it must ensure both inputs' permutation matches are
> actually the same sequence. Otherwise, an enforcer could be inserted on
> each input, and the planner generates two plans and lets the cost
> decide.
>
> For the first case, this is how today's RelCollation's satisfy()
> method is implemented.
>
> For the second / third cases, use Haisheng's example,
>
> SELECT DISTINCT c, b FROM
>   ( SELECT R.c c, S.b b FROM R, S
> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>
> Aggregate . (c, b)
> +--- MergeJoin . (a, b, c)
> |--- TableScan on R
> +--- TableScan on S
>
> Here are the steps that might take place in the planner:
>
> 1) Aggregate requests a permutation match of collation (c, b)
> 2) MergeJoin requests a permutation match of (a, b, c) on both its inputs
> 3) R responds with collation (c, b, a), which satisfies MergeJoin's LHS
> requirement
> 4) S responds with collation (b, c, a), which satisfies MergeJoin's RHS
> requirement
> 5) MergeJoin does a coordination of LHS and RHS, and generates two possible plans
>    MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
> collation of (c, b, a)
>    MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
> collation of (b, c, a)
> 6) MJ1 and MJ2 could both satisfy the permutation match request in step
> 1, leading to two possible plans:
>   Agg1:  with input of MJ1
>   Agg2:  with input of MJ2
> 7) The planner chooses the best plan based on the cost of Agg1 and Agg2.
>
> I should point out that the enforcer sort inserted in step 5 could help
> remove a redundant sort in its input, if the input's collation is
> obtained from a sort, by invoking Calcite's SortRemove rule.
>
> The above only considers the column sequence. DESC/ASC and NULLS
> FIRST/LAST will add more complexity, but we can probably use a similar idea.
>
> In summary, we need:
>   1) to redefine the collation trait's satisfy() policy: exact match, super
> match, permutation match;
>   2) different physical operators to apply different trait-matching
> policies, depending on the operator's number of inputs and algorithm
> implementation.
>
>
>
>
>
> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan  wrote:
> >
> > Hi St

Re: Re: [DISCUSS] On-demand traitset request

2019-10-18 Thread Jinfeng Ni
This is an interesting topic. Thanks for bringing up this issue.

My understanding of the Volcano planner is that it works in a top-down search
mode (the parent asks for a certain trait of its child), while the trait
propagates in a bottom-up way, as Stamatis explained.

IMHO, the issue comes down to the definition of RelTrait: how to
determine if a trait A could satisfy a request asking for trait B,
that is, how the RelTrait.satisfies() method is implemented.

Let's first clarify different situations, using collation as an example.
1) The collation is requested by the query's outermost ORDER BY clause.
   - The generated plan has to have an "exact match", i.e. the same collation
(same column sequence), or a "super match".
exact match:   (a, b)  satisfies  (a, b)
super match:   (a, b, c)  satisfies (a, b)

2) The collation is requested by an operand with a single input, such as
sort-based Aggregation.
   - In such a case, a "permutation match" is sufficient.
For instance, for Aggregation (b, c), an input with collation (c, b)
could satisfy the requirement.
permutation match:  (b, c) satisfies (c, b). (c, b) satisfies (c, b)
permutation match:  (b, c, a) satisfies (c, b). (c, b, a) satisfies (c, b)

3) The collation is requested by an operand with >= 2 inputs, such as
sort-based MergeJoin.
  - A permutation match is sufficient for each input.
  - MergeJoin has to do coordination, after the inputs' traits propagate
upwards. In other words, it must ensure both inputs' permutation matches are
actually the same sequence. Otherwise, an enforcer could be inserted on
each input, and the planner generates two plans and lets the cost
decide.

For the first case, this is how today's RelCollation's satisfy()
method is implemented.

For the second / third cases, use Haisheng's example,

SELECT DISTINCT c, b FROM
  ( SELECT R.c c, S.b b FROM R, S
WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;

Aggregate . (c, b)
+--- MergeJoin . (a, b, c)
|--- TableScan on R
+--- TableScan on S

Here are the steps that might take place in the planner:

1) Aggregate requests a permutation match of collation (c, b)
2) MergeJoin requests a permutation match of (a, b, c) on both its inputs
3) R responds with collation (c, b, a), which satisfies MergeJoin's LHS requirement
4) S responds with collation (b, c, a), which satisfies MergeJoin's RHS requirement
5) MergeJoin does a coordination of LHS and RHS, and generates two possible plans
   MJ1:   Insert a sort of (c, b, a) on RHS.  This MJ operator now has
collation of (c, b, a)
   MJ2:   Insert a sort of (b, c, a) on LHS.  This MJ operator now has
collation of (b, c, a)
6) MJ1 and MJ2 could both satisfy the permutation match request in step
1, leading to two possible plans:
  Agg1:  with input of MJ1
  Agg2:  with input of MJ2
7) The planner chooses the best plan based on the cost of Agg1 and Agg2.

I should point out that the enforcer sort inserted in step 5 could help
remove a redundant sort in its input, if the input's collation is
obtained from a sort, by invoking Calcite's SortRemove rule.

The above only considers the column sequence. DESC/ASC and NULLS
FIRST/LAST will add more complexity, but we can probably use a similar idea.

In summary, we need:
  1) to redefine the collation trait's satisfy() policy: exact match, super
match, permutation match;
  2) different physical operators to apply different trait-matching
policies, depending on the operator's number of inputs and algorithm
implementation.
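The three satisfy() policies in this summary can be stated compactly. This is an illustrative sketch only (not the Calcite RelCollation API), treating a collation as a plain sequence of column names and ignoring DESC/ASC and NULLS FIRST/LAST, as noted above:

```python
def exact_match(provided, required):
    # Exact match: (a, b) satisfies (a, b) -- same columns, same sequence.
    return list(provided) == list(required)

def super_match(provided, required):
    # Super match: required is a prefix, e.g. (a, b, c) satisfies (a, b).
    return (len(provided) >= len(required)
            and list(provided[:len(required)]) == list(required))

def permutation_match(provided, required):
    # Permutation match: the leading columns of `provided` are some
    # permutation of `required`, e.g. (c, b, a) satisfies (b, c).
    return (len(provided) >= len(required)
            and sorted(provided[:len(required)]) == sorted(required))
```

An ORDER BY would use exact/super match, a single-input sort-based Aggregate could accept a permutation match, and a MergeJoin would apply permutation match per input plus the coordination step.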





On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan  wrote:
>
> Hi Stamatis,
>
> Thanks for your comment. I think my example didn't make it clear.
>
> When a logical operator is created, it doesn't have any physical
> property, and it shouldn't. When a physical operator is created,
> e.g. in the Enumerable convention, it only creates an intuitive traitset
> with it, and requests from its children the corresponding ones.
>
> For operators such as Join, Aggregate, and Window, which may deliver
> multiple different traitsets, when the parent operator is created and
> requests its traitset, it might be good to know what are the possible
> traitsets that the child operator can deliver. e.g.
>
> SELECT DISTINCT c, b FROM
>   ( SELECT R.c c, S.b b FROM R, S
> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
>
> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> Here is the logical plan:
> Aggregate
> +--- InnerJoin
> |--- TableScan on R
> +--- TableScan on S
>
> When we create a physical merge join for the inner join, it may just
> have a collation sorted on (a, b, c). Then the aggregate on top of the join will
> request another sort on (c, b); thus we miss the best plan. What we
> can do is request all the order combinations, which is n!, like
> how the Values operator does. But that is too much.
>
> If we can provide an approach that can minimize the possible traitsets
> that the child operator may deliver, we can reduce the chance of missing
> good plans. For the above query, the Aggregate operator can derive
> possible traitsets that its child 

Re: Re-ordering AND clauses

2018-08-08 Thread Jinfeng Ni
The SQL standard seems to allow the reordering of AND clauses; basically, it's
"implementation-dependent".

In Drill, the runtime will reorder the branches, regardless of whether the
planner does the reordering or not. It's mainly for performance benefit:
"expr1 AND expr2" would be reordered into "expr2 AND expr1" if expr1 is more
expensive to evaluate (i.e., it contains functions that are expensive to
evaluate).  Certainly such reordering could lead to different behavior in
some cases.


1.
https://stackoverflow.com/questions/789231/is-the-sql-where-clause-short-circuit-evaluated
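A toy sketch of this runtime reordering (my own illustration, not Drill code): conjuncts carry an estimated evaluation cost, cheap ones run first, and short-circuiting means an expensive or error-prone predicate may never be evaluated.

```python
def reorder_conjuncts(conjuncts):
    """Each conjunct is (predicate_fn, estimated_cost); cheap ones go first."""
    return sorted(conjuncts, key=lambda c: c[1])

def eval_and(conjuncts, row):
    # Standard short-circuit AND over the reordered conjuncts.
    return all(pred(row) for pred, _cost in reorder_conjuncts(conjuncts))
```

With a row where x = 0, evaluating "y / x < 5 AND x > 0" in written order would divide by zero, but after reordering the cheap guard x > 0 fails first and the division is never reached; this is precisely how reordering changes observable behavior for guarded predicates.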



On Wed, Aug 8, 2018 at 2:07 PM, Julian Hyde  wrote:

> We have never really made our policy clear about whether it is valid for
> the planner to re-order the clause of an AND. I would like to have a
> discussion about that policy (also see
> https://issues.apache.org/jira/browse/CALCITE-2450). I propose that we
> can re-order the clauses
> of AND (and OR) but not CASE.
>
> Consider the query Q1:
>
>   SELECT *
>   FROM t
>   WHERE x > 0 AND y / x < 5
>
> And the similar query Q2 that re-orders the AND clause:
>
>   SELECT *
>   FROM t
>   WHERE y / x < 5 AND x > 0
>
> If one of the rows has x = 0, we would expect Q2 to throw a
> divide-by-zero error. Is it allowed for Q1 to throw? Is it allowed for it
> NOT to throw?
>
> We recognized that sometimes people want to write SQL to guard against bad
> values (like x = 0 above), and so we tacitly assumed that we would not
> re-order AND. Thus in current Calcite, Q1 would never throw, and Q2 would
> always throw.
>
> I think that was a mistake. It ties our hands too much (we are not able to
> move highly selective predicates to the front of the list, for instance)
> and it is inconsistent with SQL semantics.
>
> There is a way to achieve the “guarding” behavior: use CASE (whose clauses
> cannot be re-ordered), in Q3 as follows:
>
>   SELECT *
>   FROM t
>   WHERE CASE WHEN x > 0 THEN y / x < 5 ELSE FALSE END
>
> Julian
>
>


Re: [ANNOUNCE] New committer: Zhiqiang He

2017-06-09 Thread Jinfeng Ni
Congratulations, Zhiqiang!


On Fri, Jun 9, 2017 at 1:00 PM, Jacques Nadeau  wrote:

> Congrats, welcome!
>
> On Fri, Jun 9, 2017 at 12:57 PM, Eli Levine  wrote:
>
> > Congrats, Ransom! Keep up the good work.
> >
> > Eli
> >
> > On Fri, Jun 9, 2017 at 11:49 AM, Julian Hyde  wrote:
> > > Apache Calcite's Project Management Committee (PMC) has invited
> > > Zhiqiang He (Ransom) to become a committer, and we are pleased to
> > > announce that he has accepted.
> > >
> > > He has been quietly but steadily making contributions to add a
> > > MATCH_RECOGNIZE clause our SQL dialect, which ultimately will bring
> > > CEP (complex-event processing) capabilities to Flink and any other
> > > streaming engines that use Calcite SQL. He figured out how to add a
> > > major extension to SQL with minimal input, and his code contributions
> > > have been of high quality.
> > >
> > > Ransom,
> > >
> > > Welcome, thank you for your contributions, and we look forward your
> > > further interactions with the community!
> > >
> > > If you wish, please feel free to tell us more about yourself and what
> > > you are working on. Also, please let us know what name you would like
> > > us to address you by (Ransom, or something else).
> > >
> > > Julian (on behalf of the Apache Calcite PMC)
> >
>


Re: ways to transform a sql query input value

2017-03-27 Thread Jinfeng Ni
If you do want to have a planner rule to do such transformation,
consider putting the rule in a HepPlanner in stead of VolcanoPlanner,
if you have not already tried that approach.
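
The reason a heuristic planner helps here: it applies a rule wherever it matches, repeating until a fixpoint, with no cost-based selection, so the rewrite cannot lose out to the original plan; and a rule whose output no longer matches its own pattern fires exactly once. The following toy fixpoint driver over plan strings is purely illustrative, not Calcite's HepPlanner API:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

public class HepStyleDriver {
    // Apply every rule wherever it matches, repeatedly, until nothing
    // changes (a fixpoint). No cost comparison is involved.
    static String applyToFixpoint(String plan, List<UnaryOperator<String>> rules) {
        String prev;
        do {
            prev = plan;
            for (UnaryOperator<String> rule : rules) {
                plan = rule.apply(plan);
            }
        } while (!plan.equals(prev));
        return plan;
    }

    public static void main(String[] args) {
        // A "rule" rewriting the literal; once applied, its pattern no
        // longer matches, so it cannot fire again (no 5678 -> 9012).
        List<UnaryOperator<String>> rules = Collections.singletonList(
            p -> p.replace("user_id = 1234", "user_id = 5678"));
        System.out.println(applyToFixpoint("Filter(user_id = 1234)", rules));
        // prints "Filter(user_id = 5678)"
    }
}
```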


On Mon, Mar 27, 2017 at 8:50 AM, Julian Hyde  wrote:
> It’s problematic doing such a transformation in a rule. Why? Rules are 
> supposed to preserve semantics, i.e. convert a RelNode to one that produces 
> the same result. If a rule converts A to B then the engine is at liberty to 
> choose A or B (or something derived from A, or something derived from B), 
> whichever is cheaper. And that’s valid because they all give the same result, 
> right?!
>
> Try this: write your own RelShuttle (RelShuttleImpl and RelHomogeneousShuttle 
> are useful partial implementations) and apply it soon after Sql-to-rel 
> translation.
>
> Julian
>
>
>
>
>
>> On Mar 24, 2017, at 7:16 PM, Jingwu Li  wrote:
>>
>> Hi all,
>>
>>   Is there a way to do value transformation for a SQL query? For
>> example, select * from user where user_id = 1234 needs to be transformed
>> to select * from user where user_id = 5678.
>>
>>   I have tried to create a planner rule, similar to JdbcFilterRule, which
>> matches the LogicalFilter, and created a value-transformer visitor to do the
>> value transformation. But after the optimization phase, either my customized
>> transformation RelNode does not get selected because its cost is not the
>> cheapest, or the transformation is applied multiple times, so the value gets
>> transformed to an unexpected value (1234 -> 5678 -> 9012 ...). Is this the
>> right place to do such value transformation?
>>
>>  I also looked at the query parsing phase and tried to find a place to get
>> the parsed SqlNode tree.  It seems hard to add a visitor to do the
>> transformation when traversing the tree.
>>
>>  Could you give some thoughts?
>>
>>
>> Regards,
>>
>> Jingwu Li
>>
>>
>>
>


Re: JoinToMultiJoinRule usage

2017-03-22 Thread Jinfeng Ni
That's correct. It's LoptOptimizeJoinRule that enumerates the different
join orders. JoinToMultiJoinRule is a prerequisite step for
LoptOptimizeJoinRule.

On Wed, Mar 22, 2017 at 4:35 AM, weijie tong  wrote:
> I think this transformation is done by two rules. First,
> JoinToMultiJoinRule translates the joins to a MultiJoin; then
> LoptOptimizeJoinRule does the actual transformation by matching the
> MultiJoin RelNode created in the previous step.
>
> On Wed, Mar 22, 2017 at 1:20 AM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
>> Hi all,
>>
>> Drill uses DRILL_JOIN_TO_MULTIJOIN_RULE (=new
>> JoinToMultiJoinRule(DrillJoinRel.class) during planning stage. I have
>> noticed that after using this rule, a left join can be transformed into a
>> right join when the right table is larger than the left one. After reading
>> the JoinToMultiJoinRule javadoc, I am not quite sure this rule actually does
>> such a transformation, but if I disable it, my query is not transformed. Can
>> anybody point out to me where this rule applies such logic? Or suggest what
>> could be responsible for transforming a left join into a right one?
>>
>>
>> Thank you in advance.
>>
>> Kind regards
>> Arina
>>


Re: Ambiguous dynamic star

2016-07-26 Thread Jinfeng Ni
@Julian,

The new test added for checkAmbiguousUnresolvedStar() looks good
to me. I forgot to add such tests when working on CALCITE-1150,
though Drill has unit tests for such cases.

One more test we may consider adding: when resolving a column reference,
a regular field has higher priority than dynamic star columns. (I'm not
completely sure whether we should allow such a case or block it, since
this column reference may be resolved to the regular field, or to a field
expanded later on from *.)

@Test public void testDynamicStar2() throws Exception {
  final String sql = "select newid\n"
  + "from (select *, NATION.N_NATION + 100 as newid  from
\"DYNAMIC\".NATION, \"DYNAMIC\".CUSTOMER)";
  sql(sql).type("RecordType(ANY NEWID) NOT NULL");
}

Thanks again for adding those tests.
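
The priority rule being proposed could be sketched as follows; resolve() and its return encoding are hypothetical helpers for illustration, not the validator's actual resolution code:

```java
import java.util.Set;

public class DynamicStarResolve {
    // A declared (regular) field wins outright. Otherwise, if more than
    // one input in scope carries a dynamic star, the reference is
    // ambiguous, because either star might later expand to that name.
    static String resolve(String col, Set<String> regularFields,
                          int dynamicStarInputs) {
        if (regularFields.contains(col)) {
            return "field:" + col;
        }
        if (dynamicStarInputs > 1) {
            return "error: Column '" + col + "' is ambiguous";
        }
        return dynamicStarInputs == 1
            ? "dynamic:" + col
            : "error: Column '" + col + "' not found";
    }
}
```

Under this rule, NEWID in the test above resolves to the regular field even though two dynamic-star inputs are in scope.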



On Tue, Jul 26, 2016 at 1:40 AM, Julian Hyde  wrote:
> Jinfeng,
>
> Here’s a patch that attempts to test checkAmbiguousUnresolvedStar. Let me 
> know whether I’ve missed any coverage.
>
> Julian
>
>
> diff --git 
> a/core/src/test/java/org/apache/calcite/test/MockCatalogReader.java 
> b/core/src/test/java/org/apache/calcite/test/MockCatalogReader.java
> index e56dd96..2c791cf 100644
> --- a/core/src/test/java/org/apache/calcite/test/MockCatalogReader.java
> +++ b/core/src/test/java/org/apache/calcite/test/MockCatalogReader.java
> @@ -233,6 +233,20 @@ public MockCatalogReader init() {
>  contactAddressTable.addColumn("MAILING_ADDRESS", addressType);
>  registerTable(contactAddressTable);
>
> +// Register "DYNAMIC" schema.
> +MockSchema dynamicSchema = new MockSchema("DYNAMIC");
> +registerSchema(dynamicSchema);
> +
> +MockTable nationTable =
> +new MockDynamicTable(this, dynamicSchema.getCatalogName(),
> +dynamicSchema.getName(), "NATION", false, 100);
> +registerTable(nationTable);
> +
> +MockTable customerTable =
> +new MockDynamicTable(this, dynamicSchema.getCatalogName(),
> +dynamicSchema.getName(), "CUSTOMER", false, 100);
> +registerTable(customerTable);
> +
>  // Register "CUSTOMER" schema.
>  MockSchema customerSchema = new MockSchema("CUSTOMER");
>  registerSchema(customerSchema);
> diff --git a/core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java 
> b/core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java
> index 57f71d1..c095453 100644
> --- a/core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java
> +++ b/core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java
> @@ -,6 +,37 @@ public void _testValuesWithAggFuncs() {
>  //   12345678901234567890123456789012345678901234567890123456789012345
>  //check("SELECT count(0) FROM emp GROUP BY ()");
>}
> +
> +
> +  /** Test case for
> +   * <a href="https://issues.apache.org/jira/browse/CALCITE-1150">[CALCITE-1150]
> +   * Dynamic Table / Dynamic Star support</a>. */
> +  @Test public void testDynamicStar() throws Exception {
> +final String sql = "select n.n_nation\n"
> ++ "from (select * from \"DYNAMIC\".NATION) as n,\n"
> ++ " (select * from \"DYNAMIC\".CUSTOMER)";
> +sql(sql).type("RecordType(ANY N_NATION) NOT NULL");
> +  }
> +
> +  @Test public void testAmbiguousDynamicStar() throws Exception {
> +final String sql = "select ^n_nation^\n"
> ++ "from (select * from \"DYNAMIC\".NATION),\n"
> ++ " (select * from \"DYNAMIC\".CUSTOMER)";
> +sql(sql).fails("Column 'N_NATION' is ambiguous");
> +  }
> +
> +  @Test public void testAmbiguousDynamicStar2() throws Exception {
> +final String sql = "select ^n_nation^\n"
> ++ "from (select * from \"DYNAMIC\".NATION, \"DYNAMIC\".CUSTOMER)";
> +sql(sql).fails("Column 'N_NATION' is ambiguous");
> +  }
> +
> +  @Test public void testAmbiguousDynamicStar3() throws Exception {
> +final String sql = "select ^nc.n_nation^\n"
> +    + "from (select * from \"DYNAMIC\".NATION, \"DYNAMIC\".CUSTOMER) as nc";
> +sql(sql).fails("Column 'N_NATION' is ambiguous");
> +  }
> +
>  }
>
>  // End SqlValidatorTest.java
>
>
>
>> On Jul 26, 2016, at 1:05 AM, Julian Hyde  wrote:
>>
>> Jinfeng,
>>
>> In https://issues.apache.org/jira/browse/CALCITE-1150 you added dynamic 
>> record types, and the method checkAmbiguousUnresolvedStar is supposed to 
>> throw “Column ‘X’ is ambiguous” if more than one field has a dynamic star.
>>
>> As part of https://issues.apache.org/jira/browse/CALCITE-1208 I am 
>> revisiting that code and doing some major refactoring.
>>
>> Did you add any tests where checkAmbiguousUnresolvedStar throws that 
>> particular error? I can’t find any; in fact, if I remove all calls to that 
>> method, the test suite still passes.
>>
>> Can you suggest some tests for that functionality? I want to make sure I’m 
>> not breaking what you did.
>>
>> Julian
>>
>>
>>
>


[jira] [Created] (CALCITE-1296) Different classes of datetime should be able to compare.

2016-06-16 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created CALCITE-1296:
---

 Summary: Different classes of datetime should be able to compare.
 Key: CALCITE-1296
 URL: https://issues.apache.org/jira/browse/CALCITE-1296
 Project: Calcite
  Issue Type: Improvement
Reporter: Jinfeng Ni
Assignee: Julian Hyde


This is a follow-up from a discussion in DRILL-4525.  Currently, Calcite does
not allow the comparison between date and timestamp; the LHS and RHS have to
have the same type.

{code}
select CAST('1990-01-01' AS DATE) < CAST('2001-01-01' AS TIMESTAMP) FROM 
(VALUES(1, 2)) AS T(A,B);
Mar 24, 2016 8:15:53 AM 
org.apache.calcite.sql.validate.SqlValidatorException 
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply 
'<' to arguments of type '<DATE> < <TIMESTAMP(0)>'. Supported form(s): 
'<COMPARABLE_TYPE> < <COMPARABLE_TYPE>'
{code}

This behavior is different from Oracle and Postgres. Both of them allow 
implicit cast between date and timestamp, and hence allow the comparison. 

It seems to make sense to allow different classes of datetime to be comparable 
in Calcite. 

Oracle:
{code}
select count(*) from dual
where Date '1990-01-01'  < TIMESTAMP '1990-01-01 00:01:02';
  2
  COUNT(*)
--
 1
{code}

Postgres:
{code}
select CAST('1990-01-01' AS DATE) < CAST('2001-01-01' AS TIMESTAMP) FROM 
(VALUES(1, 2)) AS T(A,B);
 ?column?
--
 t
(1 row)
{code}

In particular,  Oracle doc has the following description [1]. 
"
Datetime Comparisons
When you compare date and timestamp values, Oracle converts the data to the 
more precise datatype before doing the comparison. For example, if you compare 
data of TIMESTAMP WITH TIME ZONE datatype with data of TIMESTAMP datatype, 
Oracle converts the TIMESTAMP data to TIMESTAMP WITH TIME ZONE, using the 
session time zone.

The order of precedence for converting date and timestamp data is as follows:

1. DATE
2. TIMESTAMP
3. TIMESTAMP WITH LOCAL TIME ZONE
4. TIMESTAMP WITH TIME ZONE
For any pair of datatypes, Oracle converts the datatype that has a smaller 
number in the preceding list to the datatype with the larger number.
"

[1] 
https://docs.oracle.com/cd/B19306_01/server.102/b14225/ch4datetime.htm#i1006333
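
The promotion Oracle describes (widen to the more precise type, then compare) can be sketched with java.time; this only illustrates the rule and is not Calcite's implementation:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class DatetimeCompare {
    // Compare a DATE with a TIMESTAMP by widening the DATE to the more
    // precise type first (midnight of that day), following the
    // precedence list quoted above.
    static boolean lessThan(LocalDate date, LocalDateTime ts) {
        return date.atStartOfDay().isBefore(ts);
    }

    public static void main(String[] args) {
        System.out.println(lessThan(LocalDate.parse("1990-01-01"),
                                    LocalDateTime.parse("2001-01-01T00:00:00")));
        // prints "true"
    }
}
```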



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: How to reduce constant expression in SQL

2016-06-14 Thread Jinfeng Ni
If you put ReduceExpressionsRule in a HepPlanner, then you do not have
to rely on the costing factor, since HepPlanner is heuristic-rule-based,
not cost-based.

Actually, Calcite has a unit test in RelOptRulesTest [1] [2], which
shows how the constant-reduction rules work.

[1] 
https://github.com/apache/calcite/blob/master/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L834
[2] 
https://github.com/apache/calcite/blob/master/core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml#L644-L667
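
What the reduction does can be illustrated with a tiny constant folder over a toy expression tree (a stand-in for RexNode, not Calcite's code): any subtree whose operands are all literals is replaced by its computed value, which is how c+(2+3) becomes c+5:

```java
public class ConstantFold {
    interface Expr {}
    static final class Lit implements Expr {
        final int v;
        Lit(int v) { this.v = v; }
    }
    static final class Col implements Expr {
        final String name;
        Col(String name) { this.name = name; }
    }
    static final class Add implements Expr {
        final Expr l, r;
        Add(Expr l, Expr r) { this.l = l; this.r = r; }
    }

    // Bottom-up: fold children first; if both end up literal, compute.
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Expr l = fold(((Add) e).l);
            Expr r = fold(((Add) e).r);
            if (l instanceof Lit && r instanceof Lit) {
                return new Lit(((Lit) l).v + ((Lit) r).v);
            }
            return new Add(l, r);
        }
        return e;  // literals and column refs are already reduced
    }

    public static void main(String[] args) {
        // c + (2 + 3)  ==>  c + 5
        Expr e = fold(new Add(new Col("c"), new Add(new Lit(2), new Lit(3))));
        System.out.println(((Lit) ((Add) e).r).v);  // prints 5
    }
}
```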


On Tue, Jun 14, 2016 at 10:10 PM, Jason Altekruse  wrote:
> The first place I would recommend looking is how you are costing projects
> and filters. If the costing function for these relational operators does
> not take into account the complexity of the expressions contained, possibly
> by only considering the number of top-level expressions, the reduced plan
> would not expose, via a lower cost, that it is preferable to the
> original plan.
>
> You can see here that the default project Rel appears to only consider the
> number of expressions and the input size when computing costs [1]. You
> probably want to have some visitor that will walk each of the expressions
> to come up with a more accurate cost. I don't know if this functionality is
> provided somewhere else in Calcite.
>
> [1] -
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/core/Project.java#L216
>
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
>
> On Tue, Jun 14, 2016 at 9:58 PM, Jark Wu  wrote:
>
>> Hi all,
>>
>> I want to reduce the constant expressions in SQL, replacing them with the
>> corresponding constants.
>>
>> For instance I have a sql query :
>>
>> SELECT a, b, c+(2+3) as d FROM Orders WHERE b > (1+4)
>>
>>
>> I want some planner rules that can optimize this query to reduce the
>> constant expressions:
>>
>> SELECT a, b, c+5 as d FROM Orders WHERE b > 5
>>
>>
>> It seems that ReduceExpressionsRule is what I need, and I tried it many
>> times but it didn’t work.
>>
>> Is the way I used it wrong? Is there an existing rule to use, or do I
>> have to implement one myself?
>>
>>
>>
>> - Jark Wu
>>
>>


Re: Simple quoted identifier : column name vs implicit function call ?

2016-05-25 Thread Jinfeng Ni
Opened CALCITE-1256.

https://issues.apache.org/jira/browse/CALCITE-1256

On Tue, May 24, 2016 at 10:05 PM, Julian Hyde <jh...@apache.org> wrote:
> Agreed. Log a JIRA.
>
>> On May 24, 2016, at 9:14 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>>
>> You are right that Oracle is case-sensitive.  I tried with a column named
>> "USER".  The quoted identifier "user" raises an "invalid
>> identifier" error, while "USER" matches the column. This seems to show that
>> Oracle tries to match a quoted identifier against the columns in the table,
>> not against the system function.
>>
>> Oracle:
>> create table mytemp ("USER" int);
>>
>> select "user" from mytemp;
>> select "user" from mytemp
>>   *
>> ERROR at line 1:
>> ORA-00904: "user": invalid identifier
>>
>> select "USER" from mytemp;
>>
>>  USER
>> --
>>   100
>>
>> For Calcite, I used the Calcite master branch and ran the query in
>> Calcite's sqlline.
>>
>> select "user" from (values(100)) as T("USER");
>>
>> ++
>> |
>> +-
>> | sa
>> +-
>>
>> I'm going to open a JIRA.
>>
>> On Tue, May 24, 2016 at 8:54 PM, Julian Hyde <jh...@apache.org> wrote:
>>> Sounds reasonable.
>>>
>>> However, can you run the Oracle tests again to make sure. Oracle is 
>>> case-sensitive when identifiers are quoted, so it would not consider a 
>>> “user” column to be a match for the “USER” function. Try instead with a 
>>> column called “USER” and see if you get the same results.
>>>
>>> By a similar argument, I deduce that you are trying Calcite-in-Drill (case 
>>> insensitive) rather than raw Calcite (case sensitive).
>>>
>>> Then yes, open a JIRA.
>>>
>>> Julian
>>>
>>>
>>>> On May 24, 2016, at 3:50 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>>>>
>>>> This question is raised in Drill's user list [1]. The code logic is
>>>> actually in Calcite.
>>>>
>>>> Basically, the SQL standard allows a number of reserved identifiers to
>>>> be used for system function calls, such as USER, CURRENT_USER,
>>>> CURRENT_TIME, etc. If someone wants to use those reserved names as
>>>> column names, he has to use a quoted identifier.
>>>>
>>>> However, it looks like Calcite always interprets those simple quoted
>>>> identifiers as a system function call, instead of a column name.  Such
>>>> behavior is different from Postgres/Oracle, which will interpret a
>>>> quoted identifier as a column name, instead of a system function call.
>>>>
>>>> I would argue that Postgres/Oracle's behavior makes more sense. If
>>>> someone quotes a reserved word, the expectation is that he can use those
>>>> reserved words just like a regular identifier.
>>>>
>>>> If this sounds reasonable, I'll open a JIRA.
>>>>
>>>> -
>>>>
>>>> Oracle:
>>>>
>>>> create table mytemp("user" int);
>>>> insert into mytemp values(100);
>>>>
>>>> SQL> select user from mytemp;
>>>>
>>>> USER
>>>> --
>>>> user_id
>>>>
>>>> SQL> select "user" from mytemp;
>>>>
>>>> user
>>>> --
>>>>  100
>>>>
>>>> SQL> select mytemp."user" from mytemp;
>>>>
>>>> user
>>>> --
>>>>  100
>>>>
>>>>
>>>> Postgres:
>>>> select user from (values(100)) as T("user");
>>>> current_user
>>>> --
>>>> user_id
>>>> (1 row)
>>>>
>>>> mydb=# select "user" from (values(100)) as T("user");
>>>> user
>>>> --
>>>> 100
>>>> (1 row)
>>>>
>>>> mydb=# select T."user" from (values(100)) as T("user");
>>>> user
>>>> --
>>>> 100
>>>> (1 row)
>>>>
>>>>
>>>> Calcite:
>>>> select user from (values(100)) as T("user");
>>>> -
>>>> user_id
>>>>
>>>> select "user" from (values(100)) as T("user");
>>>> -
>>>> user_id
>>>>
>>>> select T."user" from (values(100)) as T("user");
>>>> ++
>>>> |user|
>>>> ++
>>>> | 100|
>>>> ++
>>>>
>>>> [1] 
>>>> http://mail-archives.apache.org/mod_mbox/drill-user/201605.mbox/%3CCAKOFcwrR4m6_EZvKU0ijh5CeSBa-YvZxomBFH6dPmthj7g%2BmWg%40mail.gmail.com%3E
>>>
>


[jira] [Created] (CALCITE-1256) Quoted reserved identifier should match for column names

2016-05-25 Thread Jinfeng Ni (JIRA)
Jinfeng Ni created CALCITE-1256:
---

 Summary: Quoted reserved identifier should match for column names
 Key: CALCITE-1256
 URL: https://issues.apache.org/jira/browse/CALCITE-1256
 Project: Calcite
  Issue Type: Bug
Reporter: Jinfeng Ni
Assignee: Julian Hyde


The SQL standard allows a number of reserved identifiers to be used for system 
function calls, such as USER, CURRENT_USER, CURRENT_TIME, etc. If someone wants 
to use those reserved names as column names, he has to use a quoted identifier.

However, Calcite always interprets those simple quoted identifiers as a system 
function call, instead of a column name.  Such behavior is different from 
Postgres/Oracle, which will interpret a quoted identifier as a column name, 
instead of a system function call.
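
The behavior this issue asks for might be sketched as follows; resolve() and its return encoding are hypothetical, for illustration only, and do not reflect Calcite's actual validator code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class IdentifierResolution {
    static final Set<String> NILADIC_FUNCTIONS = new HashSet<>(
        Arrays.asList("USER", "CURRENT_USER", "CURRENT_TIME"));

    // Proposed order: a *quoted* identifier is matched against the
    // columns in scope only; an unquoted reserved word resolves to the
    // niladic system function.
    static String resolve(String name, boolean quoted, Set<String> columns) {
        if (quoted) {
            return columns.contains(name)
                ? "column:" + name
                : "error: invalid identifier";
        }
        return NILADIC_FUNCTIONS.contains(name.toUpperCase(Locale.ROOT))
            ? "call:" + name.toUpperCase(Locale.ROOT)
            : "column:" + name;
    }
}
```

This mirrors the Oracle/Postgres behavior shown in the thread: select "USER" matches a column named USER, while select USER calls the system function.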





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Simple quoted identifier : column name vs implicit function call ?

2016-05-24 Thread Jinfeng Ni
You are right that Oracle is case-sensitive.  I tried with a column named
"USER".  The quoted identifier "user" raises an "invalid
identifier" error, while "USER" matches the column. This seems to show that
Oracle tries to match a quoted identifier against the columns in the table,
not against the system function.

Oracle:
create table mytemp ("USER" int);

select "user" from mytemp;
select "user" from mytemp
   *
ERROR at line 1:
ORA-00904: "user": invalid identifier

select "USER" from mytemp;

  USER
--
   100

For Calcite, I used the Calcite master branch and ran the query in
Calcite's sqlline.

select "user" from (values(100)) as T("USER");

++
|
+-
| sa
+-

I'm going to open a JIRA.

On Tue, May 24, 2016 at 8:54 PM, Julian Hyde <jh...@apache.org> wrote:
> Sounds reasonable.
>
> However, can you run the Oracle tests again to make sure. Oracle is 
> case-sensitive when identifiers are quoted, so it would not consider a “user” 
> column to be a match for the “USER” function. Try instead with a column 
> called “USER” and see if you get the same results.
>
> By a similar argument, I deduce that you are trying Calcite-in-Drill (case 
> insensitive) rather than raw Calcite (case sensitive).
>
> Then yes, open a JIRA.
>
> Julian
>
>
>> On May 24, 2016, at 3:50 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>>
>> This question is raised in Drill's user list [1]. The code logic is
>> actually in Calcite.
>>
>> Basically, the SQL standard allows a number of reserved identifiers to
>> be used for system function calls, such as USER, CURRENT_USER,
>> CURRENT_TIME, etc. If someone wants to use those reserved names as
>> column names, he has to use a quoted identifier.
>>
>> However, it looks like Calcite always interprets those simple quoted
>> identifiers as a system function call, instead of a column name.  Such
>> behavior is different from Postgres/Oracle, which will interpret a
>> quoted identifier as a column name, instead of a system function call.
>>
>> I would argue that Postgres/Oracle's behavior makes more sense. If
>> someone quotes a reserved word, the expectation is that he can use those
>> reserved words just like a regular identifier.
>>
>> If this sounds reasonable, I'll open a JIRA.
>>
>> -
>>
>> Oracle:
>>
>> create table mytemp("user" int);
>> insert into mytemp values(100);
>>
>> SQL> select user from mytemp;
>>
>> USER
>> --
>> user_id
>>
>> SQL> select "user" from mytemp;
>>
>>  user
>> --
>>   100
>>
>> SQL> select mytemp."user" from mytemp;
>>
>>  user
>> --
>>   100
>>
>>
>> Postgres:
>> select user from (values(100)) as T("user");
>> current_user
>> --
>> user_id
>> (1 row)
>>
>> mydb=# select "user" from (values(100)) as T("user");
>> user
>> --
>>  100
>> (1 row)
>>
>> mydb=# select T."user" from (values(100)) as T("user");
>> user
>> --
>>  100
>> (1 row)
>>
>>
>> Calcite:
>> select user from (values(100)) as T("user");
>> -
>> user_id
>>
>> select "user" from (values(100)) as T("user");
>> -
>> user_id
>>
>> select T."user" from (values(100)) as T("user");
>> ++
>> |user|
>> ++
>> | 100|
>> ++
>>
>> [1] 
>> http://mail-archives.apache.org/mod_mbox/drill-user/201605.mbox/%3CCAKOFcwrR4m6_EZvKU0ijh5CeSBa-YvZxomBFH6dPmthj7g%2BmWg%40mail.gmail.com%3E
>


Re: [jira] [Commented] (CALCITE-1150) Create the a new DynamicRecordType, avoiding star expansion when working with this type

2016-04-21 Thread Jinfeng Ni
@Julian,

Your comment makes sense to me. I put together a short doc describing the
expected behavior for this feature. The doc link is in
CALCITE-1150. Please take a look.


Jinfeng


On Wed, Apr 20, 2016 at 1:06 PM, Julian Hyde (JIRA)  wrote:
>
> [ 
> https://issues.apache.org/jira/browse/CALCITE-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250601#comment-15250601
>  ]
>
> Julian Hyde commented on CALCITE-1150:
> --
>
> This feature will be useful, and as I have said to [~jnadeau] would fit well 
> within Calcite, but it’s a bit shapeless to me right now. I would like to see 
> some validator tests (any maybe one or two sql-to-rel converter tests) so I 
> get a feel for how it would work. When you have some tests can you post a 
> link to the github branch? I don’t think you should charge ahead with the 
> implementation because I might not agree with the specification (i.e. the 
> test cases).
>
>> Create the a new DynamicRecordType, avoiding star expansion when working 
>> with this type
>> ---
>>
>> Key: CALCITE-1150
>> URL: https://issues.apache.org/jira/browse/CALCITE-1150
>> Project: Calcite
>>  Issue Type: Improvement
>>Reporter: Jacques Nadeau
>>Assignee: Julian Hyde
>>
>> DynamicRecordType can be used to leverage user-provided field implications 
>> to avoid schema analysis until execution.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)


Calcite master branch has failed unit test?

2016-01-12 Thread Jinfeng Ni
On commit:

git log
commit 9f6f23de06791a3d2de1672b55bfdf7d6396bc78

Failed tests:
  SqlToRelConverterTest.testCorrelationNotExistsAndFilter:1692 plan
expected:<...alFilter(condition=[[NOT(IS NOT NULL($12)])])
  LogicalJoi...> but was:<...alFilter(condition=[[IS NULL($12])])
  LogicalJoi...>

Tests run: 2687, Failures: 1, Errors: 0, Skipped: 87

Has anyone seen the same failure?

Regards,

Jinfeng


Re: Assertion check in RelOptTableImpl.create()

2016-01-11 Thread Jinfeng Ni
Whether a Table implements
ScannableTable/QueryableTable/TranslatableTable has an impact on the
RelOptTableImpl.toRel() method;
toRel() relies on that to decide whether to return a LogicalTableScan,
an EnumerableTableScan, etc.

Although throwing UnsupportedOperationException would solve my current
problem, it's not clear to me why those "free floating" tables have to
implement the interface while regular tables do not.  If there is no
other reason, I would agree with Jacques that we had better remove the
assertion.



On Sun, Jan 10, 2016 at 7:47 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> It seems better to remove the assertion. If those are marker interfaces
> that have more meaning, I think we should avoid forcing people to implement
> one of those interfaces and then not support the implied functionality.
> Hadoop had a similar problem when they added a ByteBuffer read API marker
> interface and then added some implementations that had the marker interface
> but failed to implement the appropriate functionality. This means people can
> no longer rely on the marker interface to ensure that functionality is
> available. For those that are interested, this is the kind of hoops the
> Parquet library had to jump through because of this problem [1].
>
> [1]
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/CompatibilityUtil.java#L79
>
>
> On Sun, Jan 10, 2016 at 5:58 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> I don’t think it would do any harm if DrillTable implements
>> ScannableTable. The method could throw UnsupportedOperationException if you
>> don’t intend to use it.
>>
>> > On Jan 8, 2016, at 9:58 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>> >
>> > Thanks for the explanation, Julian.
>> >
>> > The place where I want to create an instance of RelOptTable is not in
>> > SqlValidator or SqlToRelConverter. Rather, it's in a RelOptRule, where
>> > CalciteSchema or Path is not available, which makes it impossible to
>> > call the other create() methods. In that sense, such a table is a "free
>> > floating" table.
>> >
>> > The RelOptRule, which essentially is trying to push a partition
>> > filter into the scan, has to create a new instance of RelOptTable after
>> > a certain transformation. In that sense, it's quite similar to
>> > DeltaTableScanRule in the Calcite code base [1].  The difference is that
>> > DeltaTableScanRule has a ScannableTable, while in Drill, DrillTable
>> > does not implement ScannableTable [2].
>> >
>> > Do you think it makes sense to make DrillTable implement ScannableTable?
>> >
>> > Thanks!
>> >
>> > [1]
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/stream/StreamRules.java#L192-L195
>> > [2]
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java#L34
>> >
>> > On Fri, Jan 8, 2016 at 5:24 PM, Julian Hyde <jh...@apache.org> wrote:
>> >> Tables created using “create(RelOptSchema, RelDataType, Table)” have
>> their names field set to the empty list. That is, they don’t know their
>> location within the schema hierarchy and in fact may not be in the schema
>> hierarchy. Usually a table implements itself by generating code like
>> >>
>> >>  rootSchema.getSubSchema(“FOODMART”).getTable(“CUSTOMERS”)
>> >>
>> >> But such “free floating” tables cannot implement themselves in that
>> way. Therefore this method is only for kinds of tables that know how to get
>> to their own data: TranslatableTable, ModifiableTable, ScannableTable.
>> >>
>> >> Julian
>> >>
>> >>
>> >>> On Jan 7, 2016, at 5:40 PM, Jinfeng Ni <j...@apache.org> wrote:
>> >>>
>> >>> Does anyone know why one of the static create() methods in
>> >>> RelOptTableImpl has the following assertion check (to check table is
>> >>> instance of TranslatableTable, or ScannableTable, or ModifiableTable)
>> >>> [1], while the rest of create() methods do not do such check? [2]
>> >>>
>> >>> Looks like RelOptTableImpl.toRel() actually expects table instance
>> >>> other than the above three class[3].
>> >>>
>> >>> Does it makes sense to remove the assertion check in [1]?
>> >>>
>> >>> Best Regards,
>> >>>
>> >>> Jinfeng
>> >>>
>> >>>
>> >>> [1].
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L167-L169
>> >>>
>> >>> [2]
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L118
>> >>>
>> >>> [3]
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L225
>> >>
>>
>>


Re: Assertion check in RelOptTableImpl.create()

2016-01-10 Thread Jinfeng Ni
Sounds good. Thanks!


On Sun, Jan 10, 2016 at 5:58 PM, Julian Hyde <jh...@apache.org> wrote:
> I don’t think it would do any harm if DrillTable implements ScannableTable. 
> The method could throw UnsupportedOperationException if you don’t intend to 
> use it.
>
>> On Jan 8, 2016, at 9:58 PM, Jinfeng Ni <jinfengn...@gmail.com> wrote:
>>
>> Thanks for the explanation, Julian.
>>
>> The place where I want to create an instance of RelOptTable is not in
>> SqlValidator or SqlToRelConverter. Rather, it's in a RelOptRule, where
>> CalciteSchema or Path is not available, which makes it impossible to
>> call the other create() methods. In that sense, such a table is a "free
>> floating" table.
>>
>> The RelOptRule, which essentially is trying to push a partition
>> filter into the scan, has to create a new instance of RelOptTable after
>> a certain transformation. In that sense, it's quite similar to
>> DeltaTableScanRule in the Calcite code base [1].  The difference is that
>> DeltaTableScanRule has a ScannableTable, while in Drill, DrillTable
>> does not implement ScannableTable [2].
>>
>> Do you think it makes sense to make DrillTable implement ScannableTable?
>>
>> Thanks!
>>
>> [1] 
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/stream/StreamRules.java#L192-L195
>> [2] 
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java#L34
>>
>> On Fri, Jan 8, 2016 at 5:24 PM, Julian Hyde <jh...@apache.org> wrote:
>>> Tables created using “create(RelOptSchema, RelDataType, Table)” have their 
>>> names field set to the empty list. That is, they don’t know their location 
>>> within the schema hierarchy and in fact may not be in the schema hierarchy. 
>>> Usually a table implements itself by generating code like
>>>
>>>  rootSchema.getSubSchema(“FOODMART”).getTable(“CUSTOMERS”)
>>>
>>> But such “free floating” tables cannot implement themselves in that way. 
>>> Therefore this method is only for kinds of tables that know how to get to 
>>> their own data: TranslatableTable, ModifiableTable, ScannableTable.
>>>
>>> Julian
>>>
>>>
>>>> On Jan 7, 2016, at 5:40 PM, Jinfeng Ni <j...@apache.org> wrote:
>>>>
>>>> Does anyone know why one of the static create() methods in
>>>> RelOptTableImpl has the following assertion check (to check that the
>>>> table is an instance of TranslatableTable, ScannableTable, or
>>>> ModifiableTable) [1], while the rest of the create() methods do no
>>>> such check? [2]
>>>>
>>>> Looks like RelOptTableImpl.toRel() actually expects table instances
>>>> other than the above three classes [3].
>>>>
>>>> Does it make sense to remove the assertion check in [1]?
>>>>
>>>> Best Regards,
>>>>
>>>> Jinfeng
>>>>
>>>>
>>>> [1]. 
>>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L167-L169
>>>>
>>>> [2] 
>>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L118
>>>>
>>>> [3] 
>>>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L225
>>>
>
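
Julian's suggestion above — have DrillTable implement ScannableTable and simply throw UnsupportedOperationException from scan() when direct enumeration is never intended — can be sketched as follows. This is a self-contained illustration using stand-in interfaces defined in the snippet itself, not Calcite's real org.apache.calcite.schema.ScannableTable (whose scan() takes a DataContext) and not Drill's actual code:

```java
import java.util.Iterator;

public class ScannableSketch {

  // Stand-in for Calcite's ScannableTable; the real interface returns
  // an Enumerable<Object[]> and takes a DataContext.
  public interface ScannableTable {
    Iterator<Object[]> scan();
  }

  // A DrillTable-like class that implements ScannableTable only so that
  // rules (e.g. something like DeltaTableScanRule) can treat it uniformly;
  // it never expects scan() to actually be called, so it throws.
  public static class RuleOnlyTable implements ScannableTable {
    @Override public Iterator<Object[]> scan() {
      throw new UnsupportedOperationException(
          "this table is planned via rules, not enumerated directly");
    }
  }

  /** Returns true if scanning the table throws, i.e. the table only
   *  implements ScannableTable to satisfy the interface contract. */
  public static boolean scanThrows(ScannableTable t) {
    try {
      t.scan();
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(scanThrows(new RuleOnlyTable()));  // prints "true"
  }
}
```

The point of the design is that implementing the interface costs nothing at planning time; the exception only fires if some code path wrongly tries to enumerate the table directly.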


Re: Assertion check in RelOptTableImpl.create()

2016-01-08 Thread Jinfeng Ni
Thanks for the explanation, Julian.

The place where I want to create an instance of RelOptTable is not in
SqlValidator or SqlToRelConverter. Rather, it's in a RelOptRule, where
CalciteSchema or Path is not available, which makes it impossible to
call the other create() methods. In that sense, such a table is a "free
floating" table.

The RelOptRule, which is essentially trying to push a partition
filter into the scan, has to create a new instance of RelOptTable after
a certain transformation. In that sense, it's quite similar to
DeltaTableScanRule in Calcite code base [1].  The difference is
DeltaTableScanRule has a ScannableTable, while in Drill, DrillTable
does not implement ScannableTable [2].

Do you think it makes sense to make DrillTable implement ScannableTable?

Thanks!

[1] 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/stream/StreamRules.java#L192-L195
[2] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java#L34

On Fri, Jan 8, 2016 at 5:24 PM, Julian Hyde <jh...@apache.org> wrote:
> Tables created using “create(RelOptSchema, RelDataType, Table)” have their 
> names field set to the empty list. That is, they don’t know their location 
> within the schema hierarchy and in fact may not be in the schema hierarchy. 
> Usually a table implements itself by generating code like
>
>   rootSchema.getSubSchema(“FOODMART”).getTable(“CUSTOMERS”)
>
> But such “free floating” tables cannot implement themselves in that way. 
> Therefore this method is only for kinds of tables that know how to get to 
> their own data: TranslatableTable, ModifiableTable, ScannableTable.
>
> Julian
>
>
>> On Jan 7, 2016, at 5:40 PM, Jinfeng Ni <j...@apache.org> wrote:
>>
>> Does anyone know why one of the static create() methods in
>> RelOptTableImpl has the following assertion check (to check that the
>> table is an instance of TranslatableTable, ScannableTable, or
>> ModifiableTable) [1], while the rest of the create() methods do no
>> such check? [2]
>>
>> Looks like RelOptTableImpl.toRel() actually expects table instances
>> other than the above three classes [3].
>>
>> Does it make sense to remove the assertion check in [1]?
>>
>> Best Regards,
>>
>> Jinfeng
>>
>>
>> [1]. 
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L167-L169
>>
>> [2] 
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L118
>>
>> [3] 
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L225
>


Re: Loose ends before 1.6

2016-01-07 Thread Jinfeng Ni
Hi Julian,

I'll submit a PR for CALCITE-975, and maybe CALCITE-976, which came
from Mehant.

Could you please assign CALCITE-978 and CALCITE-977 to Jason
Altekruse? Those patches actually came from him, and for some reason
I could not assign Calcite JIRAs to him.

Thanks,

Jinfeng



On Thu, Jan 7, 2016 at 4:52 PM, Julian Hyde  wrote:
> There are a few pull requests and JIRA cases that have been started but not 
> finished. Can you look for your name below, and give an update on them? Can 
> we get them finished for 1.6?
>
> Pull requests with input needed from author:
>
> * Tom Barber: https://github.com/apache/calcite/pull/162 
> https://issues.apache.org/jira/browse/CALCITE-944
> * Julien https://github.com/apache/calcite/pull/168 
> https://issues.apache.org/jira/browse/CALCITE-991
> * Jesus https://github.com/apache/calcite/pull/171 
> https://issues.apache.org/jira/browse/CALCITE-996
> * Vladimir https://github.com/apache/calcite/pull/176
> * Zinking https://github.com/apache/calcite/pull/178
> * Philip Rhodes https://github.com/apache/calcite/pull/180 
> https://issues.apache.org/jira/browse/CALCITE-1025
>
> JIRA cases:
>
> * me https://issues.apache.org/jira/browse/CALCITE-1046
> * Ted Xu https://issues.apache.org/jira/browse/CALCITE-1043
> * Alexey Makhmutov https://issues.apache.org/jira/browse/CALCITE-1037
> * me https://issues.apache.org/jira/browse/CALCITE-1036
> * Jinfeng https://issues.apache.org/jira/browse/CALCITE-978
> * Jinfeng https://issues.apache.org/jira/browse/CALCITE-977
> * Jinfeng https://issues.apache.org/jira/browse/CALCITE-976
> * Jinfeng https://issues.apache.org/jira/browse/CALCITE-975
> * Kevin Liew https://issues.apache.org/jira/browse/CALCITE-972
> * Julien https://issues.apache.org/jira/browse/CALCITE-957
> * Jacques https://issues.apache.org/jira/browse/CALCITE-950
> * Sean https://issues.apache.org/jira/browse/CALCITE-928
> * Sean https://issues.apache.org/jira/browse/CALCITE-900
> * Jesus https://issues.apache.org/jira/browse/CALCITE-742
>
> Ready to review by any committer:
>
> * https://issues.apache.org/jira/browse/CALCITE-1020
> * https://github.com/apache/calcite/pull/178
>
> Fix on the way:
>
> * https://issues.apache.org/jira/browse/CALCITE-1042
> * https://issues.apache.org/jira/browse/CALCITE-1041
> * https://issues.apache.org/jira/browse/CALCITE-1039
> * https://issues.apache.org/jira/browse/CALCITE-1038
> * https://issues.apache.org/jira/browse/CALCITE-1036
> * https://issues.apache.org/jira/browse/CALCITE-955
> * https://issues.apache.org/jira/browse/CALCITE-915
> * https://issues.apache.org/jira/browse/CALCITE-854
> * https://issues.apache.org/jira/browse/CALCITE-816
> * https://issues.apache.org/jira/browse/CALCITE-794
> * https://issues.apache.org/jira/browse/CALCITE-790
>
> Completed PRs to be closed: 108, 174
>
> Julian
>


Assertion check in RelOptTableImpl.create()

2016-01-07 Thread Jinfeng Ni
Does anyone know why one of the static create() methods in
RelOptTableImpl has the following assertion check (to check that the
table is an instance of TranslatableTable, ScannableTable, or
ModifiableTable) [1], while the rest of the create() methods do no
such check? [2]

Looks like RelOptTableImpl.toRel() actually expects table instances
other than the above three classes [3].

Does it make sense to remove the assertion check in [1]?

Best Regards,

Jinfeng


[1]. 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L167-L169

[2] 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L118

[3] 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L225
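
The assertion under discussion is essentially an instanceof check over the three "self-implementing" table interfaces. Below is a simplified, self-contained sketch of that style of precondition, using stand-in interfaces defined locally rather than Calcite's real ones; the actual assertion sits inside RelOptTableImpl.create() [1]:

```java
public class CreateCheckSketch {
  // Local stand-ins for the three Calcite interfaces the assertion names.
  public interface TranslatableTable {}
  public interface ScannableTable {}
  public interface ModifiableTable {}

  /** Mirrors the spirit of the check: only tables that know how to
   *  produce their own data (one of the three interfaces) pass. */
  public static boolean isSelfImplementing(Object table) {
    return table instanceof TranslatableTable
        || table instanceof ScannableTable
        || table instanceof ModifiableTable;
  }

  public static void main(String[] args) {
    System.out.println(isSelfImplementing(new ScannableTable() {}));  // true
    System.out.println(isSelfImplementing("plain table"));            // false
  }
}
```

As Julian explains above, the check exists because "free floating" tables created by that overload cannot locate themselves in a schema hierarchy, so they must carry their own implementation.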


Re: [calcite] [CALCITE-966] VolcanoPlanner should clear ruleNames in order to avoid… (#167)

2015-12-01 Thread Jinfeng Ni
Hi Julian,

Thanks for the reminder. I just added the comment per your suggestion to
the JIRA case.

Regards,

Jinfeng


On Tue, Dec 1, 2015 at 1:26 PM, Julian Hyde  wrote:

> Jinfeng,
>
> When you check in a fix to master branch and mark a JIRA case fixed,
> please add a comment to the case:
>
>   Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/.
>   [ Thanks for the ( patch | PR ), ! ]
>
> This is important, because it allows people to easily review the change
> after the event.
>
> Julian
>
>
> Begin forwarded message:
>
> *From: *asfgit 
> *Date: *November 30, 2015 at 3:23:05 PM PST
> *To: *apache/calcite 
> *Cc: *Julian Hyde 
> *Subject: **Re: [calcite] [CALCITE-966] VolcanoPlanner should clear
> ruleNames in order to avoid… (#167)*
>
> Merged #167 .
>
>
>
>


Re: About NULL's position in order by

2015-11-16 Thread Jinfeng Ni
Julian,

I'm trying to see if CALCITE-969/970 may have any impact on the Drill side.

I assume that the following logic happens on Calcite's execution side, right?
" Currently

  ORDER BY x DESC

is equivalent to

  ORDER BY x DESC NULLS LAST
"

On 1.4.0, for "ORDER BY x DESC", I saw that the nullDirection is
"UNSPECIFIED" in LogicalSortRel. I did not see "NULLS LAST" added to
"ORDER BY x DESC" in the Calcite logical world.

Drill's execution would interpret "UNSPECIFIED" as "NULLS FIRST for
DESC" and "NULLS LAST for ASC" [1]. So, I assume the thing you talked
about above also happens at Calcite's execution time. That is, you will
still keep "UNSPECIFIED" in LogicalSort if the user does not specify a
nullDirection, and leave each execution engine to choose its own
interpretation.

Thanks,

Jinfeng


[1]. 
https://github.com/apache/drill/blob/master/logical/src/main/java/org/apache/drill/common/logical/data/Order.java#L195-L202
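
The interpretation described above — an UNSPECIFIED null direction treated as NULLS FIRST for DESC and NULLS LAST for ASC, i.e. nulls always sorting "high" — can be mimicked with java.util.Comparator. This is an illustrative sketch, not Drill's or Calcite's actual sort code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullsSketch {
  // Nulls sort "high": last in ascending order, first in descending
  // order — matching the UNSPECIFIED handling described in the message.
  public static List<Integer> sortAsc(List<Integer> xs) {
    xs.sort(Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
    return xs;
  }

  public static List<Integer> sortDesc(List<Integer> xs) {
    xs.sort(Comparator.nullsFirst(Comparator.<Integer>reverseOrder()));
    return xs;
  }

  public static void main(String[] args) {
    System.out.println(sortAsc(new ArrayList<>(Arrays.asList(2, null, 1))));
    // [1, 2, null]
    System.out.println(sortDesc(new ArrayList<>(Arrays.asList(2, null, 1))));
    // [null, 2, 1]
  }
}
```

Under this convention, "ORDER BY x DESC" behaves like "ORDER BY x DESC NULLS FIRST" — which is exactly the behavior change CALCITE-970 proposes as the default.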

On Mon, Nov 16, 2015 at 11:50 AM, Julian Hyde  wrote:
>
>
>> On Nov 15, 2015, at 6:50 PM, Li Yang  wrote:
>>
>> https://issues.apache.org/jira/browse/CALCITE-969
>
> Li Yang, Thanks for logging that.
>
> I realized that we have two issues, so I logged 
> https://issues.apache.org/jira/browse/CALCITE-970 for the other part. 970 
> makes NULL values sort HIGH, by default, which is consistent with Oracle.
>
> But it will change some SQL behavior. Currently
>
>   ORDER BY x DESC
>
> is equivalent to
>
>   ORDER BY x DESC NULLS LAST
>
> but after 970 it will be equivalent to
>
>   ORDER BY x DESC NULLS FIRST
>
> If you like the current behavior, you can set defaultNullCollation=LAST. 
> Also, I tightened up what happens in RelNode land; we now use 
> NullDirection.UNSPECIFIED a lot less than we used to.
>
> Please holler if you don’t like 970.
>
> Julian
>