Re: how does SortRemoveRule work?

2019-01-03 Thread Vladimir Sitnikov
Jess> Is my test incorrect? This also fails:

The test looks like a correct one, however its correctness depends on
the implementation of runDuplicateSortCheck.
It turns out the implementation was wrong, so the top-level sort was
removed not by SortRemoveRule, but it was removed by a mere fact that
the test
dd ask Planner to produce a result with empty collation.

I've fixed that in https://issues.apache.org/jira/browse/CALCITE-2768,
and I've added your test there, so top-level order by is not removed:
https://gitbox.apache.org/repos/asf?p=calcite.git;a=commitdiff;h=b54f6de9d7f87e9853fc9ec01b586555a089b913

Vladimir


Re: how does SortRemoveRule work?

2016-10-14 Thread Julian Hyde
If the sort is trivial (i.e. its input is already sorted) then the outcome of 
calling call.transformTo(…) will be to put the sort and its input into the same 
subset. Thus the input could be used as a (cheaper) alternative.

If the sort is not trivial, the effect of the rule will be put the sort and its 
input in different RelSubsets of the same RelSet. They were probably in 
different RelSets before the rule was called, in which case it will trigger a 
merge of those sets: the planner now knows that any relational expression in 
either of those sets is now equivalent to any other.

> On Oct 14, 2016, at 3:34 PM, Jess Balint  wrote:
> 
> On Fri, Oct 14, 2016 at 5:16 PM, Julian Hyde  wrote:
> 
>> Note the important expression:
>> 
>>  convert(sort.getInput(), traits)
>> 
>> This creates a new relational expression (or returns an existing one) that
>> is “input" sorted by the required fields. It doesn’t change “input”.
>> 
>> 
> I dug into getOrCreateSubset/addAbstractConverters but I still don't see
> how this could work.
> 
> 
>> By the way, the Volcano paper talks about physical properties (what we
>> call traits) and enforcers. An enforcer is a relational operator that
>> doesn’t change the logical content of the data, just changes its physical
>> properties.
>> 
>> 
> Isn't the sort node acting as the enforcer here? I don't see a reason to
> remove it and then add it back via convert(). I'm specifically looking at
> the case where the sort is necessary (i.e. should not be removed by this
> optimization).
> 
> 
>> Every physical property has a corresponding enforcer: Sort is the enforcer
>> of the Collation trait; Shuffle is the enforcer of the Distribution trait;
>> Convert is the enforcer of the CallingConvention trait.
>> 
>> The short piece of code you quoted may seem almost trivial, but it is
>> stating something profound: the relationship between Sort and Collation.
>> 
>> The trick, of course, is to find a way of achieving a sort order without
>> using a sort (e.g. finding a copy of the table that contains the columns
>> you want and is sorted in the order you want, i.e. an index). The whole
>> reason we have the concept of physical properties is to make that kind of
>> optimization (sometimes) possible.
>> 
>> 
> I created a simple test:
> 
>  @Test public void testTwoSortDontRemove() throws Exception {
>runDuplicateSortCheck("select empid+deptno from ( "
>  + "select empid, deptno "
>  + "from emps "
>  + "order by empid) "
>  + "order by deptno",
>  "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
>  + "  EnumerableProject(EXPR$0=[+($0, $1)],
> deptno=[$1])\n"
>  + "EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>  + "  EnumerableProject(empid=[$0],
> deptno=[$1])\n"
>  + "EnumerableTableScan(table=[[hr,
> emps]])\n");
>  }
> 
> When I run this, I get an error because the sort is removed:
> 
> EnumerableProject(EXPR$0=[+($0, $1)], deptno=[$1])
>  EnumerableSort(sort0=[$0], dir0=[ASC])
>EnumerableProject(empid=[$0], deptno=[$1])
>  EnumerableTableScan(table=[[hr, emps]])
> 
> I changed the original method in question to only remove the sort if the
> requested collation matches that of it's input:
> 
>if
> (sort.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE).equals(collation))
> {
>  final RelTraitSet traits =
> sort.getInput().getTraitSet().replace(collation);
>  call.transformTo(convert(sort.getInput(), traits));
>}
> 
> This will cause the test to pass. Isn't this a bug?
> 
> Jess
> 
> 
>> Julian
>> 
>> 
>>> On Oct 14, 2016, at 2:33 PM, Jess Balint  wrote:
>>> 
>>> Just doing a bit of code reading here. Looking at the SortRemoveRule
>>> implementation:
>>> 
>>>   // Express the "sortedness" requirement in terms of a collation trait
>>> and
>>>   // we can get rid of the sort. This allows us to use rels that just
>>> happen
>>>   // to be sorted but get the same effect.
>>>   final RelCollation collation = sort.getCollation();
>>>   assert collation == sort.getTraitSet()
>>>   .getTrait(RelCollationTraitDef.INSTANCE);
>>>   final RelTraitSet traits =
>>> sort.getInput().getTraitSet().replace(collation);
>>>   call.transformTo(convert(sort.getInput(), traits));
>>> 
>>> I don't understand how this is correct without checking that the input
>>> collation is the same as the collation requested by the sort. It looks to
>>> me like it just replaces the input's collation with that of the sort. Any
>>> thoughts?
>>> 
>>> Thanks.
>>> 
>>> Jess
>> 
>> 



Re: how does SortRemoveRule work?

2016-10-14 Thread Julian Hyde
Note the important expression:

  convert(sort.getInput(), traits)

This creates a new relational expression (or returns an existing one) that is 
“input" sorted by the required fields. It doesn’t change “input”.

By the way, the Volcano paper talks about physical properties (what we call 
traits) and enforcers. An enforcer is a relational operator that doesn’t change 
the logical content of the data, just changes its physical properties.

Every physical property has a corresponding enforcer: Sort is the enforcer of 
the Collation trait; Shuffle is the enforcer of the Distribution trait; Convert 
is the enforcer of the CallingConvention trait.

The short piece of code you quoted may seem almost trivial, but it is stating 
something profound: the relationship between Sort and Collation.

The trick, of course, is to find a way of achieving a sort order without using 
a sort (e.g. finding a copy of the table that contains the columns you want and 
is sorted in the order you want, i.e. an index). The whole reason we have the 
concept of physical properties is to make that kind of optimization (sometimes) 
possible.

Julian


> On Oct 14, 2016, at 2:33 PM, Jess Balint  wrote:
> 
> Just doing a bit of code reading here. Looking at the SortRemoveRule
> implementation:
> 
>// Express the "sortedness" requirement in terms of a collation trait
> and
>// we can get rid of the sort. This allows us to use rels that just
> happen
>// to be sorted but get the same effect.
>final RelCollation collation = sort.getCollation();
>assert collation == sort.getTraitSet()
>.getTrait(RelCollationTraitDef.INSTANCE);
>final RelTraitSet traits =
> sort.getInput().getTraitSet().replace(collation);
>call.transformTo(convert(sort.getInput(), traits));
> 
> I don't understand how this is correct without checking that the input
> collation is the same as the collation requested by the sort. It looks to
> me like it just replaces the input's collation with that of the sort. Any
> thoughts?
> 
> Thanks.
> 
> Jess