If the sort is trivial (i.e. its input is already sorted) then the outcome of
calling call.transformTo(…) will be to put the sort and its input into the same
subset. Thus the input could be used as a (cheaper) alternative.
If the sort is not trivial, the effect of the rule will be put the sort and its
input in different RelSubsets of the same RelSet. They were probably in
different RelSets before the rule was called, in which case it will trigger a
merge of those sets: the planner now knows that any relational expression in
either of those sets is now equivalent to any other.
> On Oct 14, 2016, at 3:34 PM, Jess Balint wrote:
>
> On Fri, Oct 14, 2016 at 5:16 PM, Julian Hyde wrote:
>
>> Note the important expression:
>>
>> convert(sort.getInput(), traits)
>>
>> This creates a new relational expression (or returns an existing one) that
>> is “input" sorted by the required fields. It doesn’t change “input”.
>>
>>
> I dug into getOrCreateSubset/addAbstractConverters but I still don't see
> how this could work.
>
>
>> By the way, the Volcano paper talks about physical properties (what we
>> call traits) and enforcers. An enforcer is a relational operator that
>> doesn’t change the logical content of the data, just changes its physical
>> properties.
>>
>>
> Isn't the sort node acting as the enforcer here? I don't see a reason to
> remove it and then add it back via convert(). I'm specifically looking at
> the case where the sort is necessary (i.e. should not be removed by this
> optimization).
>
>
>> Every physical property has a corresponding enforcer: Sort is the enforcer
>> of the Collation trait; Shuffle is the enforcer of the Distribution trait;
>> Convert is the enforcer of the CallingConvention trait.
>>
>> The short piece of code you quoted may seem almost trivial, but it is
>> stating something profound: the relationship between Sort and Collation.
>>
>> The trick, of course, is to find a way of achieving a sort order without
>> using a sort (e.g. finding a copy of the table that contains the columns
>> you want and is sorted in the order you want, i.e. an index). The whole
>> reason we have the concept of physical properties is to make that kind of
>> optimization (sometimes) possible.
>>
>>
> I created a simple test:
>
> @Test public void testTwoSortDontRemove() throws Exception {
>runDuplicateSortCheck("select empid+deptno from ( "
> + "select empid, deptno "
> + "from emps "
> + "order by empid) "
> + "order by deptno",
> "EnumerableSort(sort0=[$1], dir0=[ASC])\n"
> + " EnumerableProject(EXPR$0=[+($0, $1)],
> deptno=[$1])\n"
> + "EnumerableSort(sort0=[$0], dir0=[ASC])\n"
> + " EnumerableProject(empid=[$0],
> deptno=[$1])\n"
> + "EnumerableTableScan(table=[[hr,
> emps]])\n");
> }
>
> When I run this, I get an error because the sort is removed:
>
> EnumerableProject(EXPR$0=[+($0, $1)], deptno=[$1])
> EnumerableSort(sort0=[$0], dir0=[ASC])
>EnumerableProject(empid=[$0], deptno=[$1])
> EnumerableTableScan(table=[[hr, emps]])
>
> I changed the original method in question to only remove the sort if the
> requested collation matches that of it's input:
>
>if
> (sort.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE).equals(collation))
> {
> final RelTraitSet traits =
> sort.getInput().getTraitSet().replace(collation);
> call.transformTo(convert(sort.getInput(), traits));
>}
>
> This will cause the test to pass. Isn't this a bug?
>
> Jess
>
>
>> Julian
>>
>>
>>> On Oct 14, 2016, at 2:33 PM, Jess Balint wrote:
>>>
>>> Just doing a bit of code reading here. Looking at the SortRemoveRule
>>> implementation:
>>>
>>> // Express the "sortedness" requirement in terms of a collation trait
>>> and
>>> // we can get rid of the sort. This allows us to use rels that just
>>> happen
>>> // to be sorted but get the same effect.
>>> final RelCollation collation = sort.getCollation();
>>> assert collation == sort.getTraitSet()
>>> .getTrait(RelCollationTraitDef.INSTANCE);
>>> final RelTraitSet traits =
>>> sort.getInput().getTraitSet().replace(collation);
>>> call.transformTo(convert(sort.getInput(), traits));
>>>
>>> I don't understand how this is correct without checking that the input
>>> collation is the same as the collation requested by the sort. It looks to
>>> me like it just replaces the input's collation with that of the sort. Any
>>> thoughts?
>>>
>>> Thanks.
>>>
>>> Jess
>>
>>