[jira] [Created] (CALCITE-6338) RelMdCollation#project can return an incomplete list of collations

Ruben Q L (Jira) Thu, 21 Mar 2024 04:45:48 -0700

Ruben Q L created CALCITE-6338:
----------------------------------

             Summary: RelMdCollation#project can return an incomplete list of collations
                 Key: CALCITE-6338
                 URL: https://issues.apache.org/jira/browse/CALCITE-6338
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.36.0
            Reporter: Ruben Q L
            Assignee: Ruben Q L



{{RelMdCollation#project}} can return an incomplete list of collations.

(I'll try to produce a unit test; for now I'll just describe the situation.)

Suppose we have a Project that projects the following expressions: $0, $2, $2,
$3 (notice that the input field $2 becomes both $1 and $2 after the projection).
The Project's input has collation [2, 3].
To calculate the Project's own collation, {{RelMdCollation#project}} is called,
and it computes a Multimap ({{targets}}) because, as in this case, a given
"source field" (e.g. 2) can have multiple project targets (e.g. 1 and 2).
However, when the collation is computed, *only the first target is considered*
(and the rest are discarded):
{code}
  public static @Nullable List<RelCollation> project(RelMetadataQuery mq,
      RelNode input, List<? extends RexNode> projects) {
  ...
      for (RelFieldCollation ifc : ic.getFieldCollations()) {
        final Collection<Integer> integers = targets.get(ifc.getFieldIndex());
        if (integers.isEmpty()) {
          continue loop; // cannot do this collation
        }
        fieldCollations.add(ifc.withFieldIndex(integers.iterator().next())); // <-- HERE!!
      }
{code}
Because of this, the Project's collation will be [1, 3], but there is another
valid one ([2, 3]), so the correct (complete) result should contain two
collations: [1, 3] and [2, 3].
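
To make the expected result concrete, here is a minimal, self-contained sketch
of enumerating every combination of targets instead of only the first one. It
is not Calcite code and not a proposed patch: the class {{CollationSketch}} and
its methods are hypothetical, a collation is modelled as a plain list of field
indexes (sort direction and null direction are ignored), and the projection is
assumed to consist only of plain input references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; none of these names exist in Calcite.
final class CollationSketch {

  /** Maps each source field index to every output position that projects it. */
  static Map<Integer, List<Integer>> targets(List<Integer> projectedInputRefs) {
    Map<Integer, List<Integer>> targets = new TreeMap<>();
    for (int out = 0; out < projectedInputRefs.size(); out++) {
      targets.computeIfAbsent(projectedInputRefs.get(out), k -> new ArrayList<>())
          .add(out);
    }
    return targets;
  }

  /** Enumerates every output collation derivable from one input collation. */
  static List<List<Integer>> allCollations(List<Integer> inputCollation,
      Map<Integer, List<Integer>> targets) {
    List<List<Integer>> result = new ArrayList<>();
    result.add(new ArrayList<>()); // start from a single empty prefix
    for (int source : inputCollation) {
      List<Integer> outs = targets.getOrDefault(source, List.of());
      if (outs.isEmpty()) {
        // Mirrors "continue loop" in the snippet above: the sort field is not
        // projected, so this input collation yields nothing.
        return List.of();
      }
      // Extend every prefix built so far with every target of this field.
      List<List<Integer>> next = new ArrayList<>();
      for (List<Integer> prefix : result) {
        for (int out : outs) {
          List<Integer> extended = new ArrayList<>(prefix);
          extended.add(out);
          next.add(extended);
        }
      }
      result = next;
    }
    return result;
  }

  public static void main(String[] args) {
    // Project $0, $2, $2, $3 over an input with collation [2, 3].
    Map<Integer, List<Integer>> targets = targets(List.of(0, 2, 2, 3));
    System.out.println(allCollations(List.of(2, 3), targets)); // [[1, 3], [2, 3]]
  }
}
```

Note that the number of combinations is the product of the number of targets
of each sort field, so a real fix might need to cap the result for projections
that repeat the same field many times.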

This seems a minor problem, but it can be the root cause of more serious
issues. For instance, I currently have a scenario (not so easy to reproduce
with a unit test) where a certain plan, with a certain combination of rules in
a HepPlanner, results in a StackOverflow because SortJoinTransposeRule is fired
infinitely. The root cause is that, after its first application, the rule does
not detect that the Join's left input is already sorted (as a result of that
first application). There is a "problematic" Project on that input (one that
shows the problem described above), and it returns only one collation. The
second collation (the one being discarded) is the Sort's collation, so it is
precisely the one that would prevent SortJoinTransposeRule from being
re-applied over and over.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
