Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on PR #4678: URL: https://github.com/apache/calcite/pull/4678#issuecomment-3663435980 Thanks. I ran locally the test that failed in CI and it passed without errors. That's indeed strange. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
xiedeyantu commented on PR #4678: URL: https://github.com/apache/calcite/pull/4678#issuecomment-3663420235 It's a bit strange that the main branch Jenkins CI failed. I have rerun jenkins job. link is [here](https://ci-builds.apache.org/job/Calcite/job/Calcite-sonar/job/main/1259/cloudbees-pipeline-explorer/). Please wait. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong merged PR #4678: URL: https://github.com/apache/calcite/pull/4678 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on PR #4678: URL: https://github.com/apache/calcite/pull/4678#issuecomment-3658672250 If there are no other comments, I will merge this PR after 24 hours. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
sonarqubecloud[bot] commented on PR #4678: URL: https://github.com/apache/calcite/pull/4678#issuecomment-3650183912 ## [](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) **Quality Gate passed** Issues  [6 New issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [0 Accepted issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=ACCEPTED) Measures  [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [86.3% Coverage on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_coverage&view=list)  [0.0% Duplication on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_duplicated_lines_density&view=list) [See analysis details on SonarQube Cloud](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
mihaibudiu commented on code in PR #4678: URL: https://github.com/apache/calcite/pull/4678#discussion_r2616462473 ## core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java: ## @@ -218,8 +218,10 @@ private Result implementHashSemiJoin(EnumerableRelImplementor implementor, Prefe Expressions.list( leftExpression, rightExpression, - leftResult.physType.generateAccessorWithoutNulls(joinInfo.leftKeys), - rightResult.physType.generateAccessorWithoutNulls(joinInfo.rightKeys), +leftResult.physType.generateNullAwareAccessor( Review Comment: I was hoping your change would solve it, but if it doesn't it means it's a bug somewhere else. Maybe there is even an issue for it. I don't think it's type inference in the validator, it must be something wrong later, in the conversion to enumerable, as you describe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
sonarqubecloud[bot] commented on PR #4678: URL: https://github.com/apache/calcite/pull/4678#issuecomment-3649642444 ## [](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) **Quality Gate passed** Issues  [6 New issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [0 Accepted issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=ACCEPTED) Measures  [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [86.3% Coverage on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_coverage&view=list)  [0.0% Duplication on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_duplicated_lines_density&view=list) [See analysis details on SonarQube Cloud](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
xiedeyantu commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2616324801
##
core/src/main/java/org/apache/calcite/rel/core/JoinInfo.java:
##
@@ -29,33 +29,37 @@
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import static java.util.Objects.requireNonNull;
/** An analyzed join condition.
*
* It is useful for the many algorithms that care whether a join has an
- * equi-join condition.
+ * equi-join condition (contains EQUALS and IS NOT DISTINCT FROM).
*
- * You can create one using {@link #createWithStrictEquality}, or call
+ * You can create one using {@link #of(RelNode, RelNode, RexNode)},
+ * {@link #createWithStrictEquality}, or call
* {@link Join#analyzeCondition()}; many kinds of join cache their
* join info, especially those that are equi-joins.
*
* @see Join#analyzeCondition() */
public class JoinInfo {
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
+ public final ImmutableList filterNulls;
Review Comment:
What do you think of this name `nullExclusionFlags`?
> Before this commit, IS NOT DISTINCT FROM was part of nonEquiConditions,
but in this commit IS NOT DISTINCT FROM can be treated as a hash join key and
is no longer part of nonEquiConditions.
I think this is a good comment that can be placed in the code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
xiedeyantu commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2616324801
##
core/src/main/java/org/apache/calcite/rel/core/JoinInfo.java:
##
@@ -29,33 +29,37 @@
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import static java.util.Objects.requireNonNull;
/** An analyzed join condition.
*
* It is useful for the many algorithms that care whether a join has an
- * equi-join condition.
+ * equi-join condition (contains EQUALS and IS NOT DISTINCT FROM).
*
- * You can create one using {@link #createWithStrictEquality}, or call
+ * You can create one using {@link #of(RelNode, RelNode, RexNode)},
+ * {@link #createWithStrictEquality}, or call
* {@link Join#analyzeCondition()}; many kinds of join cache their
* join info, especially those that are equi-joins.
*
* @see Join#analyzeCondition() */
public class JoinInfo {
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
+ public final ImmutableList filterNulls;
Review Comment:
What do you think of this name?
```
// For each join key, whether to use null-safe equality.
// TRUE: IS NOT DISTINCT FROM semantics (NULLs match NULLs)
// FALSE: standard = semantics (NULL != NULL)
public final ImmutableList nullSafeKeys;
```
> Before this commit, IS NOT DISTINCT FROM was part of nonEquiConditions,
but in this commit IS NOT DISTINCT FROM can be treated as a hash join key and
is no longer part of nonEquiConditions.
I think this is a good comment that can be placed in the code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
sonarqubecloud[bot] commented on PR #4678: URL: https://github.com/apache/calcite/pull/4678#issuecomment-3649216554 ## [](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) **Quality Gate passed** Issues  [6 New issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [0 Accepted issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=ACCEPTED) Measures  [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [87.3% Coverage on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_coverage&view=list)  [0.0% Duplication on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_duplicated_lines_density&view=list) [See analysis details on SonarQube Cloud](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2616223855
##
core/src/main/java/org/apache/calcite/rel/core/JoinInfo.java:
##
@@ -29,33 +29,37 @@
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import static java.util.Objects.requireNonNull;
/** An analyzed join condition.
*
* It is useful for the many algorithms that care whether a join has an
- * equi-join condition.
+ * equi-join condition (contains EQUALS and IS NOT DISTINCT FROM).
*
- * You can create one using {@link #createWithStrictEquality}, or call
+ * You can create one using {@link #of(RelNode, RelNode, RexNode)},
+ * {@link #createWithStrictEquality}, or call
* {@link Join#analyzeCondition()}; many kinds of join cache their
* join info, especially those that are equi-joins.
*
* @see Join#analyzeCondition() */
public class JoinInfo {
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
+ public final ImmutableList filterNulls;
Review Comment:
`filterNulls` is populated in `RelOptUtil.splitJoinCondition`. If it is
TRUE, it represents EQUALS semantics (i.e., not null-safe). If it were renamed,
it would be `notNullAwareKeyFlags` or `notNullSafeKeyFlags`, which seems a bit
complicated to read. Would you be okay with keeping the current name?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on code in PR #4678: URL: https://github.com/apache/calcite/pull/4678#discussion_r2616190489 ## core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java: ## @@ -218,8 +218,10 @@ private Result implementHashSemiJoin(EnumerableRelImplementor implementor, Prefe Expressions.list( leftExpression, rightExpression, - leftResult.physType.generateAccessorWithoutNulls(joinInfo.leftKeys), - rightResult.physType.generateAccessorWithoutNulls(joinInfo.rightKeys), +leftResult.physType.generateNullAwareAccessor( Review Comment: The "RuntimeException: Cannot convert null to long" error in TpcdsTest may be related to type inference. When building the join-key selector, the `PhysTypeImpl.fieldReference(Expression expression, int field)` is called, which generates the code that extracts a specific column from a row (just like `row[i]`). For example, if the i-th column is of type Long and is inferred as non-nullable, the generated code is `toLong(row[i])` (and `toLong` does not accept null); if it is inferred to be nullable, the generated code is just `row[i]`. In the failing test case, the join key is inferred as non-nullable, so the generated code is `toLong(row[i])`, but at runtime there is a row whose join key is actually null, causing `toLong` to receive a null and throw the exception. I haven't dug into this deeply yet, so I'm not sure whether this is a bug in the type inference or an unexpected null appearing during execution. Anyway, the observed behavior is as described above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
mihaibudiu commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2612934425
##
core/src/main/java/org/apache/calcite/adapter/enumerable/PhysType.java:
##
@@ -134,6 +134,13 @@ Expression fieldReference(Expression expression, int field,
*/
Expression generateAccessorWithoutNulls(List fields);
+ /**
+ * Similar to {@link #generateAccessor(List)} and {@link
#generateAccessorWithoutNulls(List)},
+ * but it's null-aware. If one of the fields is null and this
field is marked as
+ * TRUE in filterNulls, it will return null.
Review Comment:
I think this is a much better explanation for the JavaDoc.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2612662507
##
core/src/main/java/org/apache/calcite/adapter/enumerable/PhysType.java:
##
@@ -134,6 +134,13 @@ Expression fieldReference(Expression expression, int field,
*/
Expression generateAccessorWithoutNulls(List fields);
+ /**
+ * Similar to {@link #generateAccessor(List)} and {@link
#generateAccessorWithoutNulls(List)},
+ * but it's null-aware. If one of the fields is null and this
field is marked as
+ * TRUE in filterNulls, it will return null.
Review Comment:
The function returns a non-null Expression (specifically a
FunctionExpression). This Expression will be compiled into a Java Function1
object, which evaluates to Java null (or a Object[] of keys that may contain
Java null) at runtime, as described above.
Does this clarify your question?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2612662507
##
core/src/main/java/org/apache/calcite/adapter/enumerable/PhysType.java:
##
@@ -134,6 +134,13 @@ Expression fieldReference(Expression expression, int field,
*/
Expression generateAccessorWithoutNulls(List fields);
+ /**
+ * Similar to {@link #generateAccessor(List)} and {@link
#generateAccessorWithoutNulls(List)},
+ * but it's null-aware. If one of the fields is null and this
field is marked as
+ * TRUE in filterNulls, it will return null.
Review Comment:
The function returns a non-null Expression (specifically a
FunctionExpression). It is compiled into a Java Function1 object, which
evaluates to Java null (or a Object[] of keys that may contain Java null) at
runtime, as described above.
Does this clarify your question?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
mihaibudiu commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2611574599
##
core/src/test/resources/sql/planner.iq:
##
@@ -215,14 +215,13 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
!ok
EnumerableAggregate(group=[{0}])
- EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
+ EnumerableHashJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
A=[$t1])
- EnumerableAggregate(group=[{0}])
-EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
-EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, {
5.0 }]])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
-EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+ EnumerableHashJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
Review Comment:
Nice, thank you.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
mihaibudiu commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2611502547
##
core/src/main/java/org/apache/calcite/adapter/enumerable/PhysType.java:
##
@@ -134,6 +134,13 @@ Expression fieldReference(Expression expression, int field,
*/
Expression generateAccessorWithoutNulls(List fields);
+ /**
+ * Similar to {@link #generateAccessor(List)} and {@link
#generateAccessorWithoutNulls(List)},
+ * but it's null-aware. If one of the fields is null and this
field is marked as
+ * TRUE in filterNulls, it will return null.
Review Comment:
Well, the return result of this function is Expression. So is the result a
Java null or an Expression that evaluates to null, or something else?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2611295656
##
core/src/main/java/org/apache/calcite/rel/core/JoinInfo.java:
##
@@ -29,33 +29,37 @@
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import static java.util.Objects.requireNonNull;
/** An analyzed join condition.
*
* It is useful for the many algorithms that care whether a join has an
- * equi-join condition.
+ * equi-join condition (contains EQUALS and IS NOT DISTINCT FROM).
*
- * You can create one using {@link #createWithStrictEquality}, or call
+ * You can create one using {@link #of(RelNode, RelNode, RexNode)},
+ * {@link #createWithStrictEquality}, or call
* {@link Join#analyzeCondition()}; many kinds of join cache their
* join info, especially those that are equi-joins.
*
* @see Join#analyzeCondition() */
public class JoinInfo {
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
+ public final ImmutableList filterNulls;
Review Comment:
`filterNulls` indicates whether each join key is null-safe. If
`filterNulls.get(i)` is FALSE, the i-th key is null-safe (i.e. IS NOT DISTINCT
FROM). I will add a comment and rename it.
> If I remember correctly, nonEquiConditions should store non-equality
conditions, and IS NOT DISTINCT FROM is also considered a non-equality
condition for nonEquiConditions.
I'm not sure I understood your question correctly. Before this commit, IS
NOT DISTINCT FROM was part of `nonEquiConditions`, but in this commit IS NOT
DISTINCT FROM can be treated as a hash join key and is no longer part of
`nonEquiConditions`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
xiedeyantu commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2610652049
##
core/src/main/java/org/apache/calcite/adapter/enumerable/PhysTypeImpl.java:
##
@@ -642,6 +642,22 @@ private List fieldReferences(
}
}
+ private static Expression getListExpressionAllowSingleElement(
+ Expressions.FluentList list) {
+assert list.size() > 0;
+
+switch (list.size()) {
Review Comment:
Maybe an if-else would be better?
##
core/src/main/java/org/apache/calcite/rel/core/JoinInfo.java:
##
@@ -29,33 +29,37 @@
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import static java.util.Objects.requireNonNull;
/** An analyzed join condition.
*
* It is useful for the many algorithms that care whether a join has an
- * equi-join condition.
+ * equi-join condition (contains EQUALS and IS NOT DISTINCT FROM).
*
- * You can create one using {@link #createWithStrictEquality}, or call
+ * You can create one using {@link #of(RelNode, RelNode, RexNode)},
+ * {@link #createWithStrictEquality}, or call
* {@link Join#analyzeCondition()}; many kinds of join cache their
* join info, especially those that are equi-joins.
*
* @see Join#analyzeCondition() */
public class JoinInfo {
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
+ public final ImmutableList filterNulls;
Review Comment:
If I remember correctly, `nonEquiConditions` should store non-equality
conditions, and `IS NOT DISTINCT FROM` is also considered a non-equality
condition for `nonEquiConditions`.
##
core/src/main/java/org/apache/calcite/rel/core/JoinInfo.java:
##
@@ -29,33 +29,37 @@
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import static java.util.Objects.requireNonNull;
/** An analyzed join condition.
*
* It is useful for the many algorithms that care whether a join has an
- * equi-join condition.
+ * equi-join condition (contains EQUALS and IS NOT DISTINCT FROM).
*
- * You can create one using {@link #createWithStrictEquality}, or call
+ * You can create one using {@link #of(RelNode, RelNode, RexNode)},
+ * {@link #createWithStrictEquality}, or call
* {@link Join#analyzeCondition()}; many kinds of join cache their
* join info, especially those that are equi-joins.
*
* @see Join#analyzeCondition() */
public class JoinInfo {
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
+ public final ImmutableList filterNulls;
Review Comment:
If I understand correctly, would `nullAwareKeyFlags` be a better name? It
definitely needs additional documentation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2610186506
##
core/src/test/resources/sql/planner.iq:
##
@@ -215,14 +215,13 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
!ok
EnumerableAggregate(group=[{0}])
- EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
+ EnumerableHashJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
A=[$t1])
- EnumerableAggregate(group=[{0}])
-EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
-EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, {
5.0 }]])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
-EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+ EnumerableHashJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
Review Comment:
During the optimization in Volcano, UnionMergeRule merges two Intersects
into one Intersect (three inputs). The two candidate plans are then applied to
IntersectToSemiJoinRule, resulting in:
```
// plan 1 -- two Intersect are respectively applied to the
IntersectToSemiJoinRule
Agg
Join(condition=IS NOT DISTINCT FROM)
Agg
Join(condition=IS NOT DISTINCT FROM)
Input2
Input3
Input1
// plan 2 -- one Intersect with three inputs is applied to
IntersectToSemiJoinRule
Agg
Join(condition=IS NOT DISTINCT FROM)
Join(condition=IS NOT DISTINCT FROM)
Input2
Input3
Input1
```
before this commit, all the joins above were NLJoin, with plan 1 having a
lower cost;
after this commit, all the joins above are HashJoin, with plan 2 having a
lower cost.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
silundong commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2609567560
##
core/src/main/java/org/apache/calcite/adapter/enumerable/PhysType.java:
##
@@ -134,6 +134,13 @@ Expression fieldReference(Expression expression, int field,
*/
Expression generateAccessorWithoutNulls(List fields);
+ /**
+ * Similar to {@link #generateAccessor(List)} and {@link
#generateAccessorWithoutNulls(List)},
+ * but it's null-aware. If one of the fields is null and this
field is marked as
+ * TRUE in filterNulls, it will return null.
Review Comment:
Taking the join keys selector as an example:
fields: `[x, y, z]`,
join keys: `[x, y]`, `y` is null-safe,
if the row is `[NULL,10, 100]`, it will return `NULL`;
if the row is `[2, NULL, 200]`, it will return `[2, NULL]`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
mihaibudiu commented on code in PR #4678:
URL: https://github.com/apache/calcite/pull/4678#discussion_r2609496880
##
core/src/test/resources/sql/planner.iq:
##
@@ -215,14 +215,13 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
!ok
EnumerableAggregate(group=[{0}])
- EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
+ EnumerableHashJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[semi])
EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)],
A=[$t1])
- EnumerableAggregate(group=[{0}])
-EnumerableNestedLoopJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
-EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, {
5.0 }]])
- EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)
NOT NULL], A=[$t1])
-EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+ EnumerableHashJoin(condition=[IS NOT DISTINCT FROM($0, $1)],
joinType=[semi])
Review Comment:
do you know why the distinct is no longer necessary?
##
core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableHashJoin.java:
##
@@ -218,8 +218,10 @@ private Result
implementHashSemiJoin(EnumerableRelImplementor implementor, Prefe
Expressions.list(
leftExpression,
rightExpression,
-
leftResult.physType.generateAccessorWithoutNulls(joinInfo.leftKeys),
-
rightResult.physType.generateAccessorWithoutNulls(joinInfo.rightKeys),
+leftResult.physType.generateNullAwareAccessor(
Review Comment:
There are some tests in TpcdsTest which are marked
```
@Disabled("throws 'RuntimeException: Cannot convert null to long'")
```
which I think boils down to a problem with the HashJoin. Can you check to
see whether these tests can be enabled?
They are marked `@slow`, so you may need to comment-out temporarily that
annotation to be able to run them.
(I have no idea when the slow tests are run, if ever)
##
core/src/main/java/org/apache/calcite/rel/core/JoinInfo.java:
##
@@ -29,33 +29,37 @@
import com.google.common.collect.ImmutableList;
import java.util.ArrayList;
+import java.util.Collections;
import java.util.List;
import static java.util.Objects.requireNonNull;
/** An analyzed join condition.
*
* It is useful for the many algorithms that care whether a join has an
- * equi-join condition.
+ * equi-join condition (contains EQUALS and IS NOT DISTINCT FROM).
*
- * You can create one using {@link #createWithStrictEquality}, or call
+ * You can create one using {@link #of(RelNode, RelNode, RexNode)},
+ * {@link #createWithStrictEquality}, or call
* {@link Join#analyzeCondition()}; many kinds of join cache their
* join info, especially those that are equi-joins.
*
* @see Join#analyzeCondition() */
public class JoinInfo {
public final ImmutableIntList leftKeys;
public final ImmutableIntList rightKeys;
+ public final ImmutableList filterNulls;
Review Comment:
Can you please add JavaDoc for this field and perhaps the next one too?
##
core/src/main/java/org/apache/calcite/adapter/enumerable/PhysType.java:
##
@@ -134,6 +134,13 @@ Expression fieldReference(Expression expression, int field,
*/
Expression generateAccessorWithoutNulls(List fields);
+ /**
+ * Similar to {@link #generateAccessor(List)} and {@link
#generateAccessorWithoutNulls(List)},
+ * but it's null-aware. If one of the fields is null and this
field is marked as
+ * TRUE in filterNulls, it will return null.
Review Comment:
You mean, it will return an expression that evaluates to null?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [CALCITE-7327] Support IS NOT DISTINCT FROM as equi condition of hash join [calcite]
sonarqubecloud[bot] commented on PR #4678: URL: https://github.com/apache/calcite/pull/4678#issuecomment-3640595542 ## [](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) **Quality Gate passed** Issues  [7 New issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [0 Accepted issues](https://sonarcloud.io/project/issues?id=apache_calcite&pullRequest=4678&issueStatuses=ACCEPTED) Measures  [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_calcite&pullRequest=4678&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true)  [87.3% Coverage on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_coverage&view=list)  [0.0% Duplication on New Code](https://sonarcloud.io/component_measures?id=apache_calcite&pullRequest=4678&metric=new_duplicated_lines_density&view=list) [See analysis details on SonarQube Cloud](https://sonarcloud.io/dashboard?id=apache_calcite&pullRequest=4678) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
