[ https://issues.apache.org/jira/browse/SPARK-46671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Asif resolved SPARK-46671.
--------------------------
    Resolution: Not A Bug

> InferFiltersFromConstraint rule is creating a redundant filter
> --------------------------------------------------------------
>
>                 Key: SPARK-46671
>                 URL: https://issues.apache.org/jira/browse/SPARK-46671
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Asif
>            Priority: Minor
>              Labels: SQL, catalyst
>
> While bringing my old PR, which uses a different approach to the
> ConstraintPropagation algorithm
> ([SPARK-33152|https://issues.apache.org/jira/browse/SPARK-33152]), in sync
> with current master, I noticed a test failure in my branch for SPARK-33152.
> The failing test is in InferFiltersFromConstraintSuite:
> {code}
> test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
>   val x = testRelation.as("x")
>   val y = testRelation.as("y")
>   val z = testRelation.as("z")
>
>   // Removes EqualNullSafe when constructing candidate constraints
>   comparePlans(
>     InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
>       .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
>     x.select($"x.a", $"x.a".as("xa"))
>       .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
>
>   // Once strategy's idempotence is not broken
>   val originalQuery =
>     x.join(y, condition = Some($"x.a" === $"y.a"))
>       .select($"x.a", $"x.a".as("xa")).as("xy")
>       .join(z, condition = Some($"xy.a" === $"z.a")).analyze
>
>   val correctAnswer =
>     x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
>       .select($"x.a", $"x.a".as("xa")).as("xy")
>       .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze
>
>   val optimizedQuery = InferFiltersFromConstraints(originalQuery)
>   comparePlans(optimizedQuery, correctAnswer)
>   comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
> }
> {code}
> In the above test, I believe the assertion below is not correct.
> A redundant filter is being created.
> Of these two isNotNull constraints, only one should be created:
>
>   $"xa".isNotNull && $"x.a".isNotNull
>
> The presence of (xa#0 = a#0) automatically implies that if one attribute
> is not null, the other must also be not null.
>
>   // Removes EqualNullSafe when constructing candidate constraints
>   comparePlans(
>     InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
>       .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
>     x.select($"x.a", $"x.a".as("xa"))
>       .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
>
> This is not a big issue, but it highlights the need to take another look
> at ConstraintPropagation and the related code.
> I am filing this jira so that the constraint code can be tightened and
> made more robust.
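
The implication the reporter relies on (a row that passes `xa = a` must have both sides non-null) can be checked with a small standalone sketch of SQL three-valued equality. This is only an illustration of that argument, not Catalyst code: it models a nullable column as `Option[Int]`, and the names `NullImplicationSketch`, `sqlEq`, `passes`, and `domain` are hypothetical. It says nothing about which filters the rule should emit, and the issue itself was resolved as Not A Bug.

```scala
object NullImplicationSketch {
  // SQL-style "=": the result is unknown (None) when either operand is null (None).
  def sqlEq(l: Option[Int], r: Option[Int]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // A WHERE clause keeps a row only when the predicate is definitely true;
  // false and unknown both drop the row.
  def passes(p: Option[Boolean]): Boolean = p.contains(true)

  // Tiny value domain for a nullable integer column (hypothetical sample values).
  val domain: Seq[Option[Int]] = Seq(None, Some(1), Some(2))

  // Every (xa, a) pair that survives the filter `xa = a`.
  val survivors: Seq[(Option[Int], Option[Int])] =
    for (xa <- domain; a <- domain if passes(sqlEq(xa, a))) yield (xa, a)

  // True when no surviving row has a null on either side.
  val bothNonNull: Boolean =
    survivors.forall { case (xa, a) => xa.isDefined && a.isDefined }

  def main(args: Array[String]): Unit = {
    println(s"survivors = $survivors")
    println(s"both sides non-null in every surviving row: $bothNonNull")
  }
}
```

Over this small domain, every row that passes the equality has both attributes non-null, which is the sense in which the reporter calls one of the two isNotNull predicates redundant.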