Asif created SPARK-46671:
----------------------------

             Summary: InferFiltersFromConstraint rule is creating a redundant filter
                 Key: SPARK-46671
                 URL: https://issues.apache.org/jira/browse/SPARK-46671
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Asif

While bringing my old PR, which uses a different approach to the ConstraintPropagation algorithm ([SPARK-33152|https://issues.apache.org/jira/browse/SPARK-33152]), in sync with current master, I noticed a test failure in my branch for SPARK-33152.

The failing test is in InferFiltersFromConstraintsSuite:

{code}
test("SPARK-43095: Avoid Once strategy's idempotence is broken for batch: Infer Filters") {
  val x = testRelation.as("x")
  val y = testRelation.as("y")
  val z = testRelation.as("z")

  // Removes EqualNullSafe when constructing candidate constraints
  comparePlans(
    InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
      .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
    x.select($"x.a", $"x.a".as("xa"))
      .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)

  // Once strategy's idempotence is not broken
  val originalQuery =
    x.join(y, condition = Some($"x.a" === $"y.a"))
      .select($"x.a", $"x.a".as("xa")).as("xy")
      .join(z, condition = Some($"xy.a" === $"z.a")).analyze

  val correctAnswer =
    x.where($"a".isNotNull).join(y.where($"a".isNotNull), condition = Some($"x.a" === $"y.a"))
      .select($"x.a", $"x.a".as("xa")).as("xy")
      .join(z.where($"a".isNotNull), condition = Some($"xy.a" === $"z.a")).analyze

  val optimizedQuery = InferFiltersFromConstraints(originalQuery)
  comparePlans(optimizedQuery, correctAnswer)
  comparePlans(InferFiltersFromConstraints(optimizedQuery), correctAnswer)
}
{code}

In the above test, I believe the first assertion is not proper: a redundant filter is being created. Of these two isNotNull constraints, only one should be created:

$"xa".isNotNull && $"x.a".isNotNull

The presence of (xa#0 = a#0) automatically implies that if one attribute is not null, the other also has to be not null.

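The implication can be checked with a small sketch in plain Scala. This is not Catalyst code: it is a hypothetical model in which a nullable column value is an Option[Int], a SQL boolean is an Option[Boolean] with None standing for NULL, and all names (RedundantNotNullSketch, sqlEq, withBoth, withOne, and so on) are made up for illustration. Keeping isNotNull on a rather than on xa is an arbitrary choice; the argument is symmetric.

```scala
// Hypothetical model, not Spark code: None stands for SQL NULL.
object RedundantNotNullSketch {
  type SqlInt = Option[Int]
  type SqlBool = Option[Boolean]

  // SQL `=`: the result is NULL if either operand is NULL.
  def sqlEq(l: SqlInt, r: SqlInt): SqlBool =
    for (a <- l; b <- r) yield a == b

  // SQL `<=>` (null-safe equality): never NULL; NULL <=> NULL is true.
  def sqlEqNullSafe(l: SqlInt, r: SqlInt): SqlBool = Some(l == r)

  def isNotNull(v: SqlInt): SqlBool = Some(v.isDefined)

  // Three-valued AND: false dominates, then NULL, then true.
  def and(l: SqlBool, r: SqlBool): SqlBool = (l, r) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true)) => Some(true)
    case _ => None
  }

  // A filter keeps a row only when its condition evaluates to true.
  def keeps(cond: SqlBool): Boolean = cond.contains(true)

  // Condition asserted by the test: both IsNotNull constraints.
  def withBoth(xa: SqlInt, a: SqlInt): Boolean =
    keeps(Seq(isNotNull(xa), isNotNull(a), sqlEqNullSafe(xa, a), sqlEq(xa, a)).reduce(and))

  // Same condition with a single IsNotNull constraint.
  def withOne(xa: SqlInt, a: SqlInt): Boolean =
    keeps(Seq(isNotNull(a), sqlEqNullSafe(xa, a), sqlEq(xa, a)).reduce(and))

  def main(args: Array[String]): Unit = {
    val values: Seq[SqlInt] = Seq(None, Some(1), Some(2))
    val rows = for (xa <- values; a <- values) yield (xa, a)
    // Compare the two filters on every combination of sample values.
    val same = rows.forall { case (xa, a) => withBoth(xa, a) == withOne(xa, a) }
    println(s"filters equivalent on all sample rows: $same") // prints true
  }
}
```

Under these assumptions the two conditions keep exactly the same rows, because xa = a already evaluates to NULL (and the row is dropped) whenever xa is NULL. The second isNotNull therefore changes nothing about the result.
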
The assertion in question:

{code}
// Removes EqualNullSafe when constructing candidate constraints
comparePlans(
  InferFiltersFromConstraints(x.select($"x.a", $"x.a".as("xa"))
    .where($"xa" <=> $"x.a" && $"xa" === $"x.a").analyze),
  x.select($"x.a", $"x.a".as("xa"))
    .where($"xa".isNotNull && $"x.a".isNotNull && $"xa" <=> $"x.a" && $"xa" === $"x.a").analyze)
{code}

This is not a big issue, but it highlights the need to take another look at ConstraintPropagation and related code. I am filing this Jira so that the constraint code can be tightened and made more robust.