cloud-fan commented on code in PR #56575:
URL: https://github.com/apache/spark/pull/56575#discussion_r3457473894
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala:
##########
@@ -376,6 +376,11 @@ class V2ExpressionBuilder(e: Expression, isPredicate:
Boolean = false) extends L
} else {
None
}
+ // A surviving `RuntimeReplaceable` (`eagerReplace = false`) that no
explicit case above pushes
+ // natively is a Spark-internal optimizer node the connector cannot
understand. Fall back to its
+ // concrete `replacement` so the lowered form can still be pushed -- same
boundary rationale as
+ // `DataSourceStrategy.translateLeafNodeFilter` (V1) and
`CachedBatchSerializer.buildFilter`.
+ case r: RuntimeReplaceable => generateExpression(r.replacement,
isPredicate)
Review Comment:
Good catch, fixed in cafddb8 with the "prefer an exact map entry before
descending" option.
`rebuildExpressionFromFilter` now checks
`translatedFilterToExpr.get(predicate)` before matching `V2And`/`V2Or`/`V2Not`.
For the surviving wrapper, the compound `V2And` *is* a map key (mapped to the
wrapper), so it's restored directly and we never descend into the synthetic
children that have no entries. The original readable wrapper is kept in the
post-scan filter and materialized into its `replacement` before codegen by
`MaterializeRuntimeReplaceable`.
This is granularity-correct at every level of descent, so nesting is covered
too -- e.g. `And(c = 5, wrapper(And(a > 1, b < 2)))` rebuilds as `And(c = 5,
wrapper)`: the outer `V2And` isn't a map key so it descends, and the inner
`V2And(a>1, b<2)` is a map key so it's restored before its synthetic children
are visited. Normal compound filters are decomposed at translation (only leaves
mapped), so the exact lookup misses and falls through to the existing descent
-- behavior unchanged.
Added a regression test in `DataSourceV2StrategySuite` that translates a
compound-replacement wrapper via `translateFilterV2WithMapping`, then feeds the
result back through `rebuildExpressionFromFilter` (equivalent to a rejecting
`SupportsPushDownV2Filters` source returning the predicate unchanged) and
asserts the wrapper is restored; it throws `Failed to rebuild Expression for
filter` without the fix.
One note on V1: the V1 path (`DataSourceStrategy`) does not have this crash.
There the unfold sits inside `translateLeafNodeFilter`, which only recognizes
leaf shapes, so a compound replacement simply isn't translated (returns `None`)
-- no pushdown, no map entry, correct results. That's a milder
missed-optimization version of the same root cause; I left V1 as-is since
fixing it would be an enhancement (and would lose the readable wrapper in the
post-scan filter).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]