cloud-fan commented on code in PR #56575:
URL: https://github.com/apache/spark/pull/56575#discussion_r3457473894


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala:
##########
@@ -376,6 +376,11 @@ class V2ExpressionBuilder(e: Expression, isPredicate: 
Boolean = false) extends L
       } else {
         None
       }
+    // A surviving `RuntimeReplaceable` (`eagerReplace = false`) that no 
explicit case above pushes
+    // natively is a Spark-internal optimizer node the connector cannot 
understand. Fall back to its
+    // concrete `replacement` so the lowered form can still be pushed -- same 
boundary rationale as
+    // `DataSourceStrategy.translateLeafNodeFilter` (V1) and 
`CachedBatchSerializer.buildFilter`.
+    case r: RuntimeReplaceable => generateExpression(r.replacement, 
isPredicate)

Review Comment:
   Good catch, fixed in cafddb8 with the "prefer an exact map entry before 
descending" option.
   
   `rebuildExpressionFromFilter` now checks 
`translatedFilterToExpr.get(predicate)` before matching `V2And`/`V2Or`/`V2Not`. 
For the surviving wrapper, the compound `V2And` *is* a map key (mapped to the 
wrapper), so it's restored directly and we never descend into the synthetic 
children that have no entries. The original readable wrapper is kept in the 
post-scan filter and materialized into its `replacement` before codegen by 
`MaterializeRuntimeReplaceable`.
   
   This is granularity-correct at every level of descent, so nesting is covered 
too -- e.g. `And(c = 5, wrapper(And(a > 1, b < 2)))` rebuilds as `And(c = 5, 
wrapper)`: the outer `V2And` isn't a map key so it descends, and the inner 
`V2And(a>1, b<2)` is a map key so it's restored before its synthetic children 
are visited. Normal compound filters are decomposed at translation (only leaves 
mapped), so the exact lookup misses and falls through to the existing descent 
-- behavior unchanged.
   
   Added a regression test in `DataSourceV2StrategySuite` that translates a 
compound-replacement wrapper via `translateFilterV2WithMapping`, then feeds the 
result back through `rebuildExpressionFromFilter` (equivalent to a rejecting 
`SupportsPushDownV2Filters` source returning the predicate unchanged) and 
asserts the wrapper is restored; it throws `Failed to rebuild Expression for 
filter` without the fix.
   
   One note on V1: the V1 path (`DataSourceStrategy`) does not have this crash. 
There the unfold sits inside `translateLeafNodeFilter`, which only recognizes 
leaf shapes, so a compound replacement simply isn't translated (returns `None`) 
-- no pushdown, no map entry, correct results. That's a milder 
missed-optimization version of the same root cause; I left V1 as-is since 
fixing it would be an enhancement (and would lose the readable wrapper in the 
post-scan filter).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to