ZitingShen commented on PR #56536: URL: https://github.com/apache/spark/pull/56536#issuecomment-4732888021
Discussed with @sunchao: After digging further, we don’t think this is the right change for OSS Spark. Making every ColumnarToRowExec output nullable changes a very broad execution contract on a hot path: downstream JVM row operators now need to treat the physical null bitmap as authoritative even when the optimized plan says nullable=false. That adds null-check cost after common scan boundaries, changes explain/canonical/state-schema behavior, and still does not provide true end-to-end support for nulls under a non-null logical contract because Catalyst has already optimized using nullable=false. If Spark wants to harden this case, it should be through a narrower invariant check or a fully designed end-to-end semantic change with optimizer, execution, tests, and performance validation, not this localized row-boundary rewrite. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
