MaxGekk commented on code in PR #56677:
URL: https://github.com/apache/spark/pull/56677#discussion_r3460524493
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala:
##########
@@ -119,8 +119,15 @@ object ComputeCurrentTime extends Rule[LogicalPlan] {
val currentDates = collection.mutable.HashMap.empty[ZoneId, Literal]
val localTimestamps = collection.mutable.HashMap.empty[ZoneId, Literal]
+ // The CAST bit is included so this rule can find TIME -> TIMESTAMP_NTZ
casts (which depend on
+ // CURRENT_DATE) and stabilize them below. CAST is a broad pattern, so
this widens the rule's
+ // traversal to most plans; the precise `Cast.isTimeToTimestampNTZ` guard
keeps the rewrite
+ // scoped. We intentionally do not tag these casts with CURRENT_LIKE
instead: inline-table
+ // validation treats CURRENT_LIKE as safe to defer, so tagging would let
unrelated non-foldable
+ // NTZ-target casts (e.g. CAST(rand() AS TIMESTAMP_NTZ)) bypass that
validation (see SPARK-57618
+ // and ResolveInlineTablesSuite).
def transformCondition(treePatternbits: TreePatternBits): Boolean = {
- treePatternbits.containsPattern(CURRENT_LIKE)
+ treePatternbits.containsPattern(CURRENT_LIKE) ||
treePatternbits.containsPattern(CAST)
Review Comment:
The challenge with a dedicated Cast-tagged pattern is that to make it
actually narrow — i.e. only TIME -> TIMESTAMP_NTZ — nodePatternsInternal()
would have to inspect child.dataType. That's exactly the path we removed in
51136ecb5e2: node patterns are computed eagerly at construction, before the
child is resolved, and child.dataType throws (including on an OuterReference
that reports resolved == true but wraps an unresolved attribute, which broke
CREATE FUNCTION ... RETURNS TABLE). The only safe key is the always-available
target type, which tags every cast to the NTZ family — not just TIME-sourced
ones — so it's only modestly narrower than CAST while adding a new global
TreePattern enum entry for a rule that runs once per query.
We also can't reuse CURRENT_LIKE for the tag: inline-table validation
treats CURRENT_LIKE as safe-to-defer, so tagging let CAST(rand() AS
TIMESTAMP_NTZ) bypass validation (1b97171f869). Pruning on CAST with the
precise isTimeToTimestampNTZ guard on the resolved plan is the configuration
that's both safe and correct. If the traversal cost turns out to matter I'm
happy to add a target-keyed dedicated pattern, but it buys little over CAST
here — would you be OK leaving it as-is?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]