viirya opened a new pull request, #54299:
URL: https://github.com/apache/spark/pull/54299
## What changes were proposed in this pull request?
This PR fixes a bug where applying `withWatermark()` to a nested struct
field throws a `SparkException` with the message "Failed to copy node. Is
otherCopyArgs specified correctly for EventTimeWatermark. Exception message:
argument type mismatch..."
The issue occurs because the DataFrame API's `withWatermark()` method
creates `EventTimeWatermark` directly with an `UnresolvedAttribute`, while the
SQL parser creates `UnresolvedEventTimeWatermark`, which is resolved properly.
When resolving nested field references like "kolona.timestamp", the analyzer
resolves them to `Alias(ExtractValue(...), "timestamp")` expressions, but
`EventTimeWatermark` expects its `eventTime` parameter to be an `Attribute`,
not an `Alias`. This type mismatch causes the error during tree node copying.
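The failure mode described above can be illustrated with a minimal, self-contained sketch. It is a toy model, not Spark code: `Expression`, `Attribute`, `GetField`, `Alias`, `Watermark`, and `reflectiveCopy` are all hypothetical stand-ins that only mimic the shape of the Catalyst classes named in this description. The sketch assumes the node is copied by calling its constructor reflectively with untyped arguments, which is where a parameter typed as an attribute would reject an alias.

```scala
// Toy model only: these types mimic the shape of Catalyst's expression
// hierarchy but are NOT Spark's real classes.
sealed trait Expression
sealed trait NamedExpression extends Expression { def name: String }
final case class Attribute(name: String) extends NamedExpression
final case class GetField(child: Expression, field: String) extends Expression
final case class Alias(child: Expression, name: String) extends NamedExpression

// Hypothetical stand-in for a plan node whose constructor demands an Attribute.
final case class Watermark(eventTime: Attribute, delay: String)

object CopyDemo {
  // Reflective copy: look up the constructor and invoke it with the new
  // arguments passed as untyped objects, so type checks happen at runtime.
  def reflectiveCopy(newArgs: Seq[AnyRef]): Watermark = {
    val ctor = classOf[Watermark].getConstructors.head
    ctor.newInstance(newArgs: _*).asInstanceOf[Watermark]
  }

  def main(args: Array[String]): Unit = {
    // A top-level column is modeled as an Attribute, so the copy succeeds.
    val ok = reflectiveCopy(Seq(Attribute("timestamp"), "10 seconds"))
    println(s"copied: $ok")

    // A nested field is modeled as an Alias over a field extraction. It is a
    // NamedExpression but not an Attribute, so the constructor call is rejected.
    val nested = Alias(GetField(Attribute("kolona"), "timestamp"), "timestamp")
    val outcome =
      try Right(reflectiveCopy(Seq(nested, "10 seconds")))
      catch { case e: IllegalArgumentException => Left(e) }
    println(s"nested copy: $outcome")
  }
}
```

In this model the second call ends in an `IllegalArgumentException` raised by the reflective constructor invocation, which corresponds to the "argument type mismatch" portion of the reported error. The actual fix in this PR lives in the analyzer and is not reproduced here.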
## Why are the changes needed?
Users should be able to apply watermarks to nested struct fields without
encountering errors. This is a valid use case for structured streaming
applications.
## Does this PR introduce any user-facing change?
Yes, users can now successfully apply watermarks to nested struct fields:
```scala
df.withWatermark("nested_struct.timestamp", "10 seconds")
```
Previously this would fail with a SparkException.
## How was this patch tested?
- Added a unit test in `ResolveEventTimeWatermarkSuite` for nested field
resolution
- Added an end-to-end test in `EventTimeWatermarkSuite` with a streaming DataFrame
- Both tests verify that watermarks on nested fields are properly resolved
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.5
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]