bersprockets opened a new pull request, #36183:
URL: https://github.com/apache/spark/pull/36183
### What changes were proposed in this pull request?
Make `NewInstance` non-foldable.
### Why are the changes needed?
When handling Java beans as input, Spark creates `NewInstance` with no
arguments. On master and 3.3, `NewInstance` with no arguments is considered
foldable. As a result, the `ConstantFolding` rule converts `NewInstance` into a
`Literal` holding an instance of the user's specified Java bean. The instance
becomes a singleton that gets reused for each input record (although its fields
get updated by `InitializeJavaBean`).
Because the instance gets reused, sometimes multiple buffers in
`AggregationIterator` are actually referring to the same Java bean instance.
Take, for example, the test I added in this PR, or the `spark-shell` example
I added to SPARK-38823 as a comment.
The input is:
```
new Item("a", 1),
new Item("b", 3),
new Item("c", 2),
new Item("a", 7)
```
As `ObjectAggregationIterator` reads the input, the buffers get set up as
follows (note that the first field of `Item` should match the key):
```
- Read Item("a", 1)
- Buffers are now:
    Key "a" -> Item("a", 1)
- Read Item("b", 3)
- Buffers are now:
    Key "a" -> Item("b", 3)
    Key "b" -> Item("b", 3)
```
The buffer for key "a" now contains `Item("b", 3)`. That's because both
buffers hold a reference to the same `Item` instance, and that instance's
fields were overwritten when `Item("b", 3)` was read.
This PR makes `NewInstance` non-foldable, so it will not get optimized away,
thus ensuring a new instance of the Java bean for each input record.
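The aliasing effect can be reproduced outside Spark with a minimal plain-Java sketch. Nothing here is Spark code: `SharedBeanDemo`, its `Item` bean, and `fillBuffers` are hypothetical names made up for illustration, and the map only loosely stands in for the per-key aggregation buffers (it simply keeps the first bean seen for each key). The `Supplier` plays the role of `NewInstance`: returning one shared object models the folded `Literal`, while `Item::new` models the non-foldable behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical demo class; not part of Spark.
class SharedBeanDemo {
    // Stand-in for the user's Java bean (no-arg constructor plus setters).
    static class Item {
        private String key;
        private int value;
        public String getKey() { return key; }
        public void setKey(String key) { this.key = key; }
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
        @Override public String toString() { return "Item(\"" + key + "\", " + value + ")"; }
    }

    // Same input as in the description: (a,1), (b,3), (c,2), (a,7).
    static final String[] KEYS = {"a", "b", "c", "a"};
    static final int[] VALUES = {1, 3, 2, 7};

    // For each record: obtain a bean, set its fields, and store it as the
    // buffer for its key if that key has no buffer yet.
    static Map<String, Item> fillBuffers(Supplier<Item> newInstance) {
        Map<String, Item> buffers = new LinkedHashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            Item bean = newInstance.get();   // shared object or a fresh one
            bean.setKey(KEYS[i]);            // field initialization mutates the bean
            bean.setValue(VALUES[i]);
            buffers.putIfAbsent(bean.getKey(), bean);
        }
        return buffers;
    }

    public static void main(String[] args) {
        Item singleton = new Item();
        // Models the folded case: every buffer aliases the same bean.
        System.out.println(fillBuffers(() -> singleton));
        // prints {a=Item("a", 7), b=Item("a", 7), c=Item("a", 7)}

        // Models the non-foldable case: one bean per record.
        System.out.println(fillBuffers(Item::new));
        // prints {a=Item("a", 1), b=Item("b", 3), c=Item("c", 2)}
    }
}
```

With the shared supplier, every buffer ends up showing whatever record was read last, which is the same symptom as in the trace above. With a fresh instance per record, each key keeps its own bean.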
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit test.