Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/12750#issuecomment-215941197
@NathanHowell, I played around with your code from
https://github.com/apache/spark/pull/12750#issuecomment-215607825 and it was a
bit slower than mine because it turns out that comparing `StructType`s for
equality is really expensive and the benefits of avoiding object allocation
were outweighed by the additional equality checks.
There are a couple of minor tricks which can improve the performance of
your approach, though. In
```scala
if (biggerVal.dataType != smallerVal.dataType) {
val merged = compatibleType(biggerVal.dataType,
smallerVal.dataType)
// test to see if the merged type is equivalent to one of the
existing
// StructField instances, reuse will reduce GC pressure
if (smallerVal.dataType == merged) {
bigger.update(biggerIdx, smallerVal)
} else if (biggerVal.dataType == merged) {
// do nothing, the bigger struct already has the correct field
} else {
// we can't reuse an existing field so allocate a new one
bigger.update(biggerIdx, biggerVal.copy(dataType = merged))
}
[...]`
```
note that `compatibleType`'s first step is to compare its inputs for
equality, so if the two input types are equal then one of those inputs will be
re-used as the value of `merged`. Thus, I think you can remove some checks by
rewriting this as
```scala
val merged = compatibleType(biggerVal.dataType,
smallerVal.dataType)
if (smallerVal.dataType eq merged) {
bigger.update(biggerIdx, smallerVal)
} else if (biggerVal.dataType eq merged) {
// do nothing, the bigger struct already has the correct field
} else {
// we can't reuse an existing field so allocate a new one
bigger.update(biggerIdx, biggerVal.copy(dataType = merged))
}
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]