Github user NathanHowell commented on the pull request:
https://github.com/apache/spark/pull/12750#issuecomment-215607825
Alright, here's a few ideas that will at least reduce allocations by a bit.
Your version with the merge sort is likely better than the insertion sort here
but I thought I'd try something different :trollface: any chance you can run it
through your benchmark?
I'll put my other comments inline, generally looks good to me.
``` scala
/**
* Combine two sorted arrays of StructFields into a new StructType
*/
private def compatibleFields(
fields1: Array[StructField],
fields2: Array[StructField]): StructType = {
// perform an insertion sort of the smaller struct into the larger one
val (bigger, smaller) = if (fields1.length > fields2.length) {
(fields1.toBuffer, fields2)
} else {
(fields2.toBuffer, fields1)
}
var biggerIdx = 0
var smallerIdx = 0
while (biggerIdx < bigger.length && smallerIdx < smaller.length) {
val biggerVal = bigger(biggerIdx)
val smallerVal = smaller(smallerIdx)
val comp = biggerVal.name.compareTo(smallerVal.name)
if (comp == 0) {
if (biggerVal.dataType != smallerVal.dataType) {
val merged = compatibleType(biggerVal.dataType,
smallerVal.dataType)
// test to see if the merged type is equivalent to one of the
existing
// StructField instances, reuse will reduce GC pressure
if (smallerVal.dataType == merged) {
bigger.update(biggerIdx, smallerVal)
} else if (biggerVal.dataType == merged) {
// do nothing, the bigger struct already has the correct field
} else {
// we can't reuse an existing field so allocate a new one
bigger.update(biggerIdx, biggerVal.copy(dataType = merged))
}
}
biggerIdx += 1
smallerIdx += 1
} else if (comp > 0) {
bigger.insert(biggerIdx, smallerVal)
// bump both indexes, the bigger struct has grown
biggerIdx += 1
smallerIdx += 1
} else { // comp < 0
// advance to the next field on the bigger struct
// nothing else to do here
biggerIdx += 1
}
}
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]