Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/8747#issuecomment-142764287
I tested it on two cases: a deeply nested row, and a long row with non-null
variable-length fields. The results are:
| case | old code | new code |
|------|----------|----------|
| case 1 (deeply nested) | 2400 ms | 2100 ms |
| case 2 (long row) | 900 ms | 650 ms |
The improvement in case 1 is because we avoid the extra copy.
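To illustrate what an "extra copy" can mean for a deeply nested row, here is a minimal sketch. It is a simplified model and not Spark's generated projection code: `NestedWriteSketch`, `Value`, `Leaf`, `Struct`, and the one-byte struct marker are all hypothetical names and encodings chosen for the example. `writeWithCopy` serializes each inner struct into a temporary buffer and then copies it into its parent, while `writeDirect` appends everything to one shared buffer.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

// Hypothetical model of a nested value; not a Spark type.
object NestedWriteSketch {
  sealed trait Value
  case class Leaf(v: Long) extends Value
  case class Struct(child: Value) extends Value

  // Copy-based: every nesting level builds its child in a temporary
  // array and then copies those bytes into its own output.
  def writeWithCopy(v: Value): Array[Byte] = v match {
    case Leaf(x) => ByteBuffer.allocate(8).putLong(x).array()
    case Struct(c) =>
      val inner = writeWithCopy(c)          // temporary buffer for the child
      val out = new ByteArrayOutputStream()
      out.write(1)                          // made-up one-byte struct marker
      out.write(inner, 0, inner.length)     // the extra copy at this level
      out.toByteArray
  }

  // Direct: all levels append to the same buffer, so bytes are written once.
  def writeDirect(v: Value, out: ByteArrayOutputStream): Unit = v match {
    case Leaf(x) =>
      val bytes = ByteBuffer.allocate(8).putLong(x).array()
      out.write(bytes, 0, bytes.length)
    case Struct(c) =>
      out.write(1)
      writeDirect(c, out)
  }
}
```

Both functions produce the same bytes; in this model the copy-based version re-copies the innermost data once per nesting level, which is the kind of overhead a 100-level struct would amplify.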
The improvement in case 2 is because we now clear the null-bits region once at
the beginning, so we don't need to reset the bit again when writing each
non-null field.
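The null-bits idea can be sketched with a plain bitset. This is an illustrative model under stated assumptions, not the actual `UnsafeRow` code: `NullBitsSketch` and its methods are hypothetical names, and the bitset is modelled as an `Array[Long]` with one bit per field. `writePerField` touches the bitset for every field because the buffer may hold stale bits, while `writeClearOnce` zeroes the region up front and then only touches it for null fields.

```scala
import java.util.Arrays

// Hypothetical model: one null bit per field, packed into 64-bit words.
object NullBitsSketch {
  def wordsFor(numFields: Int): Int = (numFields + 63) / 64

  def isSet(bits: Array[Long], i: Int): Boolean =
    (bits(i >> 6) & (1L << (i & 63))) != 0L

  // Per-field style: the reused buffer may contain stale bits, so each
  // field write either sets or explicitly resets its own bit.
  def writePerField(bits: Array[Long], isNull: Array[Boolean]): Unit = {
    var i = 0
    while (i < isNull.length) {
      val mask = 1L << (i & 63)
      if (isNull(i)) bits(i >> 6) |= mask
      else bits(i >> 6) &= ~mask
      i += 1
    }
  }

  // Clear-once style: zero the whole region first; non-null fields then
  // need no bitset work at all.
  def writeClearOnce(bits: Array[Long], isNull: Array[Boolean]): Unit = {
    Arrays.fill(bits, 0L)
    var i = 0
    while (i < isNull.length) {
      if (isNull(i)) bits(i >> 6) |= 1L << (i & 63)
      i += 1
    }
  }
}
```

For a row whose fields are all non-null, as in case 2, the clear-once version does a single bulk fill and skips the per-field bit manipulation entirely.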
Actually, it's not a big improvement, and the gain is probably much smaller for
end-to-end performance, so I'm not sure this change is worth it.
cc @davies @rxin
testing code:
```
test("perf1") {
  // Build a struct nested 100 levels deep around a single long field.
  val superNestedStruct = {
    def makeStruct(struct: StructType, i: Int): StructType = {
      if (i == 0) {
        struct
      } else {
        makeStruct(new StructType().add("i", struct, false), i - 1)
      }
    }
    makeStruct(new StructType().add("a", LongType), 100)
  }
  val converter = UnsafeProjection.create(superNestedStruct)
  val generator = RandomDataGenerator.forType(superNestedStruct, nullable = false).get
  val input =
    CatalystTypeConverters.convertToCatalyst(generator()).asInstanceOf[InternalRow]
  // Time 100000 conversions of the same input row.
  var i = 100000
  val start = System.currentTimeMillis()
  while (i > 0) {
    converter(input)
    i -= 1
  }
  val end = System.currentTimeMillis()
  println(end - start)
}

test("perf2") {
  // A flat row with 100 non-nullable CalendarIntervalType fields.
  val struct = StructType((1 to 100).map { i =>
    StructField(i.toString, CalendarIntervalType, false)
  })
  val converter = UnsafeProjection.create(struct)
  val generator = RandomDataGenerator.forType(struct, nullable = false).get
  val input =
    CatalystTypeConverters.convertToCatalyst(generator()).asInstanceOf[InternalRow]
  // Time 100000 conversions of the same input row.
  var i = 100000
  val start = System.currentTimeMillis()
  while (i > 0) {
    converter(input)
    i -= 1
  }
  val end = System.currentTimeMillis()
  println(end - start)
}
```
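
Both tests time a single cold loop with `System.currentTimeMillis()`. If anyone wants to re-run them with less JIT noise, a possible variant is a small helper that runs some untimed warm-up iterations first and measures with `System.nanoTime()`. This is only a sketch: `TimingSketch` and `timeMillis` are hypothetical names, not part of Spark, and the warm-up count is an arbitrary choice.

```scala
// Hypothetical timing helper; not part of Spark's test utilities.
object TimingSketch {
  // Runs `body` `warmup` times untimed, then `iters` times timed.
  // Returns the elapsed time of the timed loop in milliseconds.
  def timeMillis(warmup: Int, iters: Int)(body: => Unit): Double = {
    var i = 0
    while (i < warmup) { body; i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < iters) { body; i += 1 }
    (System.nanoTime() - start) / 1e6
  }
}
```

In the tests above, the timed loop could then be replaced by something like `println(TimingSketch.timeMillis(10000, 100000) { converter(input) })`.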