Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/10809#discussion_r50029727
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
---
@@ -73,9 +70,27 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
row: String,
inputs: Seq[ExprCode],
inputTypes: Seq[DataType],
- bufferHolder: String): String = {
+ bufferHolder: String,
+ isTopLevel: Boolean = false): String = {
+ val rowWriterClass = classOf[UnsafeRowWriter].getName
val rowWriter = ctx.freshName("rowWriter")
- ctx.addMutableState(rowWriterClass, rowWriter, s"this.$rowWriter = new $rowWriterClass();")
+ ctx.addMutableState(rowWriterClass, rowWriter,
+ s"this.$rowWriter = new $rowWriterClass($bufferHolder, ${inputs.length});")
+
+ val resetWriter = if (isTopLevel) {
+ // For top level row writer, it always writes to the beginning of the global buffer holder,
+ // which means its fixed-size region always in the same position, so we don't need to call
+ // `reset` to set up its fixed-size region every time.
+ if (inputs.map(_.isNull).forall(_ == "false")) {
+ // If all fields are not nullable, which means the null bits never changes, then we don't
+ // need to clear it out every time.
+ ""
+ } else {
+ s"$rowWriter.zeroOutNullBites();"
--- End diff --
Here I made a different decision compared to the unsafe Parquet reader. We
can clear out the null bits at the beginning and call `UnsafeRowWriter.write`
instead of `UnsafeRow.setXXX`, which saves one null-bit update per write. If
null values are rare, this approach should be faster. I'll benchmark it later.
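
To make the trade-off concrete, here is a rough sketch using a toy model. It is not Spark's actual `UnsafeRow` or `UnsafeRowWriter`: `ToyRow`, `fillWithSetters`, `fillWithWriter`, and the `nullBitUpdates` counter are hypothetical names introduced only for illustration, and the model assumes that a setter clears the field's null bit on every call while a writer only stores the value.

```scala
// Toy model only: hypothetical stand-ins, not Spark's UnsafeRow / UnsafeRowWriter.
object NullBitsSketch {
  final class ToyRow(val numFields: Int) {
    require(numFields > 0 && numFields <= 64, "toy model keeps null bits in a single Long")
    private val values = new Array[Long](numFields)
    private var nullBits = 0L
    // Counts every operation that touches the null-bit region.
    var nullBitUpdates = 0

    def isNullAt(i: Int): Boolean = (nullBits & (1L << i)) != 0L
    def get(i: Int): Long = values(i)

    // Clears all null bits in one operation.
    def zeroOutNullBits(): Unit = { nullBits = 0L; nullBitUpdates += 1 }

    // Marks field i as null and zeroes its value slot.
    def setNullAt(i: Int): Unit = {
      nullBits |= (1L << i)
      values(i) = 0L
      nullBitUpdates += 1
    }

    // Setter style (assumed): clears the field's null bit on every call, then stores.
    def set(i: Int, v: Long): Unit = {
      nullBits &= ~(1L << i)
      nullBitUpdates += 1
      values(i) = v
    }

    // Writer style (assumed): stores only, relying on the bits being cleared up front.
    def write(i: Int, v: Long): Unit = { values(i) = v }
  }

  // One null-bit update per field, whether or not the field is null.
  def fillWithSetters(row: ToyRow, fields: Seq[Option[Long]]): Unit =
    fields.zipWithIndex.foreach {
      case (Some(v), i) => row.set(i, v)
      case (None, i)    => row.setNullAt(i)
    }

  // One clear per row, plus one null-bit update per null field only.
  def fillWithWriter(row: ToyRow, fields: Seq[Option[Long]]): Unit = {
    row.zeroOutNullBits()
    fields.zipWithIndex.foreach {
      case (Some(v), i) => row.write(i, v)
      case (None, i)    => row.setNullAt(i)
    }
  }
}
```

In this model, a row with `n` fields of which `k` are null costs `n` null-bit updates with the setter style and `1 + k` with the writer style, so the writer style does less work when nulls are rare. Real costs depend on the actual implementations, so this does not replace the benchmark.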
cc @nongli