Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/10809#discussion_r50172662
--- Diff:
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
---
@@ -21,24 +21,36 @@
import org.apache.spark.unsafe.Platform;
/**
- * A helper class to manage the row buffer when construct unsafe rows.
+ * A helper class to manage the data buffer for an unsafe row. The data buffer can grow and
+ * automatically re-point the unsafe row to it.
+ *
+ * This class can be used to build a one-pass unsafe row writing program, i.e. data will be written
+ * to the data buffer directly and no extra copy is needed. There should be only one instance of
+ * this class per writing program, so that the memory segment/data buffer can be reused. Note that
+ * for each incoming record, we should call `reset` of BufferHolder instance before write the record
+ * and reuse the data buffer.
--- End diff --
Could you also add a comment noting that we should call either
`unsafeRow.pointTo()` or `unsafeRow.setTotalSize()`?
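
To make the usage pattern concrete, here is a minimal, self-contained sketch of the reset/write/set-size cycle the Javadoc describes. It is not Spark's implementation: `SketchRow`, `SketchBufferHolder`, and `writeRecord` are hypothetical stand-ins that use a plain `byte[]`, and the doubling growth policy is an assumption made only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for an unsafe row: it only tracks the buffer it
// points to and how many bytes of that buffer are valid.
final class SketchRow {
    byte[] buffer;
    int totalSize;

    void pointTo(byte[] buffer, int totalSize) {
        this.buffer = buffer;
        this.totalSize = totalSize;
    }

    void setTotalSize(int totalSize) {
        this.totalSize = totalSize;
    }
}

// Hypothetical stand-in for the buffer holder: one growable buffer that is
// reused across records and re-points the row whenever it is reallocated.
final class SketchBufferHolder {
    private final SketchRow row;
    byte[] buffer;
    int cursor;

    SketchBufferHolder(SketchRow row, int initialSize) {
        this.row = row;
        this.buffer = new byte[initialSize];
        row.pointTo(buffer, 0);
    }

    // Ensure room for neededSize more bytes (doubling is an assumed policy).
    void grow(int neededSize) {
        int required = cursor + neededSize;
        if (required > buffer.length) {
            buffer = Arrays.copyOf(buffer, Math.max(required, buffer.length * 2));
            row.pointTo(buffer, cursor); // the old array is no longer valid
        }
    }

    void write(byte[] data) {
        grow(data.length);
        System.arraycopy(data, 0, buffer, cursor, data.length);
        cursor += data.length;
    }

    // Called once per incoming record so the same buffer is reused.
    void reset() {
        cursor = 0;
    }

    int totalSize() {
        return cursor;
    }
}

final class BufferHolderSketch {
    // One record: reset, write directly into the buffer, then publish the size.
    static String writeRecord(SketchBufferHolder holder, SketchRow row, String record) {
        holder.reset();
        holder.write(record.getBytes(StandardCharsets.UTF_8));
        row.setTotalSize(holder.totalSize());
        return new String(row.buffer, 0, row.totalSize, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SketchRow row = new SketchRow();
        SketchBufferHolder holder = new SketchBufferHolder(row, 4);
        for (String record : new String[] {"ab", "a longer record", "xyz"}) {
            System.out.println(writeRecord(holder, row, record));
        }
    }
}
```

In this sketch, skipping the final `setTotalSize` call would leave the row reporting the previous record's size, which is the kind of mistake the requested comment would help callers avoid.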