Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/10628#discussion_r49045516
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
@@ -0,0 +1,165 @@
+package org.apache.spark.sql.execution.vectorized;
+
+import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.DoubleType;
+import org.apache.spark.sql.types.IntegerType;
+import org.apache.spark.unsafe.Platform;
+
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+
+/**
+ * A column backed by an in memory JVM array. This stores the NULLs as a byte per value
+ * and a java array for the values.
+ */
+public final class OnHeapColumnVector extends ColumnVector {
+ // The data stored in these arrays need to maintain binary compatible. We can
+ // directly pass this buffer to external components.
+
+ // This is faster than a boolean array and we optimize this over memory footprint.
--- End diff ---
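
As context for the quoted doc comment: the layout it describes keeps one null byte per row alongside a plain primitive array of values, so a null check is a single byte load and the value array can be passed directly to external components. A minimal illustrative sketch of that idea (the class and method names here are hypothetical, not the actual OnHeapColumnVector API):

```java
// Illustrative only: one null byte per row plus a primitive value array,
// as described in the quoted doc comment. Not the real OnHeapColumnVector.
final class SimpleIntColumn {
  private final byte[] nulls;  // 1 = null, 0 = not null; one byte per row
  private final int[] data;    // the values, a plain int[] that can be handed out directly

  SimpleIntColumn(int capacity) {
    this.nulls = new byte[capacity];
    this.data = new int[capacity];
  }

  void putInt(int rowId, int value) {
    nulls[rowId] = 0;
    data[rowId] = value;
  }

  void putNull(int rowId) {
    nulls[rowId] = 1;
  }

  boolean isNullAt(int rowId) {
    // A single byte load, no bit shifting or masking.
    return nulls[rowId] == 1;
  }

  int getInt(int rowId) {
    return data[rowId];
  }
}
```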
UnsafeRow should also use one byte per column (the overhead is only ~10% when the
number of columns is larger than 8), cc @rxin
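
Rough arithmetic behind the "~10%" figure, on one reading of the comment: with 8-byte fixed-width slots per field, one null byte per column adds about numColumns / (8 * numColumns) = 12.5% on top of the field region, while for 8 or fewer columns a byte per column fits in the same 8-byte word a packed bitset would already occupy. A back-of-the-envelope sketch under a simplified layout (8-byte value slots, null section rounded up to 8-byte alignment); this layout is an assumption for illustration, not necessarily UnsafeRow's exact format:

```java
// Back-of-the-envelope comparison of null-tracking overhead for a row of
// fixed-width 8-byte fields, under a simplified (assumed) layout.
public class NullOverheadEstimate {
  public static void main(String[] args) {
    int numColumns = 16;                          // example: more than 8 columns
    long fieldBytes = 8L * numColumns;            // 8-byte slot per value

    // One byte per column vs. one bit per column, both rounded up to 8-byte alignment.
    long bytePerColumnNulls = roundUpTo8(numColumns);
    long bitPerColumnNulls = roundUpTo8((numColumns + 7) / 8);

    System.out.printf("byte-per-column overhead: %.1f%%%n",
        100.0 * bytePerColumnNulls / fieldBytes);  // 12.5% for 16 columns
    System.out.printf("bit-per-column overhead:  %.1f%%%n",
        100.0 * bitPerColumnNulls / fieldBytes);   // 6.3% for 16 columns
  }

  private static long roundUpTo8(long n) {
    return ((n + 7) / 8) * 8;
  }
}
```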