cloud-fan commented on code in PR #43707:
URL: https://github.com/apache/spark/pull/43707#discussion_r1389080585


##########
common/unsafe/src/main/java/org/apache/spark/unsafe/types/VariantVal.java:
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.types;
+
+import org.apache.spark.unsafe.Platform;
+
+import java.io.Serializable;
+import java.util.Arrays;
+
+/**
+ * The physical data representation of {@link org.apache.spark.sql.types.VariantType} that
+ * represents a semi-structured value. It consists of two binary values: {@link VariantVal#value}
+ * and {@link VariantVal#metadata}. The value encodes types and values, but not field names. The
+ * metadata currently contains a version flag and a list of field names. We can extend/modify the
+ * detailed binary format given the version flag.
+ * <p>
+ * A {@link VariantVal} can be produced by casting another value into the Variant type or parsing a
+ * JSON string in the {@link org.apache.spark.sql.catalyst.expressions.variant.ParseJson}
+ * expression. We can extract a path consisting of field names and array indices from it, cast it
+ * into a concrete data type, or rebuild a JSON string from it.
+ * <p>
+ * The storage layout of this class in {@link org.apache.spark.sql.catalyst.expressions.UnsafeRow}
+ * and {@link org.apache.spark.sql.catalyst.expressions.UnsafeArrayData} is: the fixed-size part is
+ * a long value "offsetAndSize". The upper 32 bits is the offset that points to the start position
+ * of the actual binary content. The lower 32 bits is the total length of the binary content. The
+ * binary content contains: 4 bytes representing the length of {@link VariantVal#value}, the content
+ * of {@link VariantVal#value}, and the content of {@link VariantVal#metadata}. This is an internal
+ * and transient format and can be modified at any time.
+ */
+public class VariantVal implements Serializable {
+  protected final byte[] value;
+  protected final byte[] metadata;
+
+  public VariantVal(byte[] value, byte[] metadata) {
+    this.value = value;
+    this.metadata = metadata;
+  }
+
+  public byte[] getValue() {
+    return value;
+  }
+
+  public byte[] getMetadata() {
+    return metadata;
+  }
+
+  /**
+   * This function writes the binary content into {@code buffer} starting from {@code cursor}, as
+   * described in the class comment. The caller should guarantee there is enough space in
+   * {@code buffer}.
+   */
+  public void writeIntoUnsafeRow(byte[] buffer, long cursor) {

Review Comment:
   Actually this makes sense to me. The job of writing to an unsafe row belongs to
`UnsafeWriter`, so the code should live there as well.
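
   For illustration, the layout described in the class comment can be sketched as a small
standalone Java class. This is a hypothetical sketch, not Spark's actual code: the class name
`VariantLayoutSketch` is invented, and it uses big-endian `ByteBuffer` I/O, whereas Spark's
`Platform` memory ops write in native byte order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of the layout in the class comment:
//   fixed-size part: a long "offsetAndSize" (upper 32 bits = offset, lower 32 = total length);
//   binary content: [4-byte length of value][value bytes][metadata bytes].
public class VariantLayoutSketch {

  // Pack an offset and a size into a single long, offset in the upper 32 bits.
  static long packOffsetAndSize(int offset, int size) {
    return ((long) offset << 32) | (size & 0xFFFFFFFFL);
  }

  // Serialize value + metadata into the variable-length binary content.
  static byte[] encode(byte[] value, byte[] metadata) {
    ByteBuffer buf = ByteBuffer.allocate(4 + value.length + metadata.length);
    buf.putInt(value.length); // 4-byte length prefix for value
    buf.put(value);
    buf.put(metadata);        // metadata length is implied by the total length
    return buf.array();
  }

  // Split the binary content back into {value, metadata}.
  static byte[][] decode(byte[] content) {
    ByteBuffer buf = ByteBuffer.wrap(content);
    int valueLen = buf.getInt();
    byte[] value = new byte[valueLen];
    buf.get(value);
    byte[] metadata = new byte[content.length - 4 - valueLen];
    buf.get(metadata);
    return new byte[][] {value, metadata};
  }

  public static void main(String[] args) {
    byte[] value = {1, 2, 3};
    byte[] metadata = {9, 8};
    byte[] content = encode(value, metadata);
    long offsetAndSize = packOffsetAndSize(16, content.length);
    System.out.println((int) (offsetAndSize >> 32)); // offset: 16
    System.out.println((int) offsetAndSize);         // total length: 9
    byte[][] parts = decode(content);
    System.out.println(Arrays.equals(parts[0], value)
        && Arrays.equals(parts[1], metadata));       // true: round-trip succeeds
  }
}
```

The round trip shows why the 4-byte length prefix is needed only for `value`: the metadata's
length falls out of the total length stored in the lower 32 bits of `offsetAndSize`.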



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
