cloud-fan commented on code in PR #45479:
URL: https://github.com/apache/spark/pull/45479#discussion_r1524535881


##########
common/variant/README.md:
##########
@@ -0,0 +1,127 @@
+# Overview
+
+A Variant represents a type that contains one of:
+- Primitive: A type and corresponding value (e.g. INT, STRING)
+- Array: An ordered list of Variant values
+- Object: An unordered collection of string/Variant pairs (i.e. key/value pairs). An object may not contain duplicate keys.
+
+A variant is encoded with 2 binary values, the value and the metadata.
+
+There are a fixed number of allowed primitive types, provided in the table below. These represent a commonly supported subset of the [logical types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) allowed by the Parquet format.
+
+The Variant spec allows representation of semi-structured data (e.g. JSON) in a form that can be efficiently queried by path. The design is intended to allow efficient access to nested data even in the presence of very wide or deep structures.
+
+Another motivation for the representation is that (aside from metadata) each inner Variant value is contiguous and self-contained. For example, in a Variant containing an Array of Variant values, the representation of an inner Variant value, when paired with the metadata of the full variant, is itself a valid Variant.
+
+# Metadata encoding
+
+The grammar for encoded metadata is as follows:
+
+```
+metadata: <header> <dictionary_size> <dictionary>
+header: 1 byte (<version> | <sorted_strings> << 4 | (<offset_size_minus_one> << 6))
+version: a 4-bit version ID. Currently, must always contain the value 1

Review Comment:
   Hmm, a 4-bit version ID means at most 16 versions?
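
To make the bit budget concrete, here is a rough sketch of how the header byte in the quoted grammar could be packed and unpacked. The function names are hypothetical and not taken from the PR. The field widths beyond the 4-bit version are assumptions inferred from the shift amounts: `sorted_strings` is treated as a 1-bit flag at bit 4, bit 5 is ignored, and `offset_size_minus_one` takes the top 2 bits.

```python
VERSION_MASK = 0x0F  # low 4 bits hold the version, so values 0..15


def encode_metadata_header(version: int, sorted_strings: bool, offset_size: int) -> int:
    """Pack the three fields into one byte, following the quoted grammar.

    Assumption: sorted_strings is a single bit and offset_size is 1..4 bytes.
    """
    if not 0 <= version <= VERSION_MASK:
        raise ValueError("version must fit in 4 bits")
    if not 1 <= offset_size <= 4:
        raise ValueError("offset_size must be between 1 and 4")
    return version | (int(sorted_strings) << 4) | ((offset_size - 1) << 6)


def decode_metadata_header(header: int) -> dict:
    """Unpack a header byte into its fields (bit 5 is ignored in this sketch)."""
    if not 0 <= header <= 0xFF:
        raise ValueError("header must fit in one byte")
    return {
        "version": header & VERSION_MASK,
        "sorted_strings": bool((header >> 4) & 0x01),
        "offset_size": ((header >> 6) & 0x03) + 1,
    }


header = encode_metadata_header(version=1, sorted_strings=True, offset_size=3)
fields = decode_metadata_header(header)
```

With this layout the version field can take 16 distinct values (0 through 15), which is the limit the question above is about.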



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

