harshmotw-db commented on code in PR #47473:
URL: https://github.com/apache/spark/pull/47473#discussion_r1690573509
##########
common/variant/README.md:
##########
@@ -335,27 +335,29 @@ The Decimal type contains a scale, but no precision. The implied precision of a
| Object | `2` | A collection of (string-key, variant-value) pairs |
| Array | `3` | An ordered sequence of variant values |
-| Primitive Type | Type ID | Equivalent Parquet Type | Binary format |
-|-----------------------------|---------|---------------------------|-----------------------------------------------------------------------------------------------------------|
-| null | `0` | any | none |
-| boolean (True) | `1` | BOOLEAN | none |
-| boolean (False) | `2` | BOOLEAN | none |
-| int8 | `3` | INT(8, signed) | 1 byte |
-| int16 | `4` | INT(16, signed) | 2 byte little-endian |
-| int32 | `5` | INT(32, signed) | 4 byte little-endian |
-| int64 | `6` | INT(64, signed) | 8 byte little-endian |
-| double | `7` | DOUBLE | IEEE little-endian |
-| decimal4 | `8` | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
-| decimal8 | `9` | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
-| decimal16 | `10` | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
-| date | `11` | DATE | 4 byte little-endian |
-| timestamp | `12` | TIMESTAMP(true, MICROS) | 8-byte little-endian |
-| timestamp without time zone | `13` | TIMESTAMP(false, MICROS) | 8-byte little-endian |
-| float | `14` | FLOAT | IEEE little-endian |
-| binary | `15` | BINARY | 4 byte little-endian size, followed by bytes |
-| string | `16` | STRING | 4 byte little-endian size, followed by UTF-8 encoded bytes |
-| binary from metadata | `17` | BINARY | Little-endian index into the metadata dictionary. Number of bytes is equal to the metadata `offset_size`. |
-| string from metadata | `18` | STRING | Little-endian index into the metadata dictionary. Number of bytes is equal to the metadata `offset_size`. |
+| Primitive Type | Type ID | Equivalent Parquet Type | Binary format |
+|-----------------------------|---------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
+| null | `0` | any | none |
+| boolean (True) | `1` | BOOLEAN | none |
+| boolean (False) | `2` | BOOLEAN | none |
+| int8 | `3` | INT(8, signed) | 1 byte |
+| int16 | `4` | INT(16, signed) | 2 byte little-endian |
+| int32 | `5` | INT(32, signed) | 4 byte little-endian |
+| int64 | `6` | INT(64, signed) | 8 byte little-endian |
+| double | `7` | DOUBLE | IEEE little-endian |
+| decimal4 | `8` | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| decimal8 | `9` | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| decimal16 | `10` | DECIMAL(precision, scale) | 1 byte scale in range [0, 38], followed by little-endian unscaled value (see decimal table) |
+| date | `11` | DATE | 4 byte little-endian |
+| timestamp | `12` | TIMESTAMP(true, MICROS) | 8-byte little-endian |
+| timestamp without time zone | `13` | TIMESTAMP(false, MICROS) | 8-byte little-endian |
+| float | `14` | FLOAT | IEEE little-endian |
+| binary | `15` | BINARY | 4 byte little-endian size, followed by bytes |
+| string | `16` | STRING | 4 byte little-endian size, followed by UTF-8 encoded bytes |
+| binary from metadata | `17` | BINARY | Little-endian index into the metadata dictionary. Number of bytes is equal to the metadata `offset_size`. |
+| string from metadata | `18` | STRING | Little-endian index into the metadata dictionary. Number of bytes is equal to the metadata `offset_size`. |
+| year-month interval | `19` | YearMonthIntervalType(start_field, end_field) | 1 byte denoting start field (1 bit) and end field (1 bit) starting at LSB followed by 4-byte little-endian value. |
Review Comment:
I had mistakenly put the equivalent Spark types in this column earlier. I have
removed the Parquet types for now while I investigate which ones apply.
The details about the start and end fields are in a paragraph after this
table in this PR.
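
For readers following the thread, here is a rough Python sketch of how the binary format in the new `year-month interval` row might be decoded. It is not the PR's implementation. It assumes that bit 0 of the first byte is the start field and bit 1 is the end field, that field codes follow Spark's `YearMonthIntervalType` (`0` = YEAR, `1` = MONTH), that the 4-byte value is a signed integer, and that the payload excludes the variant header byte carrying the type ID. The function name `decode_year_month_interval` is hypothetical.

```python
import struct

# Assumed field codes, mirroring Spark's YearMonthIntervalType (YEAR = 0, MONTH = 1).
FIELD_NAMES = {0: "YEAR", 1: "MONTH"}


def decode_year_month_interval(payload: bytes):
    """Decode a 5-byte payload: 1 field byte followed by a 4-byte little-endian value.

    Hypothetical helper. The bit positions and the signedness of the value
    are assumptions, not confirmed by the quoted table row.
    """
    if len(payload) != 5:
        raise ValueError(f"expected 5 bytes, got {len(payload)}")
    fields = payload[0]
    start_field = fields & 0x1        # assumed: bit 0 (LSB) is the start field
    end_field = (fields >> 1) & 0x1   # assumed: bit 1 is the end field
    (value,) = struct.unpack("<i", payload[1:5])  # assumed: signed 32-bit
    return FIELD_NAMES[start_field], FIELD_NAMES[end_field], value
```

Under those assumptions, `decode_year_month_interval(bytes([0b10, 14, 0, 0, 0]))` yields a YEAR to MONTH interval with a raw value of 14. How that value should be interpreted is left to the paragraph after the table.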