gene-db commented on code in PR #475:
URL: https://github.com/apache/parquet-format/pull/475#discussion_r1870179241
##
VariantEncoding.md:
##
@@ -88,9 +88,9 @@ metadata |header |
+---+
```
-The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a little-endian value of `offset_size` bytes, and represents the
number of string values in the dictionary.
+The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a unsigned little-endian value of `offset_size` bytes, and represents
the number of string values in the dictionary.
Next, is an `offset` list, which contains `dictionary_size + 1` values.
-Each `offset` is a little-endian value of `offset_size` bytes, and represents
the starting byte offset of the i-th string in `bytes`.
+Each `offset` is a usigned little-endian value of `offset_size` bytes, and
represents the starting byte offset of the i-th string in `bytes`.
Review Comment:
```suggestion
Each `offset` is an unsigned little-endian value of `offset_size` bytes, and
represents the starting byte offset of the i-th string in `bytes`.
```
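As a rough illustration of why spelling out "unsigned" matters here, the sketch below reads the layout described in the hunk above: one `header` byte, then `dictionary_size`, then `dictionary_size + 1` offsets, then `bytes`. The function names and the sample buffer are hypothetical. `offset_size` is passed in directly instead of being derived from the header, since the header bit layout is not quoted in this hunk. Strings are returned as raw bytes so that no character encoding is assumed.

```python
from typing import List


def read_unsigned_le(buf: bytes, pos: int, width: int) -> int:
    """Read `width` bytes at `pos` as an unsigned little-endian integer."""
    return int.from_bytes(buf[pos:pos + width], byteorder="little", signed=False)


def decode_metadata(buf: bytes, offset_size: int) -> List[bytes]:
    """Hypothetical helper: return the dictionary strings as raw bytes.

    Assumes `offset_size` is already known; the header byte is skipped
    without being interpreted.
    """
    pos = 1  # skip the 1-byte header
    dictionary_size = read_unsigned_le(buf, pos, offset_size)
    pos += offset_size

    # The offset list holds dictionary_size + 1 values; the extra entry
    # marks the end of the last string.
    offsets = []
    for _ in range(dictionary_size + 1):
        offsets.append(read_unsigned_le(buf, pos, offset_size))
        pos += offset_size

    data = buf[pos:]  # the `bytes` region, which the offsets index into
    return [data[offsets[i]:offsets[i + 1]] for i in range(dictionary_size)]


# Sample buffer with offset_size = 1 and two strings of length 200 and 3.
# The header value 0x00 is a placeholder and is not interpreted here.
sample = (
    bytes([0x00])            # header
    + bytes([2])             # dictionary_size
    + bytes([0, 200, 203])   # offsets; 200 is 0xC8, which a signed read would see as -56
    + b"a" * 200
    + b"key"
)
strings = decode_metadata(sample, offset_size=1)
```

With a 1-byte `offset_size`, any offset of 128 or more would decode as negative under a signed interpretation, so the slicing above only works if the values are read as unsigned.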
##
VariantEncoding.md:
##
@@ -69,17 +69,17 @@ The entire metadata is encoded as the following diagram
shows:
metadata |header |
+---+
| |
- :dictionary_size: <-- little-endian, `offset_size` bytes
+ :dictionary_size: <-- unsigned little-endian, `offset_size`
bytes
| |
+---+
| |
- :offset : <-- little-endian, `offset_size` bytes
+ :offset : <-- unsigned little-endian,
`offset_size` bytes
Review Comment:
NIT:
```suggestion
:offset : <-- unsigned little-endian,
`offset_size` bytes
```
##
VariantEncoding.md:
##
@@ -88,9 +88,9 @@ metadata |header |
+---+
```
-The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a little-endian value of `offset_size` bytes, and represents the
number of string values in the dictionary.
+The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a unsigned little-endian value of `offset_size` bytes, and represents
the number of string values in the dictionary.
Review Comment:
```suggestion
The metadata is encoded first with the `header` byte, then `dictionary_size`
which is an unsigned little-endian value of `offset_size` bytes, and represents
the number of string values in the dictionary.
```
##
VariantEncoding.md:
##
@@ -69,17 +69,17 @@ The entire metadata is encoded as the following diagram
shows:
metadata |header |
+---+
| |
- :dictionary_size: <-- little-endian, `offset_size` bytes
+ :dictionary_size: <-- unsigned little-endian, `offset_size`
bytes
| |
+---+
| |
- :offset : <-- little-endian, `offset_size` bytes
+ :offset : <-- unsigned little-endian,
`offset_size` bytes
| |
+---+
:
+---+
| |
- :offset : <-- little-endian, `offset_size` bytes
+ :offset : <-- unsigned little-endian,
`offset_size` bytes
Review Comment:
NIT:
```suggestion
:offset : <-- unsigned little-endian,
`offset_size` bytes
```
##
VariantEncoding.md:
##
@@ -313,10 +313,10 @@ array value_data | |
| |
+---+
```
-An array `value_data` begins with `num_elements`, a 1-byte or 4-byte
little-endian value, representing the number of elements in the array.
+An array `value_data` begins with `num_elements`, a 1-byte or 4-byte unsigned
little-endian value, representing the number of elements in the array.
The size in bytes of `num_elements` is indicated by `is_large` in the
`value_header`.
Next, is a `field_offset` list.
-There are `num_elements + 1` number of entries and each `field_offset` is a
little-endian value of `field_offset_size` bytes.
+There are `num_elements + 1` number of entries and each `field_offset` is a
unsigned little-endian value of `field_offset_size` bytes.
Review Comment:
```suggestion
There are `num_elements + 1` number of entries and each `field_offset` is an
unsigned little-endian v