Repository: parquet-format
Updated Branches:
  refs/heads/master 7923dc673 -> 499d59710
PARQUET-1032: fix varint-encode() encoding algorithm link

The spec says that varint-encode() is ULEB-128 encoding but links to VLQ
algorithm that is slightly different from ULEB-128.

Author: kostya-sh <kostya...@users.noreply.github.com>

Closes #69 from kostya-sh/patch-1 and squashes the following commits:

f128603 [kostya-sh] PARQUET-1032: fix varint-encode() encoding algorithm link

Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/499d5971
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/499d5971
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/499d5971

Branch: refs/heads/master
Commit: 499d597106e9ff68b802f1a68ea70c9df55c68d7
Parents: 7923dc6
Author: kostya-sh <kostya...@users.noreply.github.com>
Authored: Fri Oct 6 16:21:49 2017 -0700
Committer: Ryan Blue <b...@apache.org>
Committed: Fri Oct 6 16:21:49 2017 -0700

----------------------------------------------------------------------
 Encodings.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/parquet-format/blob/499d5971/Encodings.md
----------------------------------------------------------------------
diff --git a/Encodings.md b/Encodings.md
index c4cdf70..7cf880e 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -101,7 +101,7 @@ repeated-value := value that is repeated, using a fixed-width of round-up-to-nex
 shifting and ORing with a mask. (to make this optimization work on a big-endian machine, you would have to use the ordering used in the [deprecated bit-packing](#BITPACKED) encoding)
 
-2. varint-encode() is ULEB-128 encoding, see http://en.wikipedia.org/wiki/Variable-length_quantity
+2. varint-encode() is ULEB-128 encoding, see https://en.wikipedia.org/wiki/LEB128
 
 ### <a name="BITPACKED"></a>Bit-packed (Deprecated) (BIT_PACKED = 4)
 
 This is a bit-packed only encoding, which is deprecated and will be replaced by the [RLE/bit-packing](#RLE) hybrid encoding.
@@ -230,7 +230,7 @@ Supported Types: BYTE_ARRAY
 This is also known as incremental encoding or front compression: for each element in a sequence of strings, store the prefix length of the previous entry plus the suffix.
 
-For a longer description, see http://en.wikipedia.org/wiki/Incremental_encoding.
+For a longer description, see https://en.wikipedia.org/wiki/Incremental_encoding.
 
 This is stored as a sequence of delta-encoded prefix lengths (DELTA_BINARY_PACKED), followed by the suffixes encoded as delta length byte arrays (DELTA_LENGTH_BYTE_ARRAY).
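
For readers wondering why the link target matters, here is a minimal Python sketch of the difference the commit message describes. It is not taken from the Parquet code base: the function names `uleb128_encode`, `uleb128_decode`, and `vlq_encode` are hypothetical, and the VLQ variant shown is the common big-endian form described on the previously linked Wikipedia page. Both schemes store 7 payload bits per byte with a continuation flag in the high bit; ULEB-128 emits the least significant group first, while VLQ emits the most significant group first.

```python
from typing import Tuple


def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as ULEB-128 (least significant group first)."""
    if value < 0:
        raise ValueError("ULEB-128 only encodes non-negative integers")
    out = bytearray()
    while True:
        group = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # more groups follow: set continuation bit
        else:
            out.append(group)         # last group: continuation bit clear
            return bytes(out)


def uleb128_decode(data: bytes) -> Tuple[int, int]:
    """Decode one ULEB-128 value; return (value, number of bytes consumed)."""
    result = 0
    shift = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, index + 1
        shift += 7
    raise ValueError("truncated ULEB-128 value")


def vlq_encode(value: int) -> bytes:
    """Encode as big-endian VLQ (most significant group first), for comparison only."""
    if value < 0:
        raise ValueError("VLQ as sketched here only encodes non-negative integers")
    groups = [value & 0x7F]           # final byte has the continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


if __name__ == "__main__":
    n = 624485
    print(uleb128_encode(n).hex())    # e58e26
    print(vlq_encode(n).hex())        # a68e65
    print(uleb128_decode(uleb128_encode(n)))
```

For single-byte values (0 through 127) the two encodings coincide, which is likely why the mismatch is easy to overlook; they diverge as soon as a value needs two or more bytes.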