Repository: parquet-format
Updated Branches:
  refs/heads/master e127c3f7f -> f59258a05

PARQUET-322 Document ENUM as a logical type.

Author: Jakub Kukul <ja...@mbr-targeting.com>

Closes #54 from jkukul/master and squashes the following commits:

a2490b2 [Jakub Kukul] PARQUET-322 Document ENUM as a logical type.

Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/f59258a0
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/f59258a0
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/f59258a0

Branch: refs/heads/master
Commit: f59258a0519fb4ed8fa25a88593a2d034ce909c6
Parents: e127c3f
Author: Jakub Kukul <ja...@mbr-targeting.com>
Authored: Fri Oct 6 16:57:21 2017 -0700
Committer: Ryan Blue <b...@apache.org>
Committed: Fri Oct 6 16:57:21 2017 -0700

----------------------------------------------------------------------
 LogicalTypes.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/parquet-format/blob/f59258a0/LogicalTypes.md
----------------------------------------------------------------------
diff --git a/LogicalTypes.md b/LogicalTypes.md
index 6e5c9db..c50b96b 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -32,13 +32,24 @@ This file contains the specification for all logical types.
 The parquet format's `ConvertedType` stores the type annotation. The annotation
 may require additional metadata fields, as well as rules for those fields.
 
-### UTF8 (Strings)
+## String Types
+
+### UTF8
 
 `UTF8` may only be used to annotate the binary primitive type and indicates
 that the byte array should be interpreted as a UTF-8 encoded character string.
 
 The sort order used for `UTF8` strings is unsigned byte-wise comparison.
 
+### ENUM
+
+`ENUM` annotates the binary primitive type and indicates that the value
+was converted from an enumerated type in another data model (e.g. Thrift, Avro, Protobuf).
+Applications using a data model lacking a native enum type should interpret `ENUM`
+annotated field as a UTF-8 encoded string.
+
+The sort order used for `ENUM`s is `UNSIGNED` byte-wise comparison.
+
 ## Numeric Types
 
 ### Signed Integers
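
Editor's note (not part of the commit): the two rules the new `ENUM` section states, decoding the binary value as UTF-8 and ordering by unsigned byte-wise comparison, can be illustrated with a minimal sketch. The helper names `decode_enum` and `sort_enum_values` and the sample values are hypothetical and come from no Parquet library; the sketch relies only on Python's built-in `bytes` ordering, which compares elements as integers in the range 0 to 255.

```python
# Hypothetical helpers for illustration only; not a Parquet API.

def decode_enum(raw: bytes) -> str:
    """Interpret an ENUM-annotated binary value as a UTF-8 string."""
    return raw.decode("utf-8")


def sort_enum_values(values):
    """Sort raw ENUM values by unsigned byte-wise comparison.

    Python compares bytes objects lexicographically, treating each
    byte as an unsigned integer (0..255), so sorted() is sufficient.
    """
    return sorted(values)


# Sample values (assumed); the last one contains bytes >= 0x80.
raw_values = [b"SPADES", b"HEARTS", b"\xc3\x89T\xc3\x89", b"CLUBS"]

ordered = sort_enum_values(raw_values)
decoded = [decode_enum(v) for v in ordered]
print(decoded)
```

Under a signed comparison the value starting with byte `0xC3` would sort before the ASCII values; under the unsigned order described in the diff it sorts last.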