[jira] [Commented] (PARQUET-1312) Improve logical types documentation

2018-06-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521893#comment-16521893
 ] 

ASF GitHub Bot commented on PARQUET-1312:
-

gszadovszky closed pull request #98: PARQUET-1312: Improve logical types documentation
URL: https://github.com/apache/parquet-format/pull/98
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/LogicalTypes.md b/LogicalTypes.md
index 762769e7..3be6f211 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -29,17 +29,41 @@ This file contains the specification for all logical types.
 
 ### Metadata
 
-The parquet format's `ConvertedType` stores the type annotation. The annotation
+The parquet format's `LogicalType` stores the type annotation. The annotation
 may require additional metadata fields, as well as rules for those fields.
+There is an older representation of the logical type annotations called `ConvertedType`.
+To support backward compatibility with old files, readers should interpret `LogicalTypes`
+in the same way as `ConvertedType`, and writers should populate `ConvertedType` in the metadata
+according to well defined conversion rules.
+
+### Compatibility
+
+The Thrift definition of the metadata has two fields for logical types: `ConvertedType` and `LogicalType`.
+`ConvertedType` is an enum of all available annotation. Since Thrift enums can't have additional type parameters,
+it is cumbersome to define additional type parameters, like decimal scale and precision
+(which are additional 32 bit integer fields on SchemaElement, and are relevant only for decimals) or time unit
+and UTC adjustment flag for Timestamp types. To overcome this problem, a new logical type representation was introduced into
+the metadata to replace `ConvertedType`: `LogicalType`.  The new representation is a union of struct of logical types,
+this way allowing more flexible API, logical types can have type parameters.
+
+However, to maintain compatibility, Parquet readers should be able to read
+and interpret old logical type representation (in case the new one is not present,
+because the file was written by older writer), and write `ConvertedType` field for old readers.
+
+Compatibility considerations are mentioned for each annotation in the corresponding section.
 
 ## String Types
 
-### UTF8
+### STRING
 
-`UTF8` may only be used to annotate the binary primitive type and indicates
+`STRING` may only be used to annotate the binary primitive type and indicates
 that the byte array should be interpreted as a UTF-8 encoded character string.
 
-The sort order used for `UTF8` strings is unsigned byte-wise comparison.
+The sort order used for `STRING` strings is unsigned byte-wise comparison.
+
+*Compatibility*
+
+`STRING` corresponds to `UTF8` ConvertedType.
 
 ### ENUM
 
@@ -65,17 +89,21 @@ The sort order used for `UUID` values is unsigned byte-wise comparison.
 
 ### Signed Integers
 
-`INT_8`, `INT_16`, `INT_32`, and `INT_64` annotations can be used to specify
-the maximum number of bits in the stored value.  Implementations may use these
-annotations to produce smaller in-memory representations when reading data.
+`INT` annotation can be used to specify the maximum number of bits in the stored value.
+The annotation has two parameter: bit width and sign.
+Allowed bit width values are `8`, `16`, `32`, `64`, and sign can be `true` or `false`.
+For signed integers, the second parameter should be `true`,
+for example, a signed integer with bit width of 8 is defined as `INT(8, true)`
+Implementations may use these annotations to produce smaller
+in-memory representations when reading data.
 
 If a stored value is larger than the maximum allowed by the annotation, the
 behavior is not defined and can be determined by the implementation.
 Implementations must not write values that are larger than the annotation
 allows.
 
-`INT_8`, `INT_16`, and `INT_32` must annotate an `int32` primitive type and
-`INT_64` must annotate an `int64` primitive type. `INT_32` and `INT_64` are
+`INT(8, true)`, `INT(16, true)`, and `INT(32, true)` must annotate an `int32` primitive type and
+`INT(64, true)` must annotate an `int64` primitive type. `INT(32, true)` and `INT(64, true)` are
 implied by the `int32` and `int64` primitive types if no other annotation is
 present and should be considered optional.
 
@@ -83,9 +111,13 @@ The sort order used for signed integer types is signed.
 
 ### Unsigned Integers
 
-`UINT_8`, `UINT_16`, `UINT_32`, and `UINT_64` annotations can be used to
-specify unsigned integer types, along with a maximum number of bits in the
-stored value. Implementations may use these annotations to produce smaller
+`INT` annotation can be 
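
(The diff is cut off at this point in the archived message.)

For readers following the compatibility rules in the diff above, here is a minimal sketch of how a writer might derive the legacy `ConvertedType` name from the new parameterized annotations. It is not taken from parquet-format or parquet-mr: the classes `StringType` and `IntType` and the functions `to_converted_type`, `physical_type`, and `value_range` are hypothetical names chosen for illustration. The mapping of `INT(n, false)` to `UINT_n`, and the physical type for unsigned integers, are assumptions based on the removed `UINT_*` lines, since the diff is truncated before it states them.

```python
from dataclasses import dataclass

# Bit widths the diff lists as allowed for the INT annotation.
ALLOWED_BIT_WIDTHS = (8, 16, 32, 64)


@dataclass(frozen=True)
class StringType:
    """Hypothetical stand-in for the parameterless STRING logical type."""


@dataclass(frozen=True)
class IntType:
    """Hypothetical stand-in for INT(bit_width, is_signed)."""
    bit_width: int
    is_signed: bool

    def __post_init__(self):
        if self.bit_width not in ALLOWED_BIT_WIDTHS:
            raise ValueError(f"unsupported bit width: {self.bit_width}")


def to_converted_type(logical_type):
    """Return the legacy ConvertedType name a writer would also populate."""
    if isinstance(logical_type, StringType):
        # Stated in the diff: STRING corresponds to the UTF8 ConvertedType.
        return "UTF8"
    if isinstance(logical_type, IntType):
        # Signed case follows the diff (INT_8 became INT(8, true));
        # the unsigned case is an assumption, see the lead-in.
        prefix = "INT" if logical_type.is_signed else "UINT"
        return f"{prefix}_{logical_type.bit_width}"
    raise TypeError(f"no conversion rule sketched for {logical_type!r}")


def physical_type(int_type):
    """Primitive type the annotation must sit on.

    The diff states this for signed integers; applying the same rule
    to unsigned integers is an assumption.
    """
    return "int64" if int_type.bit_width == 64 else "int32"


def value_range(int_type):
    """Inclusive (min, max) a writer must stay within for this annotation."""
    bits = int_type.bit_width
    if int_type.is_signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1
```

A writer could call `value_range` before writing, because the spec text says implementations must not write values larger than the annotation allows, while reader behavior for out-of-range values is left undefined.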

[jira] [Commented] (PARQUET-1312) Improve logical types documentation

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509547#comment-16509547
 ] 

ASF GitHub Bot commented on PARQUET-1312:
-

nandorKollar opened a new pull request #98: PARQUET-1312: Improve logical types documentation
URL: https://github.com/apache/parquet-format/pull/98
 
 
   




> Improve logical types documentation
> ---
>
> Key: PARQUET-1312
> URL: https://issues.apache.org/jira/browse/PARQUET-1312
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Nandor Kollar
>Priority: Major
>
> Logical types [documentation|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
> should be updated with the new type parameters introduced with the new
> logical types API (see details in PARQUET-1253 and PARQUET-906)


